Is OLAP Dead?

Poor ol’ OLAP. Like the snazzy flip phone when the iPhone was introduced, OLAP is being passed over for what’s sexy and new. Next generation visualization tools and fast technologies like columnar and in-memory databases are causing people to question the relevancy of the OLAP cube.

But is OLAP really dead?

This webinar recording explores various facets of OLAP to answer this question and discusses real-life client scenarios for a pragmatic perspective.

We look at

Traditional OLAP and its role and limitations in modern BI
The pros and cons of the big three current OLAP technologies — IBM Cognos Dynamic and Transformer Cubes and Microsoft SSAS – as well as TMI and Hyperion Essbase
OLAP vs new generation visualization tools such as Tableau, Qlik, Cognos Analytics (v11)
OLAP vs columnar and in-memory databases
Vendors and open source solutions that are creating new scalable OLAP for big data

Presenter

Pedro Ining
Senior BI Architect
Senturus, Inc.

Pedro Ining joined Senturus in 2010 and brings over 20 years of BI and data warehousing experience to his role. Before joining Senturus, Pedro held technical roles in data and BI architecture as well as leadership management roles in companies such as Visa Inc., LeapFrog Toys, Chevron Inc., AT&T and the University of California at Berkeley.

Pedro has been instrumental in implementing data warehousing systems from scratch and has experienced the evolution of the BI industry through several iterations of BI products like Cognos, MicroStrategy and Tableau.

Presentation outline

OLAP, OLTP, DWH Defined

OLTP – Online Transaction Processing
- Core data repository of data flowing into business systems (ERP)
- Optimized for quick data entry and relational integrity
- Not optimized for reporting and data analysis
- Typically very complex schema design with many normalized tables that facilitate high volume throughput of transactions
DWH – Data Warehouse
- Central repository of multiple ERP systems or other data sources
- The data warehouse (DWH) is a database system separate from the OLTP
- Architected in an optimized fashion for easy reporting and analysis
- Organizes and hides the complexity of the OLTP systems OLAP – Online Analytical Processing
- Edgar F. Codd father of relational database coined the term OLAP, Arbor/Essbase went on to market the term
- The data warehouse plus a central repository that defines the relationships between tables (facts/dimensions) and stores complex business rules/calculations (e.g. YTD, YTD LY, Margins, Inventory Turns, etc.)
- Allows for high performing interactive analysis
- Generally referred to as cubes for efficient, timely and accurate reporting
Modern BI Architecture
Typical Best Practices BI System with OLAP Layer
Why OLAP
- Historically, size and speed limitations of databases limited query performance
- Central repository for relationships and complex business calculations
- Buffers the business user from complex native database structures and sensitive calculation logic
- Cubes generally have higher performance vs. relational queries
- Fast, simple, drag-and-drop ad-hoc analysis and reporting
  - Self-service with guardrails
- Visual exploration
  - Multi-dimensional view of data
  - Drill-down on hierarchies
- Many business users love the interface and are used to querying by governed data dimensions and measures that are prebuilt for them
OLAP Familiar Interfaces
Why Not OLAP
- Massive increase in data volumes
  - Latency – large cubes increase cube build times, impacting SLAs
  - Large cardinality dimensions and many dimensions
  - Real-time updates are difficult if not impossible
- Movement of data into another proprietary structure
- Upfront investment in cube modeling
  - Measures, dimensions, hierarchies all defined upfront
  - Not a flexible agile BI environment
  - New cube builds and designs are required as the business changes
- Continued developer maintenance and administration
- CPU power, memory and powerful servers are very affordable – do we still need the OLAP layer

The Current State of OLAP Architecture

Traditional OLAP Architectures
- MOLAP – Multi Dimension OLAP
  - Most traditional OLAP design
  - Data is stored in the multidimensional cube
  - Data is moved from the relational database to the cube
  - Data is pre-aggregated and allows for very fast analysis
- ROLAP – Relational OLAP
  - Modeled on top of the relational star schema database
  - Data storage is kept in the relational database
  - Utilizes SQL to query the DB in an OLAP manner
  - May use proprietary in-memory caching techniques
- HOLAP – Hybrid OLAP
  - Combines the advantages of MOLAP and ROLAP
  - Stores summary data in MOLAP structure
  - Can drill-through to relational database for more detail
Top OLAP Products
- For Dimensional BI Uses
  - IBM Cognos Transformer Cubes (MOLAP)
  - Microsoft SQL Server Analysis Services (SSAS)
    - Dimensional and Tabular (MOLAP/HOLAP)
  - IBM Cognos Dynamic Cubes (ROLAP)
  - MicroStrategy (ROLAP)
- Typically For Finance Use
  - IBM Cognos TM1 (writeback)
  - Hyperion Essbase (writeback)
Cognos PowerPlay (Transformer)
- Advantages
  - Performance (vs. relational)
  - Easy to use and develop
  - ETL-like capabilities (limited) – i.e. no star schema needed
  - Can act as meta-data layer
  - Great relative-time calc capabilities (YTD, Rolling 13 months)
  - Less intensive hardware requirement
- Challenges
  - Significant cube size limitations
  - Limited categories per dimension level
  - Cube builds take time and cubes exist as separate files (.mdc)
  - Lacks capabilities now available in other OLAP tools
  - Row-level (dimensional) security is very challenging to maintain
  - Unclear product support going forward
  - Only works in the IBM Cognos stack
- IBM Cognos Dynamic Cubes
  - IBM Cognos Dynamic Cubes was added to the Cognos 10.2 BI suite as an in-memory relational OLAP product that could address the challenge of high-performance/low latency interactive analysis against terabytes of data
  - The last significant update to Dynamic Cubes occurred in version 10.2.2; IBM has since focused most development efforts on the Cognos Analytics v11 release
  - No current plans for IBM to enhance the Dynamic Cubes product
  - Advantages
    - Scalability – limited only by database and RAM cache sizing
    - Handles large dimensions well, allows dimension attributes
    - Built-in relative time calcs on par with Transformer
    - MDX scripting – can set up just about any type of calculation
    - Dynamic security – can set up dimensional filtering so that all security is derived from SQL tables
    - Aggregate aware – can dynamically select database aggregate tables or in-memory aggregates for fast results
  - Challenges
    - Requires star or snowflake schema as data source
    - Cache needs to be warmed for decent performance
    - Requires 64-bit application server and may require significant memory footprint for large cubes (e.g. 64-128GB)
    - Report authors require dimensional reporting experience
    - CAN ONLY BE USED BY COGNOS BI STACK (Senturus has developed the Analytics Connector to access Tableau)
- Dynamic Cubes in Play
  - Large Health Insurance Provider Deployed Dynamic Cubes
    - Finance project used IBM Cognos Dynamic Cubes to replace legacy Cognos Transformer cubes, went into production Q1 2017
    - Large number of reports were converted or created on top of the Dynamic Cube to provide a guided set of highly formatted reports that allowed drill-down
    - Many complex business calculations were developed and stored in the cube, report writers can leverage a central set of calculations without having to write them in the report
  - Major American Clothing Company
    - Re-architected an older Oracle based data warehouse to a SQL Server
    - User community already very familiar with cube technologies
    - Wanted to use SSAS OLAP cubes for their advanced relative time calcs
    - Ability to create complicated advanced inventory calcs and on the fly currency conversions
    - Ability to set defaults for certain dimensions such as currency type
    - SSAS fits into their corporate strategy for multiple tools
    - SSAS Tabular was chosen for performance and flexibility
- Microsoft SQL Server Analysis Services (SSAS) Tabular Model
  - Introduced in SQL Server 2012
  - Model paradigm = tables and relationships
  - Data stored in-memory
  - Uses a different engine (xVelocity) and uses a columnar DB structure
  - Combines the functionality of MOLAP cubes and relational DBs
  - Advantages
    - Simpler data development model, faster to develop
    - Generally much faster than MOLAP
    - DAX learning curve is easier than MDX
    - Fast COUNT DISTINCT queries
  - Challenges
    - Dependent on server memory footprint (DirectQuery mode available in 2016)
    - Some multidimensional features are not available (e.g. many-to-many)
    - Complex calculations may be difficult to implement
    - Large datasets

New Generation Visualization Tools

Over the last few years desktop visualization tools have sprouted on desktops throughout the enterprise
IBM Cognos Analytics v11 allows similar functionality over a web interface
Rich visualizations are easily created by business users without the help of IT
Decentralized model of data governance
No waiting on developers to create next iteration of an OLAP cube
Allows users to integrate data on the desktop/web
Creation of desktop micromodels (Tableau data extracts)
Can use OLAP data sources, but works best with non-OLAP sources
Can begin to have performance issues when creating large data extracts or going against large data sources
Tableau Data Extracts
- TDE is a compressed snapshot of data stored on disk and loaded into memory as required
- Data engine can be described as its own “in-memory analytic database”
- Stores data in a columnar store structure
- Dramatically reduces the input/output time required to access and aggregate values
- New in upcoming 10.5 – hyper data engine
- Reasons to Use TDEs
  - Better performance vs. connected data sources
  - Reduced load on connected data sources
  - Portability – can be bundled in a packaged workbook for easy sharing
  - Pre-aggregation – option to aggregate data for visible dimensions “aggregated extract”
- Tableau and Cubes
  - Leverages
    - One version of the truth
    - Complex calcs that are already created in the cube
    - Billions of rows response time across pre-aggregated calcs – faster
  - Tableau will work if you stay within the structure of the cube
  - Functional differences
    - No cube extracts (10.4 supports BW cubes)
    - No user-defined hierarchies
    - Aggregations controlled in the cube
  - Supports
    - Oracle Hyperion Essbase
    - Teradata OLAP
    - Microsoft Analysis Services (SSAS)
    - SAP NetWeaver Business Warehouse
    - Microsoft PowerPivot
    - Analytical Views in SAP Hana
- IBM Cognos Analytics v11
  - Cognos Analytics v11 architecture adds data modules which represent a major shift in the central metadata layer (framework) paradigm
  - Data modules allows end users to quickly add new data sources and model new data subjects without having to wait for DWH changes
  - Uploaded files and data sources can be stored as ‘snapshots’ on the server’s file system using the Apache Parquet columnar file storage mechanism
  - Allows for fast query response times

Fast Columnar and In-Memory Databases

Columnar Databases
- Traditional databases store data by each row
- Columnar databases store data in columns rather than in rows
- This storage architecture can result in high-performing queries especially aggregation queries
- Example DBS:
  - Sybase IQ
  - IBM DB2 with BLU Acceleration
    - A capability built into DB2, not a separate install component
    - Focus on analytics
    - Dynamic in-memory, does not require all data to be in-memory
    - Columnar and traditional row-based tables
- SQL Server 2014/16
  - Columnar store indexes
  - In-memory OLTP tables
But If Remove the OLAP Layer
- Raw queries will be fast, but what about the semantic layer?
- You could use relational models with some level of metadata and calculations
- But complex calcs, dimensions, drill downs would be missing

The Current State of OLAP Architecture

Big OLAP on Big Data
- Several new vendors and open source solutions are creating new scalable OLAP on Hadoop products: Kyvos Insights, At Scale, Apache Kylin, Druid
- Slow performing queries on big data implementations are driving new OLAP technologies
- Classic OLAP technologies on big data necessitated movement of Hadoop data into traditional relational data warehouses further increasing latency
- New OLAP technologies are architected to be part of the Hadoop stack and allow queries across Hadoop with no additional movement of data

Summary

Is OLAP Dead?
- The concepts of OLAP – dimensions, measures, hierarchies, drill-down are still alive and well but the technology that surfaces those concepts are changing
- Business users will always want a high performing BI layer that is easy to use and allows for interactive BI
- Some will want a central repository that contains all the relationships, hierarchies and complex business rules already developed
- Other users like data scientists and advanced business analysts will want a more agile free form solution with have high performance

Presenter

Presentation outline

Connect with Senturus