
Demystifying In-Memory Technologies

May 13, 2015

Analytics Demystified, Dashboards Reporting and Visualizations, Data Architecture, Data Preparation

Best Uses and Competitive Advantages for the Enterprise

With technology advances and an increasingly attractive price-performance ratio, in-memory technologies have proliferated from their early days to take center stage on buzzword bingo cards. You may be wondering: what is in-memory, and what can it be used for, particularly in the enterprise?

In this webinar, we address these questions and more, as we discuss the origins and history of in-memory technologies with a review of high-level architectures. We also dive into several primary applications and use cases, including transactional, operational, and analytical, and examine the different ways this technology may benefit the enterprise by providing a competitive advantage.

PRESENTER

Michael Weinhauer
Practice Area Director and Solutions Architect
Senturus, Inc.

PRESENTATION OUTLINE

Current Challenges

  • Setting the Stage
    • Can the speed at which your current reports run best be described as glacial?
    • Can the number and types of data sources in your organization best be described as exploding?
  • Current Challenges: Data Explosion
    • Volume
      • In 2005, mankind created 150 exabytes of information; by 2011, that figure had grown to an estimated 1,200 exabytes (The Economist)
    • Velocity
      • Worldwide digital content will double in 18 months, and every 18 months thereafter (IDC)
    • Variety
      • 80% of enterprise data will be unstructured, spanning traditional and non-traditional sources (Gartner)
  • Setting the Stage
    • Is the speed of your business processes best communicated using geologic time?
    • Do changes and additions to your EDW require superhuman effort?
    • Are the demands for real-time information in your enterprise skyrocketing?
  • Time Value of Information
    • The value of information varies greatly, but the window of time to act on it is shrinking, and in some cases the value of timely action is potentially unlimited
    • Latency occurs between data collection, analysis, and decision points
    • There are three dimensions of latency on the continuum between when an event occurs and when an informed action is taken: data latency, analysis latency, and decision/action latency
      • Data latency is the time it takes to collect raw data, prepare it for analysis, and store it where it can be accessed and analyzed. Important functionality here includes data profiling, extraction, validation, cleansing, integration, transformation, delivery, and loading. There are a wide variety of tools and products that address one or more of these aspects of data latency. These tools fall into several categories, including extract, transform, and load (ETL); data replication; enterprise application integration (EAI); enterprise information integration (EII); master data management (MDM); and others. The destination platform of the data is typically a data warehouse or data mart.
      • Analysis latency is the time it takes to access the data, analyze the data, turn the data into information, apply business/exception rules, and generate alerts if appropriate. Analysis may be done by a user or an application.
      • Decision latency is the time it takes to receive an alert, review the analysis, decide what action is required, if any, based on knowledge of the business, and take action.
  • Setting the Stage
    • Have you dreamt of brilliant new ideas that would catapult your company light years ahead of the competition, but can’t suggest them because you’ll sound crazy?
  • The Problem with Hard Drives: Latency = SLOW (a back-of-the-envelope sketch follows this list)
    • 10ms for spinning physical disk I/O
    • 0.2ms for SSD (50x faster)
    • 0.0001ms for RAM (2,000x faster than SSD, 100,000x faster than physical disk)
  • Server Technology State in 1995
    • Existing systems were created using the optimal architecture at the time
    • RAM: small and expensive
    • CPU: smaller, single-core, non-hyper-threaded, no cache
    • HDD: smaller, slower
    • Slower LAN/WAN
    • Addressable memory (16-bit/32-bit)
    • In 1995, 16-bit applications were still the norm
  • Currently Implemented Technology State
    • RAM: larger, still relatively small, 64-128GB
    • CPU: quad-core, hyper-threaded
    • HDD: faster, some SSD
    • Clustering: 4-8 node
    • LAN/WAN: gigabit Ethernet, Optical OC-48
    • 32-bit Applications, some 64-bit
  • Current Available Technology State
    • RAM: 12TB, NVRAM
    • CPU: 16 sockets, multi-core, MPP
    • HDD: high capacity SSD
    • Clustering: virtually unlimited nodes
    • LAN/WAN: 100 gigabit Ethernet, Optical OC-96
    • 64-bit Applications
  • Trends: Logarithmic Drops in Cost and Increase in Capacity
  • High-End Server Configurations
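
To make the latency numbers above concrete, here is a back-of-the-envelope sketch (not part of the original presentation) that applies the quoted access latencies to a workload of one million random reads; the workload size is an arbitrary assumption chosen purely for illustration.

```python
# Back-of-the-envelope arithmetic using the latencies quoted above:
# spinning disk ~10 ms, SSD ~0.2 ms, RAM ~0.0001 ms per random access.
# The one-million-access workload is an arbitrary illustrative assumption.
LATENCY_MS = {"spinning disk": 10.0, "SSD": 0.2, "RAM": 0.0001}
ACCESSES = 1_000_000

for medium, latency_ms in LATENCY_MS.items():
    total_seconds = ACCESSES * latency_ms / 1000.0
    speedup = LATENCY_MS["spinning disk"] / latency_ms
    print(f"{medium}: {total_seconds:,.1f} s ({speedup:,.0f}x vs. disk)")

# Prints roughly: spinning disk 10,000.0 s (1x), SSD 200.0 s (50x), RAM 0.1 s (100,000x)
```

The point is not the absolute times but the gap: moving the same workload from disk to RAM changes the unit of work from hours to fractions of a second.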

What is In-Memory?

  • In-Memory Defined
  • Traditional Architecture
  • So Why Don’t I Just Get a Box with a Bunch of RAM?
  • In-Memory Architecture
  • Types of In-Memory
    • TM1
      • Cube engine resides in memory, early 64-bit application
      • Duplicates data from source
      • Highly indexed bitmapped array, i.e. not columnar
    • Dynamic Cubes
      • In-memory ROLAP engine, sits on top of RDBMS (DB2)
      • Replicates data from source
    • Qlik, Spotfire, Tableau
      • In-memory, columnar stores; replicate data from the source (see the columnar scan sketch after this list)
    • DB2 BLU, Oracle 12c, MSFT
      • Accelerator
      • Duplicates data into separate store to improve performance
      • RDBMS instance lives in-memory
      • Extend existing RDBMS skills and technology
    • Exalytics, PureData, Teradata
      • Appliance optimized for analytics
      • Combines in-memory with SSD
      • Built-in intelligence for optimizing in-memory
    • SAP HANA
      • Appliance-like
      • Completely in-memory (OS, Engines, Data)
      • SSD for log files
      • Engines exist natively on the same instance
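
Several of the products above are described as columnar in-memory stores. As a rough, product-agnostic illustration of why a column layout speeds up analytical scans, here is a small Python sketch (not from the webinar); the data volume and column layout are arbitrary assumptions.

```python
import random
import time

# Product-agnostic toy comparison of row-oriented vs. columnar layout for a
# single-column analytical scan; the data volume and columns are arbitrary.
N = 1_000_000
rows = [(i, random.random(), random.choice("NSEW")) for i in range(N)]   # row store: one tuple per record
amounts = [row[1] for row in rows]                                       # column store: the 'amount' column alone

t0 = time.perf_counter()
row_total = sum(row[1] for row in rows)   # row layout: touches every record to read one field
t1 = time.perf_counter()
col_total = sum(amounts)                  # columnar layout: scans only the needed column
t2 = time.perf_counter()

print(f"row scan:    {t1 - t0:.3f} s, total = {row_total:.1f}")
print(f"column scan: {t2 - t1:.3f} s, total = {col_total:.1f}")   # typically several times faster
```

Real columnar engines layer compression, vectorized execution, and SIMD on top of this basic layout, which is where the larger speedups discussed later in the outline come from.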

Use Cases: In-Memory Technology

  • Transactional/Operational
    • Simplify and optimize business processes
      • Batch-oriented or process-intensive areas like MRP
    • New business processes
      • Real-time re-pricing
      • Real-time offers
    • Embed operational reporting into transaction processing
      • Order fulfillment
    • Eliminate/minimize need for ETL
  • Landscape Simplification
    • Eliminate redundant data and hardware
    • Less power, cooling, floor space, manpower, maintenance
    • Easier H/W upgrades
    • Less/simpler ETL
    • Bring the engine(s) to the data
  • Analytical
    • Agile Analytics
    • Real-time
    • Sentiment Analysis
    • Ads
    • Re-pricing
    • Complex Event Processing of Streaming Sensor Data
    • Predictive/Prescriptive
    • Fraud Detection
    • Machine Learning
    • Geospatial
    • Text Analysis
    • Image/Video
  • Simplifying the Stack = Speed and Agility
    • A key advantage of running in-memory with hardware acceleration is that you no longer need data marts, aggregates, and indices.
    • You can create logical representations of these in memory using views, because memory access is orders of magnitude faster than disk (a toy sketch of this views-instead-of-aggregates idea follows this list).
    • Also, because you are using columnar technology, you see 2.5x-5x or more compression, which shrinks the size of your DW.
    • This is important because it significantly reduces the total amount of data that must be managed.
    • In some cases, where the engine is embedded in the server, as in the appliance case, you can eliminate another layer.
    • Finally, latency is reduced by enabling real-time replication of information into a super-fast analytical appliance, so you no longer need an operational data store.
  • Operations and Analytics Together
    • Use a single environment for both analytics and applications
  • Proof Point: Step-by-Step Process Today: Demand Forecasting (APO-DP Variant)
  • How Processes Change with In-Memory: Demand Forecasting (APO-DP Variant)
  • Success Stories
    • MKI: sequences DNA from biopsies, delivers targeted treatment regimen in 20 minutes, down from 2-3 days
    • Citi – Foreign Exchange: 100ms delay costs $1m
    • Google: a half-second delay in search causes a 20% drop in traffic = lost revenue
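
The "Simplifying the Stack" item above argues that once data sits in memory, materialized aggregates can be replaced with logical views. As a toy sketch of that idea, here is an example using Python's built-in sqlite3 module in its :memory: mode (not one of the products discussed); the sales schema and names are made up.

```python
import sqlite3

# Toy sketch of "logical views instead of materialized aggregates" using
# SQLite's in-memory mode; the sales schema and names are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("East", 250.0), ("West", 75.0)],
)

# Instead of an ETL job maintaining a summary table, define a view; the
# aggregation runs at query time against data that already sits in memory.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

for region, total in conn.execute("SELECT * FROM sales_by_region ORDER BY region"):
    print(region, total)   # East 350.0, then West 75.0
```

The design point is that the summary exists only as a definition; there is no separate aggregate table to load, refresh, or keep consistent.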

Implications, Considerations, Predictions, Musings

  • Time Value of Information
    • We mentioned earlier that the value of information declines rapidly over time. This is true until you need it again, whether for an audit, historical analysis, or another reason. Because of the cost of in-memory, you will likely want to segment your data, keeping only the most-used data in memory and placing the rest in more cost-effective stores.
  • Data Temperature
    • Transparent query processing
    • Cross-store optimizer
    • Hot data: needed for immediate use, direct from source/stream
      • Pareto principle: roughly 20% of data falls into this category
      • High-potential candidate for in-memory or flash/SSD
    • Warm data: used for daily, hourly, etc. reporting
      • Roughly 20% of data falls into this category
      • Good candidate for a columnar store
    • Cold data: infrequently used; archive storage
      • Good candidate for Hadoop, a traditional RDBMS, tape, or other offline storage
    • A minimal routing sketch along these lines follows this list
  • Data Temperature Analysis: Example
    • Captured by analysis of database statistics
    • Can also be captured via audit reports from BI tools
    • Business impact analysis
  • Not All Technologies are Created Equal
    • Who is how columnar? (i.e., which products are columnar, and to what degree?)
    • Comparison of Vendor/Product and Columnar Maturity
      • Teradata Database: 2
      • Oracle Exadata: 1
      • SAP HANA: 3
      • Pivotal Greenplum/HAWQ: 2
      • IBM DB2 BLU: 3
      • Microsoft SQL Server xVelocity: 2
      • HP Vertica: 3
      • Actian ParAccel: 3
      • IBM Netezza: n/a
      • SAP Sybase IQ: 3
      • Infobright: 1
      • Vectorwise: 1+
      • Columnar Maturity Key
        • Level 1 Columnar: Uses PAX to achieve columnar compression. No columnar projection provided. No columnar engine provided. Approximate 4X performance advantage over row store for read queries (10X column compression versus 2.5X row compression).
        • Level 2 Columnar: Uses columnar compression and projection. No columnar engine provided. Approximate 10X advantage over Level 1 read queries (10% of the columns are selected).
        • Level 3 Columnar: Uses columnar compression and projection, and includes a columnar engine that optimizes processing. Approximate 50X advantage over Level 2 read queries (vector processing – 20X, SIMD – 8X, fewer CPU stalls – 2X, cache utilization – 10X, in-memory compression + projection – 20X, in differing combinations for each query)
    • Who is how parallel? (i.e., which products parallelize queries, and to what degree?)
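
Tying back to the Data Temperature tiers above, here is a minimal routing sketch (an illustration, not a Senturus recommendation); the tier names come from the list above, while the 30-day and 365-day thresholds are arbitrary assumptions.

```python
from datetime import date, timedelta

# Minimal sketch of a data-temperature routing policy based on the tiers
# described above; the 30-day and 365-day thresholds are arbitrary assumptions.
def storage_tier(last_accessed: date, today: date) -> str:
    age = today - last_accessed
    if age <= timedelta(days=30):
        return "in-memory / flash-SSD"        # hot: immediate-use data
    if age <= timedelta(days=365):
        return "columnar store"               # warm: daily/hourly reporting
    return "Hadoop / RDBMS / tape archive"    # cold: infrequently used

today = date(2015, 5, 13)
print(storage_tier(date(2015, 5, 1), today))    # in-memory / flash-SSD
print(storage_tier(date(2014, 12, 1), today))   # columnar store
print(storage_tier(date(2013, 6, 30), today))   # Hadoop / RDBMS / tape archive
```

In practice, the tiering decision would be driven by the database statistics and BI-tool audit reports mentioned in the Data Temperature Analysis example, not by a fixed calendar rule.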

  • Cost Considerations & Comparisons
    • SAP HANA
    • Microsoft SQL
    • Oracle TimesTen
    • IBM DB2 BLU
    • Teradata
    • Tableau
    • Qlik
    • Spotfire
    • TM1
    • Dynamic Cubes
    • Hadoop
  • Implications, Predictions, Considerations, Musings
    • For the right business problem, in-memory can truly be a game changer
      • New competitive differentiation through the implementation of new/better business processes
      • Real-time applications combining analytics and transactional at the point of impact
    • In-memory technology is now mainstream and proven, and will move increasingly to de facto status
    • As with any technology, the business value and use case should drive the adoption of a specific technology
    • While expensive, given competition and simultaneous growth in capacity and precipitous price drops, in-memory technology will become increasingly attractive for a growing number of use cases
    • In-memory is so fast that you don't need a POC for incremental speed gains; the improvement is orders of magnitude
    • Mobile/social data can be processed in-memory at the speed required and then persisted as needed
  • Senturus Recommendations
    • While powerful, in-memory will add the most benefit to properly architected applications
    • Otherwise, it can be an expensive enabler of bad habits
    • Use the right tool for the right job
    • Don’t just speed up bad decisions or accelerate faulty processes
