By: Justin Harrigan, Account Executive, Dasher Technologies and Ryan Betts, CTO, VoltDB

One of the reasons our clients value working with Dasher is the fact that we are constantly looking for the next innovation in the IT industry, meeting with companies and understanding their technologies so we can be a better partner to our clients.  As we continue to investigate solutions that address the challenge of Big Data analytics, we have decided to invest our time and energy with a new partner, VoltDB, which we believe brings real, quantifiable business value to your organization.  This joint blog post is a testament to our new partnership.

VoltDB combines an in-memory relational database, a real-time analytics engine and stream processing in one integrated platform. It offers companies a better way to analyze their proliferating data and make smarter business decisions.  Read on to learn how VoltDB is solving Big Data management challenges while maintaining high speed and performance.

How To Solve Big Data Management Without Sacrificing Speed

Solving data management problems at scale requires applying the right tool to the right job. VoltDB specializes in making hard problems easier. When thinking about large-scale data, it is important to realize that the full stack has three major components, each with its own responsibilities and unique capabilities:

  1.   Fast Data (modern OLTP: ingest, analyze, decide and export)
  2.   Big Data Analytics (columnar OLAP: fast, efficient historical analytics and reporting)
  3.   Data Lake (inexpensive storage of vast data, enabling exploration and data science)

The architectures that enable fast data are not suited to big data analytics, and vice versa.  By integrating these systems, architects enable the complete big data pipeline: capture, explore, analyze and act.

Addressing the three problems under the “Big Data” umbrella: the Data Pipe

Fast Data
Generating value from “big data” requires the ability to use the insights and knowledge gained from the analytic side when processing new data. Incoming events represent opportunities to personalize the customer experience, offer up-sell or cross-sell enticements, alert on fraud or security risk, and optimize shared resource utilization. All of these activities need to combine historical models (baselines, seasonality, user segmentation) with real-time context.  This set of activities is what we refer to as “fast data management.”
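To make the idea of combining a historical model with real-time context concrete, here is a minimal sketch in plain Python. It is not VoltDB code and does not use any VoltDB API; the baseline values, thresholds, and the `decide` function are all hypothetical and chosen only for illustration. Each incoming event is checked against a per-user baseline (the historical model) and against a short sliding window of that user's recent activity (the real-time context).

```python
from collections import defaultdict, deque

# Hypothetical historical model: per-user average spend, as it might be
# computed offline by the analytics tier. Values are made up.
HISTORICAL_BASELINE = {"alice": 40.0, "bob": 250.0}

WINDOW_SECONDS = 60        # how much real-time context to keep per user
MAX_EVENTS_IN_WINDOW = 3   # velocity limit inside the window
AMOUNT_FACTOR = 5.0        # alert when amount exceeds baseline by this factor

recent = defaultdict(deque)  # user -> timestamps of recent events


def decide(user, amount, ts):
    """Return 'alert' or 'ok' for a single incoming event."""
    window = recent[user]
    # Drop timestamps that have aged out of the real-time window.
    while window and ts - window[0] > WINDOW_SECONDS:
        window.popleft()
    window.append(ts)

    baseline = HISTORICAL_BASELINE.get(user)
    # Historical check: is this amount far above the user's baseline?
    too_large = baseline is not None and amount > AMOUNT_FACTOR * baseline
    # Real-time check: is the user generating events unusually fast?
    too_fast = len(window) > MAX_EVENTS_IN_WINDOW
    return "alert" if (too_large or too_fast) else "ok"
```

In a real deployment this per-event logic would run inside the fast data tier as a transaction, so that the decision and the state update happen together; the sketch only shows the shape of the decision.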

Building applications on top of fast data requires a system that can:

  • Ingest / interact with the data feed
  • Make decisions on each event in the feed
  • Provide visibility into fast-moving data with real-time analytics
  • Seamlessly integrate into the systems designed to store Big Data
  • Serve analytic results and knowledge from the Big Data systems quickly to users and applications, closing the data loop.

VoltDB delivers these capabilities through a distributed, SQL, transactional (ACID), relational database that can process thousands to millions of incoming events (or application requests) per second.

Three Application Patterns used by VoltDB Clients:

  1. Use VoltDB’s fast transactions to build high-throughput request / response applications that need low-latency database decisions (authorization, recommendation, personalization use cases).
  2. Use VoltDB’s processing speed + OLAP connector features to implement real-time streaming transformation of incoming streams (assemble discrete events into single logical sessions, filter redundant data, interpolate missing results, enrich incoming data with dimension data and metadata in real time).
  3. Use VoltDB’s in-memory SQL engine to implement real-time analytics for operational monitoring and query-scaling against analytic results.
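The streaming transformation in the second pattern can be sketched in a few lines of plain Python. This is an illustration only, not VoltDB code: the `sessionize` function, the 30-second inactivity gap, and the `DIMENSIONS` lookup table are assumptions made for the example. It covers three of the steps named above (filtering redundant events, assembling events into logical sessions, and enriching with dimension data) and leaves interpolation out.

```python
SESSION_GAP = 30  # assumed: seconds of inactivity that close a session

# Hypothetical dimension data used to enrich each session.
DIMENSIONS = {"u1": {"segment": "gold"}, "u2": {"segment": "trial"}}


def sessionize(events):
    """Group (user, ts, action) events, sorted by ts, into sessions.

    Exact duplicate events are dropped, and each session is enriched
    with the user's segment from DIMENSIONS.
    """
    open_sessions = {}  # user -> session currently being built
    seen = set()        # events already processed, for de-duplication
    closed = []

    for user, ts, action in events:
        key = (user, ts, action)
        if key in seen:
            continue  # filter redundant data
        seen.add(key)

        session = open_sessions.get(user)
        if session is not None and ts - session["end"] > SESSION_GAP:
            closed.append(session)  # gap exceeded: close the old session
            session = None
        if session is None:
            session = {
                "user": user,
                "start": ts,
                "end": ts,
                "actions": [],
                "segment": DIMENSIONS.get(user, {}).get("segment", "unknown"),
            }
            open_sessions[user] = session

        session["end"] = ts
        session["actions"].append(action)

    # Flush sessions that are still open when the input ends.
    closed.extend(open_sessions.values())
    return closed
```

For example, calling `sessionize` on two `u1` events ten seconds apart followed by a `u1` event ninety seconds later produces two sessions for `u1`, because the second gap exceeds `SESSION_GAP`.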

Smart (Analytic) Data
As data ages from its zero-day generation, it quickly loses value as a single data point, but it gains value as a relational data point when coupled with a larger dataset for analytics workloads.  This is why Dasher advocates the right tool for the right point in time.  VoltDB accomplishes the data ingest task and is designed to bulk offload data to an analytics platform after the initial single data point value has been realized.  Relational analytics at scale is the next step in the process.  This is where value is derived from massive datasets in real time and used to return a decision or value to the end user.
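The hand-off described above, where data is ingested into the fast tier and then bulk offloaded once its single-data-point value has been realized, can be sketched as follows. This is a plain Python illustration, not the VoltDB export mechanism; the `FastStore` class and its `hot_seconds` parameter are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FastStore:
    """Toy stand-in for the fast data tier (hypothetical, for illustration)."""

    hot_seconds: float                        # how long a row stays "hot"
    rows: list = field(default_factory=list)  # (ts, payload) tuples

    def ingest(self, ts, payload):
        self.rows.append((ts, payload))

    def offload(self, now):
        """Remove and return, as one batch, rows older than hot_seconds."""
        cutoff = now - self.hot_seconds
        batch = [row for row in self.rows if row[0] < cutoff]
        self.rows = [row for row in self.rows if row[0] >= cutoff]
        return batch
```

A caller would periodically run something like `analytics_rows.extend(store.offload(now))`, where `analytics_rows` stands in for a bulk load into the analytics platform. The design point is that rows move in batches based on age, so the fast tier stays small while the analytics tier accumulates the larger dataset.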

Building a relational analytics database requires the following:

  • Analytics at a scale of hundreds of terabytes
  • SQL workload for relational insight
  • Resilience to node failure (K+X) for uptime
  • Efficient online capacity scalability
  • ACID compliance
  • Data analytics on specific datasets, not full data store
  • Real-time insights for decision making

One example of a platform that fits this portion of the data pipe is HP Vertica.  To learn more about the Vertica platform and its application, you can read Dasher’s previous blog post.

Data Lake
The Data Lake is often overlooked as an important part of the data pipe.  Most look at this part of the solution as a scalable storage platform, which it is, but its ability to support exploratory analytics is not valued as highly as it should be.  A data lake should be able to analyze all data in the system for new insights and relations.  A well-designed data pipe will integrate the fast data solution, analytics solution, and data lake as a seamless analytics platform where data derived from any of the three solutions can be fed back to the user.

When integrating into a SQL analytics platform, a data lake should have the following:

  • Scalability to handle tens of petabytes
  • SQL integration to run exploratory analytics
  • Low cost per GB so data retention is not fiscally constrained
  • Single domain name for large datasets to reduce application complexity

What we have learned in the past few years is that there is no one-solution-fits-all approach for scalable analytics.  While one solution may handle all workloads at small scale, the data pipe needs task-specific tools to handle targeted workloads.  Dasher and VoltDB believe this is the best way to build a data pipe that scales proportionally without bottlenecks.

If you are interested in learning about VoltDB products and solutions and how Dasher can help, contact us.

More Resources from VoltDB:
Why VoltDB
Video: Why VoltDB is 100 times faster
More videos