Freed from the constraints of storage, network and memory, many big data analytics systems are now routinely revealing themselves to be compute bound. To compensate, these systems often sprawl horizontally to bring in enough compute for the task at hand (300-node Spark or NoSQL clusters are not unusual!). High system complexity and crushing operational costs often result. As the world shifts from physical to virtual assets and methods of engagement, there is an increasing need for systems of intelligence to live alongside the more traditional systems of record and systems of analysis. New approaches to data processing are required to support the real-time processing that drives these systems of intelligence.
Join 451 Research and Kinetica to learn:
• The business and technical trends driving widespread interest in real-time analytics
• Why systems of analysis need to be transformed and augmented with systems of intelligence that bring new approaches to data processing
• How a new class of solution (a GPU-accelerated, scale-out, in-memory database) can bring you orders of magnitude more compute power, a significantly smaller hardware footprint, and unrivaled analytic capabilities
• How other companies in a variety of industries, such as financial services, entertainment, pharmaceutical, and oil and gas, benefit from augmenting their legacy systems with a modern analytics database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
1. Powering real-time big data analytics
with a next-gen GPU database
November 1, 2017
Matt Aslett
Research Director, Data Platforms
and Analytics Channel
451 Research
Dipti Borkar
Vice President, Product Marketing
Kinetica
2. Housekeeping Items
Questions?
To ask a question, click on the question button
Presentation Slides
A copy of the presentation will be provided to all attendees
Feedback
Don’t forget to leave feedback at the end of the webinar
3. Today’s speakers
Matt Aslett
Research Director, Data Platforms and Analytics Channel, 451 Research
Matt has overall responsibility for the data platforms and analytics research coverage, which includes operational and
analytic databases, Hadoop, grid/cache, stream processing, search-based data platforms, data integration, data quality,
data management, analytics, machine learning and advanced analytics. Matt's own primary area of focus includes data
management, reporting and analytics, and exploring how the various data platforms and analytics technology sectors
are converging in the form of next-generation data platforms.
Dipti Borkar
Vice President, Product Marketing, Kinetica
Dipti has over 15 years of experience in database technology across relational and non-relational databases. Prior to
Kinetica, Dipti was Vice President of Product Marketing at Couchbase and held several leadership positions there
including Head of Global Technical Sales and Head of Product Management.
Earlier in her career Dipti was a part of the product team at MarkLogic and managed development teams at IBM DB2
where she started her career as a database software engineer. Dipti holds a Master's degree in Computer Science from
the University of California, San Diego with a specialization in databases, and an MBA from the Haas School of
Business at University of California, Berkeley.
4. Powering real-time big data analytics
with a next-gen GPU database
Matt Aslett
Research Director, Data Platforms & Analytics
5. 451 Research is a leading IT research & advisory company
Founded in 2000
300+ employees, including over 120 analysts
2,000+ clients: Technology & Service providers, corporate
advisory, finance, professional services, and IT decision makers
70,000+ IT professionals, business users and consumers in our research
community
Over 52 million data points published each quarter and 4,500+ reports
published each year
3,000+ technology & service providers under coverage
451 Research and its sister company, Uptime Institute, are the two divisions
of The 451 Group
Headquartered in New York City, with offices in London, Boston, San
Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia,
Taiwan, Singapore and Malaysia
Research & Data
Advisory
Events
Go 2 Market
8. Big data and beyond
• V is for various things… but does not define big data
• To understand the trends driving ‘big data’, 451 Research focused beyond the nature of the data, on what enterprises wanted to do with it
• Totality – storing and processing all data (or as much as is economically viable)
• Exploration – schema-free approaches to analyzing data to identify new patterns
• Frequency – more frequent analysis of data to enable real-time decision making
14. Emergence of GPU databases
▪ Potential customers that are doing deep
learning and more advanced analytics on
HPC systems that leverage GPU
processors
▪ Data scientists or other specialists need
to pull data from a system of record and
load it into an HPC system to perform the
analytics leveraging certain algorithms.
15. Emergence of GPU databases
• While HPC systems are well equipped to
handle advanced analytics because they
leverage GPUs, there is also a price to be
paid as it requires moving the data from
one system to the other.
• GPU databases open up the door for
machine learning, deep learning and
other advanced analytical workloads to
be run alongside BI workloads, within the
same environment.
16. CPUs and GPUs
• A CPU is a very good general processor,
handling a variety of complex tasks well.
• A GPU is more specialized and can do
certain tasks extremely well.
• CPUs consist of multiple cores
• GPUs consist of thousands of cores
• CPUs are geared for serial operations
• GPUs are geared for parallel operations
▪ Can be paired together for the greatest overall optimization
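The serial versus parallel distinction can be illustrated with a small sketch. The code below is an assumption-laden illustration in plain Python, not GPU code: it splits one workload into chunks, processes each chunk independently, and combines the partial results, which is the same shape of computation a GPU spreads across its many cores. The function names and the worker count are hypothetical, and Python threads are used only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(values):
    """Serial style: one worker walks every element in order."""
    total = 0
    for v in values:
        total += v * v
    return total


def chunk(values, n_chunks):
    """Split values into at most n_chunks contiguous slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, n_workers=4):
    """Data-parallel style: each worker reduces its own chunk,
    then the partial results are combined in a final step."""
    parts = chunk(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(serial_sum_of_squares, parts))
    return sum(partials)


data = list(range(1000))
serial_result = serial_sum_of_squares(data)
parallel_result = parallel_sum_of_squares(data)
```

Both functions return the same value; only the way the work is divided differs. On a GPU the number of chunks would be in the thousands rather than four.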
22. Powering real-time big data analytics with a
next-gen GPU database
Dipti Borkar | VP, Product Marketing | dborkar@kinetica.com
23. Company
80+, enterprise and startup expertise
Awards Customers and Partners
Investors
$50m Series A June 2017
Ray Lane
Company | Summary
2014
2016
24. Advances in Big Data Processing
DATA WAREHOUSE (1990 - 2000’s)
RDBMS & data warehouse technologies enable organizations to store and analyze growing volumes of data on high-performance machines, but at high cost.
DISTRIBUTED STORAGE (2005…)
Hadoop and MapReduce enable distributed storage and processing across multiple machines. Storing massive volumes of data becomes more affordable, but performance is slow.
AFFORDABLE MEMORY (2010…)
Affordable memory allows for faster data reads and writes. HANA, MemSQL, & Exadata provide faster analytics.
AT SCALE, PROCESSING BECOMES THE BOTTLENECK
GPU ACCELERATED COMPUTE (2017…)
GPU cores bulk process tasks in parallel, which is far more efficient for many data-intensive tasks than CPUs, which process those tasks linearly.
25. GPU | Tale of Numbers
Performance Increase: 100x
>100x gains over traditional RDBMS / NoSQL / in-memory databases
Cores: 4000 vs. 32
Modern GPUs can consist of up to 3000+ cores compared to 32 in a CPU
Infrastructure Cost Savings: 75%
75% reduction in infrastructure costs, licensing, staff, etc.
More with Less
Increase performance, throughput, and capability while minimizing the costs to support the business
GPUs are designed around thousands of small, efficient cores that are well suited to performing repeated similar instructions in parallel, making them ideal for the compute-intensive workloads required of large data sets.
26. Kinetica: Core
ANALYTICS DATABASE ACCELERATED BY GPUs
[Diagram: a Kinetica node on commodity hardware w/ GPUs, with an HTTP head node in front of a GPU-accelerated columnar in-memory database (columns A, B, C; rows 1 to 4) that persists to disk]
Columnar in-memory database
Data available much like a traditional RDBMS… rows,
columns
Data held in-memory; persisted to disk
Interact with Kinetica through its native REST API,
Java, Python, JavaScript, NodeJS, C++, SQL, etc… as
well as with various connectors
Native GIS & IP address object support
VERY FAST: Ideal for OLAP workloads
Typical hardware setup: 256GB - 1TB
memory with 2-4 GPUs per node.
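To make the "columnar" idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Kinetica's storage format: the column names A, B, C and the helper names `to_columns` and `column_sum` are hypothetical. It shows the same four records held row by row and column by column, and why an aggregate over one column only needs to touch that column's values.

```python
# Row-oriented view: each record keeps all of its fields together.
rows = [
    {"A": 1, "B": 10.0, "C": "x"},
    {"A": 2, "B": 20.0, "C": "y"},
    {"A": 3, "B": 30.0, "C": "x"},
    {"A": 4, "B": 40.0, "C": "y"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def column_sum(columns, name):
    """Aggregate a single column; the other columns are never read."""
    return sum(columns[name])


columns = to_columns(rows)
total_b = column_sum(columns, "B")
```

Because each column is a contiguous run of same-typed values, an OLAP-style scan or aggregation maps naturally onto many cores working on slices of that one list.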
27. Kinetica Architecture
[Architecture diagram. Inputs: streaming data and ERP / CRM / transactional data, arriving via ETL / stream processing and parallel ingest. Core: on-demand scale-out nodes (1TB MEM / 2 GPU cards) with in-database processing through UDFs (custom logic, BIDMach, ML libs). Access: SQL, native APIs, geospatial WMS, custom connectors. Consumers: BI dashboards, BI / GIS / apps, custom apps & geospatial, Kinetica ‘Reveal’.]
28. The Kinetica cluster architecture
VISUALIZATION via ODBC/JDBC
APIs: Java API, JavaScript API, REST API, C++ API, Node.js API, Python API
OPEN SOURCE INTEGRATION: Apache NiFi, Apache Kafka, Apache Spark, Apache Storm
GEOSPATIAL CAPABILITIES: Geometric Objects, Tracks, Geospatial Endpoints, WMS, WKT
KINETICA CLUSTER (On-Demand Scale): [Diagram: four identical nodes, each on commodity hardware w/ GPUs, with an HTTP server, a columnar in-memory store (columns A, B, C; rows 1 to 4), and disk]
OTHER INTEGRATION: Message Queues, ETL Tools, Streaming Tools
29. Parallel Ingest Provides High Performance Streaming
[Diagram: parallel ingest feeding three nodes, each 1 NODE (1TB/2GPU)]
Each node of the system can share the task of data
ingest, which provides more and faster throughput. Ingest
can be made faster simply by adding more nodes.
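One common way to let every node share ingest is to route each record by a hash of its key, so no single node has to receive all the data. The sketch below assumes that scheme purely for illustration; the slides do not say how Kinetica distributes records, and the names `node_for`, `partition`, and the `id` key field are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(key, n_nodes):
    """Map a record key to a node index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def partition(records, n_nodes, key_field="id"):
    """Group records into one batch per target node."""
    batches = defaultdict(list)
    for record in records:
        batches[node_for(record[key_field], n_nodes)].append(record)
    return dict(batches)


records = [{"id": f"sensor-{i}", "value": i} for i in range(1000)]
batches_3 = partition(records, n_nodes=3)
batches_6 = partition(records, n_nodes=6)
```

Each batch can then be sent to its node in parallel. Raising `n_nodes` spreads the same stream over more receivers, which is the intuition behind adding nodes to raise ingest throughput.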
30. 50-100x Faster on Queries with Large Datasets
• Large retailer tested complex SQL queries
on 3 years of retail data (150bn rows)
• 10-node Kinetica cluster against 30TB+
cluster from next best alternative
• The GPU is able to perform many instructions in
parallel, giving huge performance gains on
aggregations, group bys, joins, etc.
• Kinetica sustained ingest of 1.3bn
objects/minute with 70 attributes per row
WHEN COMPARED TO LEADING IN-MEMORY ALTERNATIVES
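For a sense of scale, the quoted sustained ingest rate can be converted to per-second figures. This is simple arithmetic on the two numbers given on the slide (1.3bn objects per minute, 70 attributes per row); the variable names are illustrative only.

```python
objects_per_minute = 1_300_000_000
attributes_per_object = 70

# Convert the per-minute rate to per-second figures.
objects_per_second = objects_per_minute / 60
values_per_second = objects_per_second * attributes_per_object

print(f"{objects_per_second / 1e6:.1f} million objects per second")
print(f"{values_per_second / 1e9:.2f} billion attribute values per second")
```

That works out to roughly 21.7 million objects, or about 1.52 billion individual attribute values, every second.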
32. Kinetica | Combined Strengths and Capabilities
Supercharge BI
Taking advantage of the parallel nature of the GPU, Kinetica delivers low-latency, high-performance analytics on large and streaming data sets.
Simultaneously ingest, explore, analyze, and visualize data within milliseconds to make critical decisions.
User-defined functions (UDFs) allow for distributed custom compute directly from within the database.
Easier to work with large geospatial data sets.
Fast, Distributed Database Engine
In-Database Analytics
Native Geospatial & Visualization Pipeline
35. FASTER BI WITH A GPU DATABASE
Tableau + Kinetica
Kinetica combines the GPU’s brute-force compute with the
simplicity of a relational database for millisecond query
response on massive data sets without extensive
tuning.
• Incredibly fast query performance.
• Distributed design - ideal for large and streaming datasets.
• SQL-92 compliant relational database – without limits.
• More power means less need for tuning, indexing, and
administration of the database.
• No need to do pre-aggregation or build out cubes.
• Reduce reliance on specialized skills to prep and set up
data.
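The point about skipping pre-aggregation and cubes can be shown with an ad hoc query that aggregates raw rows at request time. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for any SQL-92 database; it is CPU-based and says nothing about Kinetica's performance, and the `sales` table and its columns are hypothetical.

```python
import sqlite3

# In-memory stand-in database holding raw, un-aggregated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "a", 10.0),
        ("east", "b", 5.0),
        ("west", "a", 7.5),
        ("west", "b", 2.5),
    ],
)

# Ad hoc aggregation straight off the raw table: no summary table,
# no cube, just standard GROUP BY evaluated at query time.
query = """
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region
"""
totals = conn.execute(query).fetchall()
conn.close()
```

A BI tool issuing this kind of query can change the grouping column freely, since nothing was precomputed for one particular slice.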
36. Rethink interaction between business analyst & data scientist
SPECIALIZED AI/ DATA
SCIENCE TOOLS
SUBSET
DATA SCIENTISTS
BUSINESS USERS
EXTRACT
EXTRACTING DATA FOR AI IS
EXPENSIVE AND SLOW
ENTERPRISES
STRUGGLE TO MAKE
AI MODELS AVAILABLE
TO BUSINESS
???
• MapReduce
• Spark
• NoSQL DBs
• SQL Databases
• DFS
• CPU Compute Nodes
• GPU Compute Nodes
Proliferation of Hardware &
Software Components
37. Kinetica | The Ideal Process – Consolidate the BI / AI stack
Monte Carlo Risk
Custom Function 2
Custom Function 3
API EXPOSES CUSTOM
FUNCTIONS WHICH CAN BE
MADE AVAILABLE TO BUSINESS
USERS
BUSINESS USERS
DATA SCIENTISTS
UDFs
• Analytics
• AI/ML/Deep Learning
• Power of in-memory SQL
• Integrated CPU/GPU
• Bomb with Streams
Single Database Platform for
AI + BI
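The idea of data scientists publishing custom functions that business users then call by name can be sketched as follows. This is a hypothetical registry written in plain Python, not Kinetica's UDF API: the `udf` decorator, `call_udf`, and the choice of a value-at-risk calculation for the "Monte Carlo Risk" function are all assumptions made for illustration.

```python
import random

UDF_REGISTRY = {}


def udf(name):
    """Decorator that publishes a function under a stable name."""
    def register(fn):
        UDF_REGISTRY[name] = fn
        return fn
    return register


@udf("monte_carlo_risk")
def monte_carlo_var(mean_return, volatility, n_trials=10_000,
                    confidence=0.95, seed=0):
    """Estimate one-period value at risk by simulating normally
    distributed returns and reading off the loss at the tail cutoff."""
    rng = random.Random(seed)
    outcomes = sorted(rng.gauss(mean_return, volatility)
                      for _ in range(n_trials))
    cutoff = outcomes[int((1 - confidence) * n_trials)]
    return -cutoff  # report the loss as a positive number


def call_udf(name, **kwargs):
    """What a business-facing tool would do: look up and invoke by name."""
    return UDF_REGISTRY[name](**kwargs)


var_95 = call_udf("monte_carlo_risk", mean_return=0.001, volatility=0.02)
```

The caller needs only the function's name and parameters, not its implementation, which is the separation between data scientists and business users that the slide describes.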
38. AI & BI on One GPU-Accelerated Database
HIGH PERFORMANCE ANALYTICS
DATABASE
UDF UDF UDF
ODBC
/ JDBC Native
REST API WMS
BUSINESS INTELLIGENCE
CUSTOM APPLICATIONS
HIGH FIDELITY
GEOSPATIAL PIPELINE
MACHINE LEARNING
& DEEP LEARNING GPU-ACCELERATED
DATA SCIENCE
PREDICTIVE MODELS
e.g. Risk Management,
Sales Volume, Fraud.
BIDMach
SQL
DATA SCIENTISTS
/ DEVELOPERS
BUSINESS
USERS
39. Distributed Geospatial Pipeline
NATIVE VISUALIZATION IS DESIGNED FOR FAST MOVING, LOCATION-BASED DATA
Native Geospatial Object Types
• Points, Shapes, Tracks, Labels
Native Geospatial Functions
• Filters (by area, by series, by geometry, etc.)
• Aggregation (histograms)
• Geofencing - triggers
• Video generation (based on dates/times)
Generate Map Overlay Imagery (via WMS)
• Rasterize points
• Style based on attributes (class-break)
• Heat maps
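A filter by area and a geofence trigger can both be sketched with a distance test against a circular fence. The code below is a minimal illustration in plain Python, assuming a circular geofence and the haversine great-circle formula; it is not Kinetica's geospatial API, and the names `haversine_km`, `inside_geofence`, and `geofence_events` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inside_geofence(point, center, radius_km):
    """Filter by area: is the point within radius_km of the center?"""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


def geofence_events(track, center, radius_km):
    """Trigger: emit 'enter' or 'exit' whenever a track crosses the fence."""
    events = []
    was_inside = None
    for point in track:
        now_inside = inside_geofence(point, center, radius_km)
        if was_inside is not None and now_inside != was_inside:
            events.append(("enter" if now_inside else "exit", point))
        was_inside = now_inside
    return events


center = (40.0, -74.0)
track = [(40.0, -74.2), (40.0, -74.01), (40.0, -73.8)]
events = geofence_events(track, center, radius_km=5.0)
```

Here the track starts outside the 5 km fence, passes near the center, and leaves again, so one enter event and one exit event are produced.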
41. ENTERTAINMENT | Customer 360
CASE STUDY : BI ACCELERATION
BUSINESS OBJECTIVE
• Accelerate Tableau dashboards for faster customer 360 analytics
NEW CAPABILITIES DELIVERED
• 24X faster dashboard loads
• 3.5X faster slice and dice, drilldowns, filters
SOLUTION OVERVIEW
• Tableau Server and Kinetica running on Google Cloud Platform
• Kinetica accelerates EDW workload
• Simply point to Kinetica using Tableau’s replace data source feature
42. AD TECH | Real-time reporting & ad delivery
CASE STUDY : REAL-TIME DATA AND ANALYTICS
BUSINESS OBJECTIVE
• Be first to market with game-changing technologies that put publishers’
needs first
• Support PubMatic’s real-time campaign reporting
NEW CAPABILITIES DELIVERED
• High-speed ingest, store, and persist data processing capabilities
• Ad-hoc analytics on ad impression and bid data
SOLUTION OVERVIEW
• Kinetica considered as a functional replacement for a 40-node Apache
Apex cluster -> smaller HW footprint
• High-speed data ingestion via native Kafka integration
• Python access to Kinetica data store for simplified data science discovery
• Contributed fast data capabilities to long term retention and archive
Hadoop Data Lake
“At PubMatic, we are consistently focused on being early to
market with leading technologies that put publishers’ needs
first. Processing over one trillion ad impressions
monthly, PubMatic provides omni-channel revenue automation
technology for publishers and programmatic tools for media
buyers. Leveraging leading edge data and technology
innovation, Kinetica contributes high-speed ingest, store,
and persist data processing capabilities in support
of PubMatic’s real-time reporting and ad pacing engine.”
- Vasu Cherlopalle, Vice President of Big Data and Analytics
43. LIFE SCIENCES : GENOMICS RESEARCH
CASE STUDY : ADVANCED IN-DATABASE ANALYTICS
“One of the things I like about Kinetica is it gives us more of a general-purpose use of the technology. There has been a lot of software created to answer certain questions [but] highly specialized tools have limited functionality and are tuned to do a certain workload.”
- Mark Ramsey, Chief Data Officer at GSK
BUSINESS OBJECTIVE
• Faster processing of transcriptomics to run simulations of
chemical reactions for drug discovery, research, and
development
NEW CAPABILITIES DELIVERED
• In-database processing to develop models, leveraging GPU
acceleration for performance, and direct access to CUDA APIs
via UDFs deployed within Kinetica
• Seek out signals from massive collection of drug targets
combined from external data, historical data from
experiments, and clinical trials
SOLUTION OVERVIEW
• Kinetica running on-premises on a cluster of 7 HPE DL 380
servers
• Familiar relational database with GPU acceleration
44. PIPELINE & WELL ANALYTICS
CASE STUDY : LOCATION BASED ANALYTICS
BUSINESS OBJECTIVE
• Augment SaaS offering to provide research data and
analytics on oil and gas to energy investors and operators
with geospatial query, visualization, and analytics
NEW CAPABILITIES DELIVERED
• Geospatial visualization and analytics of massive number of
wells, pipelines by land ownership, region etc.
• Custom visualizations and charts for data-driven insights
• Embedded solution with seamless Node.js integration, GPU
acceleration
SOLUTION OVERVIEW
• Kinetica running in RSEG’s Amazon Web Services VPC
deployment
45. LOGISTICS | Workforce optimization
BUSINESS OBJECTIVE
• Deliver better business services, optimize operations, and save
costs across 600,000 employees, 215,000 delivery vehicles, and
500 million pieces of mail delivered daily
NEW CAPABILITIES DELIVERED
• Real-time delivery and pickup notifications, shipment routing,
just-in-time supplies
• Real-time route optimization - route planning, rerouting
• Geospatial analytics to uncover overlapping coverage areas,
uncovered areas, and distribution bottlenecks
SOLUTION OVERVIEW
• USPS runs Kinetica as a 70 TB in-memory database on a 200-node
HPE DL 380 system. Each node consists of a single x86 blade
server with 1TB RAM and 2 NVIDIA K80 GPUs
• Kinetica collects, processes, and analyzes 200,000 messages
per minute for real-time streaming analytics, with 15,000 daily
sessions and five-nines uptime
46. Summary | Kinetica GPU Accelerated Analytics
PERFORMANCE
• Distributed, Columnar, In-Memory, Relational
• GPU Accelerated
• Ingest, Query, Compute
SCALABLE
• Commodity Hardware
• On-premises or Cloud
• Scales to 100s of TB
• Less Infrastructure, More Compute
• Predictable, Linear
CONVERGED AI AND BI
• Machine Learning
• Artificial Intelligence
• In-Database
• Self-Service
INDUSTRY-STANDARD CONNECTIVITY
• Open Source: Kafka, Storm, NiFi, Spark
• ODBC, JDBC
• ANSI SQL/92
• APIs for Java, JS, C++, Python, Node.js, REST