Bay Area Hadoop Users Group
Turning the Tables with InfiniDB for Hadoop
December 18, 2013
Agenda
• InfiniDB Background
• InfiniDB Technical Foundations
o Parallelism
o Partitioning Model
o Additional I/O Efficiencies
• (My)SQL for Hadoop
• When to use Columnar/InfiniDB for Hadoop
• InfiniDB Benchmarks

Copyright © 2013 Calpont. All Rights Reserved.
InfiniDB Background
Platforms
• InfiniDB
• InfiniDB for the Cloud
• InfiniDB for Hadoop

Versions
• InfiniDB launched Feb 2010
• InfiniDB 4: latest release, available October 2013
• Added InfiniDB for Hadoop
• Source code at https://github.com/infinidb
• GPL v2
• No restrictions on syntax, scale, or performance

InfiniDB Background - Customer Base

InfiniDB Background
Platforms
• InfiniDB: Local Disk, GlusterFS, Windows*
o http://www.calpont.com/products/tryinfinidb
• InfiniDB for Hadoop: CDH or HDP
o http://www.calpont.com/products/tryinfinidb
• InfiniDB for the Cloud: Any availability zone

InfiniDB Background - InfiniDB for Hadoop
• InfiniDB is a non-map/reduce engine
• Reads and writes natively to HDFS
[Diagram: Pig/Hive, HBase, Map Reduce, and InfiniDB for Hadoop shown on top of the Hadoop Distributed File System]
InfiniDB Background - InfiniDB for Hadoop
Is InfiniDB a Database?
… not a General Purpose DBMS.

Is InfiniDB NoSQL?
… only in the sense that we discarded traditional DBMS architectures.

Is InfiniDB an SQL for Hadoop technology?
… Yes, but not general purpose SQL. InfiniDB is highly optimized for analytic workloads/queries.

“InfiniDB turns SQL developers into Big Data developers. We deployed it quickly and easily for our online sales analytics. Something we couldn’t do with Hadoop, Mongo, or Teradata”
InfiniDB Foundation - Parallelism
• User Module (UM) – Processes SQL Requests
• Performance Module (PM) – Executes the Queries
[Diagram: Single Server or MPP; storage labels: Local disk / EBS, GlusterFS / HDFS]
InfiniDB Foundation - Parallelism
• Purpose-built C++ engine
• Parallelism is at the thread level
• Example: 12 Performance Module (PM) servers with 8 cores each yield 96 parallel processing engines.
• SQL is translated into thousands or tens of thousands of discrete jobs, or “primitives”.
• The User Module (UM) sends primitives to the processing engines.
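The arithmetic behind the 12 x 8 example can be written out as a small sketch. The function names are hypothetical, and the figure of 10,000 primitives is only an illustrative value inside the "thousands or tens of thousands" range quoted above; it assumes one processing engine per core and an even spread of work.

```python
def processing_engines(pm_servers: int, cores_per_pm: int) -> int:
    """Thread-level parallelism: assume one processing engine per core on every PM."""
    return pm_servers * cores_per_pm


def primitives_per_engine(total_primitives: int, engines: int) -> int:
    """Upper bound on primitives per engine if work is spread evenly (ceiling division)."""
    return -(-total_primitives // engines)


engines = processing_engines(pm_servers=12, cores_per_pm=8)
print(engines)                                 # 96
print(primitives_per_engine(10_000, engines))  # 105
```

Under these assumptions a 10,000-primitive query leaves each engine with roughly a hundred small jobs, which is why splitting SQL into many primitives keeps all cores busy.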
InfiniDB Foundation - Parallelism
• User Module – Processes SQL Requests
• Performance Module – Executes the Queries
• Primitives are issued to a thread queue within the PM
• Fixed thread count at the PM
[Diagram: Single Server, MPP; storage labels: Local disk / EBS, GlusterFS / HDFS]
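The queue-plus-fixed-thread-count pattern named on this slide can be sketched in a few lines. This is not InfiniDB's C++ implementation; `run_primitives`, the `(function, argument)` shape of a primitive, and the thread count of 8 are all assumptions made for illustration.

```python
import queue
import threading


def run_primitives(primitives, thread_count=8):
    """Drain a queue of (function, argument) primitives with a fixed number of threads."""
    work = queue.Queue()
    for primitive in primitives:
        work.put(primitive)          # everything is queued before any worker starts

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                fn, arg = work.get_nowait()
            except queue.Empty:
                return               # queue drained: this worker is done
            value = fn(arg)
            with lock:               # protect the shared result list
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# 100 hypothetical primitives, each summing a 1,000-row slice of the numbers 0..99,999.
primitives = [(sum, range(i * 1000, (i + 1) * 1000)) for i in range(100)]
total = sum(run_primitives(primitives))
print(total)  # 4999950000
```

The thread count stays fixed no matter how many primitives arrive, so a large query lengthens the queue rather than oversubscribing the cores.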
Fully Parallel SQL + Full SQL Syntax
[Diagram: Distribution of Work (DoW) phase followed by a Reduce phase]
SQL operations are translated into thousands of jobs via custom Distribution of Work (DoW):
• Parallel/Distributed Data Access
• Parallel/Distributed Joins (Inner, Outer)
• Parallel/Distributed Sub-queries (From, Where, Select)
• Parallel/Distributed Group By, Distinct, and Aggregation
• Extensible with Parallel/Distributed User Defined Functions
Results are returned to the User Module in the Reduce phase.
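A distributed GROUP BY with a final reduce step might look like the sketch below. It illustrates the general scatter/gather idea only; the function names, the three data slices, and the choice of SUM as the aggregate are assumptions, not InfiniDB internals.

```python
from collections import defaultdict


def partial_group_sum(rows):
    """Work done on one Performance Module: aggregate its local slice of (key, value) rows."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def reduce_phase(partials):
    """Work done on the User Module: merge the partial aggregates into the final result."""
    merged = {}
    for part in partials:
        for key, value in part.items():
            merged[key] = merged.get(key, 0) + value
    return merged


# Hypothetical slices of a sales table, one per Performance Module.
slices = [
    [("US", 10), ("DE", 5)],
    [("US", 7), ("FR", 3)],
    [("DE", 1), ("FR", 4)],
]
partials = [partial_group_sum(s) for s in slices]   # distribution of work
print(reduce_phase(partials))                       # {'US': 17, 'DE': 6, 'FR': 7}
```

Because SUM is associative, each module can aggregate independently and only the small partial results travel back for the reduce step.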
InfiniDB Data Partitioning
2-Dimensional Partitioning Model
• Vertical Partitioning by Column
o Not Column-Family (no relation to HBase)
o Only do I/O for columns requested
• Horizontal Partitioning by range of rows
o Meta-data stored within in-memory structure
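The two partitioning dimensions can be sketched as a toy layout: each column is stored separately (vertical) and cut into row ranges (horizontal), with per-block metadata kept in an in-memory dictionary. The function names, the tiny extent size of 2 rows, and the choice to record min/max values in the metadata are assumptions made for this illustration.

```python
def partition(table, rows_per_extent):
    """Split a table (column name -> list of values) into (column, extent) blocks."""
    blocks, metadata = {}, {}
    for column, values in table.items():                      # vertical: per column
        for start in range(0, len(values), rows_per_extent):  # horizontal: row ranges
            extent_no = start // rows_per_extent
            chunk = values[start:start + rows_per_extent]
            blocks[(column, extent_no)] = chunk
            metadata[(column, extent_no)] = {
                "first_row": start,
                "rows": len(chunk),
                "min": min(chunk),   # assumed metadata, kept in memory
                "max": max(chunk),
            }
    return blocks, metadata


def blocks_for(metadata, columns):
    """Blocks a query must consider when it references only `columns`."""
    return sorted(key for key in metadata if key[0] in columns)


table = {
    "id": [1, 2, 3, 4, 5, 6],
    "price": [9, 3, 7, 7, 1, 4],
    "name": ["a", "b", "c", "d", "e", "f"],
}
blocks, metadata = partition(table, rows_per_extent=2)
print(len(blocks))                        # 9 blocks: 3 columns x 3 extents
print(blocks_for(metadata, {"price"}))    # [('price', 0), ('price', 1), ('price', 2)]
```

A query that touches only `price` considers 3 of the 9 blocks here; the other two columns are never read.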

InfiniDB Data Partitioning
• Partition elimination can occur based on:
o Columns not included in the SQL.
o A filter expressed within the query.
o A filter expressed on a joined table: a Table1 filter can drive Table2 I/O elimination.
o Intersection between filters: Filter1 AND Filter2 does I/O only on the intersection.
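The filter-driven cases above can be sketched with per-extent value ranges. This assumes each extent records the minimum and maximum value it holds for a column; the extent numbers, the sample ranges, and the function name are hypothetical.

```python
def candidate_extents(extent_ranges, low, high):
    """Extent numbers whose [min, max] range overlaps the filter range [low, high]."""
    return {n for n, (lo, hi) in extent_ranges.items() if hi >= low and lo <= high}


INF = float("inf")

# Assumed per-extent (min, max) metadata for two columns of the same table.
order_date = {0: (20130101, 20130331), 1: (20130401, 20130630),
              2: (20130701, 20130930), 3: (20131001, 20131231)}
amount = {0: (1, 500), 1: (200, 900), 2: (5, 50), 3: (800, 5000)}

# Filter1: order_date BETWEEN 20130401 AND 20130930
f1 = candidate_extents(order_date, 20130401, 20130930)
# Filter2: amount >= 850
f2 = candidate_extents(amount, 850, INF)
print(sorted(f1), sorted(f2), sorted(f1 & f2))   # [1, 2] [1, 3] [1]

# Join-driven elimination: a filter on Table1 leaves join keys 42..57,
# and that key range is then applied to Table2's join column.
table2_key = {0: (1, 30), 1: (25, 60), 2: (61, 99)}
print(sorted(candidate_extents(table2_key, 42, 57)))   # [1]
```

Only extent 1 survives both filters, so three of the four extents are skipped without any I/O. An extent that overlaps the range may still contain no matching rows; the metadata only rules extents out.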
Column Restriction and Projection
[Diagram: columns (Column # Seventeen, Column # Six, Column # Four) divided into extents (Extent # 5, Extent # 27), with Filter 1, Filter 2, Filter 3, and Projection steps marked]
• Automatic Vertical Partitioning + Horizontal Partitioning
• Just-In-Time Materialization
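Just-in-time (late) materialization can be sketched as follows: the filters run one column at a time and pass along only row positions, and the projected columns are read for the surviving rows at the very end. The function name, column names, and data are hypothetical; this shows the general technique rather than InfiniDB's execution plan.

```python
def late_materialize(columns, filters, projection):
    """Filter column by column on row ids, then fetch projected values for survivors only."""
    row_ids = None
    for name, predicate in filters:
        values = columns[name]
        ids = range(len(values)) if row_ids is None else row_ids
        row_ids = [i for i in ids if predicate(values[i])]   # later filters see fewer rows
    if row_ids is None:                                      # no filters: keep every row
        row_ids = list(range(len(next(iter(columns.values())))))
    return [tuple(columns[name][i] for name in projection) for i in row_ids]


columns = {
    "region": ["US", "DE", "US", "FR", "US"],
    "year": [2012, 2013, 2013, 2013, 2013],
    "amount": [10, 20, 30, 40, 50],
    "title": ["a", "b", "c", "d", "e"],
}
filters = [
    ("region", lambda v: v == "US"),   # Filter 1 keeps rows 0, 2, 4
    ("year", lambda v: v == 2013),     # Filter 2 keeps rows 2, 4
    ("amount", lambda v: v > 40),      # Filter 3 keeps row 4
]
print(late_materialize(columns, filters, ("title", "amount")))   # [('e', 50)]
```

The `title` column is touched for a single row here, even though the table has five.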
Additional I/O Efficiency
Techniques to Avoid Unnecessary I/O
• Vertical Partitioning: read only the columns required
• Horizontal Partitioning: focus on the rows required
• Just-in-time materialization

Techniques for Efficient I/O
• Columnar compression reduces I/O from disk
• Global data buffer cache (in-memory) can reduce disk I/O
• Avoidance of random I/O
InfiniDB Design Principles
• Scalable
• Fast
• Simple
(My)SQL for Hadoop - Engine=InfiniDB
InfiniDB uses standard “Engine=InfiniDB” syntax:

CREATE TABLE `game_warehouse`.`dim_title` (
`id` INT,
`name` VARCHAR(45),
`publisher` VARCHAR(45),
`release_date` DATE,
`language` INT,
`platform_name` VARCHAR(45),
`version` VARCHAR(45)
) ENGINE=InfiniDB;

(My)SQL for Hadoop
• Leverage existing tools that connect to MySQL: MicroStrategy, JasperSoft, Pentaho
• Expose Structured Data to the Business
• Familiar User Privilege Administration
MySQL ease of use + Hadoop Scale + Columnar Performance
Syntax Support
• Broad MySQL SQL syntax
• Analytic/windowing functions included with InfiniDB 4
• No indexing needed. Partitioning is automatic.
[Diagram: InfiniDB Supported Syntax]
When to Use InfiniDB for Hadoop
Query Size (Vision/Scope) defines workloads:
[Chart: Query Size/Vision/Scope axis with ticks 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000; regions labeled OLTP/NoSQL Workloads and ROLAP/Analytic/Reporting Workloads]
General-purpose DBMS missed the target (dated database technology generally not optimal)
What is your typical query?
[Chart: Query Vision/Scope axis with ticks 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000; regions labeled OLTP/NoSQL Workloads and Analytic Workloads]

• There is no “average” query.
• The challenges are at the extremes:
o The challenge of high concurrency levels with small queries.
o The challenge of latency for very large queries.

• Most use cases imply multiple data technologies.
Columnar Appropriate Workloads
[Chart: Query Vision/Scope axis with ticks 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000]
• OLTP/NoSQL Workloads: Pure Columnar about 10x worse I/O for single record lookups
• ROLAP/Analytic/Reporting Workloads: Pure Columnar about 10x better I/O for large data access patterns
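One way to see where a factor of about 10 in each direction could come from is a back-of-the-envelope I/O model. The sketch below assumes a hypothetical 10-column table, a lookup that needs the whole record, and a scan that needs one column; it ignores compression, caching, and block sizes, so the exact ratio is an artifact of those assumptions.

```python
def lookup_blocks(columns_in_table, layout):
    """Blocks touched to fetch one full record: one in a row store, one per column otherwise."""
    return 1 if layout == "row" else columns_in_table


def scan_values(rows, columns_in_table, columns_needed, layout):
    """Values read to scan `rows` rows when the query needs `columns_needed` columns."""
    width = columns_in_table if layout == "row" else columns_needed
    return rows * width


COLUMNS = 10  # assumed table width

# Single record lookup: the columnar layout touches 10x more blocks.
print(lookup_blocks(COLUMNS, "column") / lookup_blocks(COLUMNS, "row"))   # 10.0

# Scan of 1,000,000 rows using 1 of 10 columns: the columnar layout reads 10x less.
row_io = scan_values(1_000_000, COLUMNS, 1, "row")
col_io = scan_values(1_000_000, COLUMNS, 1, "column")
print(row_io / col_io)                                                     # 10.0
```

Wider tables or narrower queries push both ratios higher, which is the trade-off between the two ends of the query-scope axis.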
Columnar Appropriate Workloads
Data Dimensions and InfiniDB for Hadoop
[Diagram labels: Unstructured Data, Structured; Schema on read, Schema on write; Small Queries, Large Queries; Transform (ETL), Targeted Extract; Pre-defined queries, Ad-hoc queries]
InfiniDB Query Performance - Percona Star Schema Benchmark (SSB)
[Chart: query series grouped by number of joined tables]
• Q1 Series: 2 table Joins
• Q2 Series: 3 table Joins
• Q3 Series: 4 table Joins
• Q5 Series: 5 table Joins
1000 Genomes Data Set - 289 Billion Rows
• Fast load rate
o Millions of rows/sec
o Billions of rows/hour
• Scalable load rate
[Chart: 1000 Genomes data set on AWS]

1000 Genomes Data Set - ~24 trillion base nucleotide values
Scaling: 4 -> 8 -> 16 Performance Modules
• Fast Analytics
o Millions of rows/second
• Scalable Analytics
• Automatic parallelism
[Chart: axis labels Seconds, per core, Performance Modules (PMs) Active]

Figure 2 - TATA Binding Protein
Source: http://en.wikipedia.org/wiki/TATA_binding_protein
Impala-InfiniDB Benchmark (Piwik Data Set)
Figure 1 - Piwik Standard Query Performance
Figure 2 - Piwik Ad-Hoc Query Performance
• Piwik is an open-source alternative to Google Analytics
• Queries 1-6 are Piwik production queries
• Queries 7-9 are additional ad-hoc queries covering all data
• Amazon 5-node cluster
Columnar Appropriate Workloads
Data Dimensions and InfiniDB for Hadoop
[Diagram labels: Structured; Schema on read, Schema on write; Small Queries, Large Queries; Transform (ETL), Targeted Extract; Ad-hoc queries; InfiniDB]
Download Today
InfiniDB and InfiniDB for Hadoop:
www.calpont.com
InfiniDB for the Cloud:
InfiniDB AMI in any AWS Availability Zone/Region

Services Inquiries:
sales@calpont.com
Twitter:
@InfiniDB

@jtommaney

© 2013 Calpont Corporation. Calpont, the Calpont logo, InfiniDB, and the InfiniDB logo are trademarks of Calpont Corporation. AWS is a trademark of Amazon.com,
Inc., and Apache Hadoop is a trademark of the Apache Software Foundation. Other product names and logos may be trademarks of their respective owners.

Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

December 2013 HUG: InfiniDB for Hadoop

  • 1. Bay Area Hadoop Users Group: Turning the Tables with InfiniDB for Hadoop (December 18, 2013)
  • 2. Agenda
      • InfiniDB Background
      • InfiniDB Technical Foundations
          o Parallelism
          o Partitioning Model
          o Additional I/O Efficiencies
      • (My)SQL for Hadoop
      • When to use Columnar/InfiniDB for Hadoop
      • InfiniDB Benchmarks
      Copyright © 2013 Calpont. All Rights Reserved.
  • 3. InfiniDB Background
      Platforms:
      • InfiniDB
      • InfiniDB for the Cloud
      • InfiniDB for Hadoop
      Versions:
      • InfiniDB launched Feb 2010
      • InfiniDB 4 – latest release, available October 2013
      • Added InfiniDB for Hadoop
      • Source code at https://github.com/infinidb
      • GPL v2
      • No restrictions on syntax, scale, or performance
  • 4. InfiniDB Background - Customer Base
  • 5. InfiniDB Background: Platforms
      • InfiniDB: Local Disk, GlusterFS, Windows* (http://www.calpont.com/products/tryinfinidb)
      • InfiniDB for Hadoop: CDH or HDP (http://www.calpont.com/products/tryinfinidb)
      • InfiniDB for the Cloud: Any availability zone
  • 6. InfiniDB Background – InfiniDB for Hadoop
      • InfiniDB is a non-map/reduce engine
      • Reads and writes natively to HDFS
      Diagram labels: Pig/Hive, HBase, Map Reduce, InfiniDB for Hadoop, Hadoop Distributed File System
  • 7. InfiniDB Background - InfiniDB for Hadoop
      Is InfiniDB a Database?
      … not a General Purpose DBMS.
      Is InfiniDB NoSQL?
      … only in the sense that we discarded traditional DBMS architectures.
      Is InfiniDB an SQL for Hadoop technology?
      … Yes, but not general purpose SQL. InfiniDB is highly optimized for analytic workloads/queries.
      Customer quote: “InfiniDB turns SQL developers into Big Data developers. We deployed it quickly and easily for our online sales analytics. Something we couldn’t do with Hadoop, Mongo, or Teradata”
  • 8. InfiniDB Foundation - Parallelism
      • User Module – Processes SQL Requests
      • Performance Module – Executes the Queries
      Diagram labels: Single Server or MPP; Local disk / EBS; GlusterFS / HDFS
  • 9. InfiniDB Foundation - Parallelism
      • Purpose-built C++ engine
      • Parallelism is at the thread level
      • Example: 12 Performance Module (PM) servers with 8 cores each yield 96 parallel processing engines.
      • SQL is translated into thousands or tens of thousands of discrete jobs or “primitives”.
      • The User Module (UM) sends primitives to the processing engines.
  • 10. InfiniDB Foundation - Parallelism
      • User Module – Processes SQL Requests
      • Performance Module – Executes the Queries
      • Primitives are issued to a thread queue within the PM
      • Fixed thread count at the PM
      Diagram labels: Single Server, MPP; Local disk / EBS; GlusterFS / HDFS
  • 11. Fully Parallel SQL + Full SQL Syntax
      Diagram labels: DoW, Reduce
      SQL operations are translated into thousands of jobs via custom Distribution of Work (DoW):
      • Parallel/Distributed Data Access
      • Parallel/Distributed Joins (Inner, Outer)
      • Parallel/Distributed Sub-queries (From, Where, Select)
      • Parallel/Distributed Group By, Distinct, and Aggregation
      • Extensible with Parallel/Distributed User Defined Functions
      Results are returned to the User Module in the Reduce phase.
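The distribute-then-reduce flow described on the parallelism slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not InfiniDB's C++ implementation: the function names (`make_primitives`, `run_primitive`, `parallel_sum`) and the chunk size are hypothetical, a Python thread pool stands in for the fixed-size thread queue on each Performance Module, and the final `sum` stands in for the Reduce phase on the User Module. The 12 x 8 = 96 figure comes from the slide's own example.

```python
from concurrent.futures import ThreadPoolExecutor

PM_SERVERS = 12      # figures from the slide's example
CORES_PER_PM = 8
ENGINES = PM_SERVERS * CORES_PER_PM  # 96 parallel processing engines


def make_primitives(total_rows, rows_per_primitive):
    """Split a row range into small, independent (start, stop) jobs."""
    return [
        (start, min(start + rows_per_primitive, total_rows))
        for start in range(0, total_rows, rows_per_primitive)
    ]


def run_primitive(column, bounds):
    """One hypothetical primitive: a partial SUM over its slice of a column."""
    start, stop = bounds
    return sum(column[start:stop])


def parallel_sum(column, rows_per_primitive=1_000, workers=ENGINES):
    """Hand primitives to a fixed-size pool, then reduce the partial results."""
    primitives = make_primitives(len(column), rows_per_primitive)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: run_primitive(column, b), primitives)
        return sum(partials)  # stand-in for the Reduce phase


if __name__ == "__main__":
    data = list(range(100_000))
    print(ENGINES)             # 96
    print(parallel_sum(data))  # 4999950000
```

Python threads do not give real CPU parallelism for this workload; the point is the shape of the work: many small jobs, a fixed number of workers, and a single reduce step at the end.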
  • 12. InfiniDB Data Partitioning: 2-Dimensional Partitioning Model
      • Vertical Partitioning by Column
          o Not Column-Family (no relation to HBase)
          o Only do I/O for columns requested
      • Horizontal Partitioning by range of rows
          o Meta-data stored within in-memory structure
  • 13. InfiniDB Data Partitioning
      • Partition elimination can occur based on:
          o Columns not included in the SQL.
          o A filter expressed within the query.
          o A filter expressed on a joined table: a Table1 filter can drive Table2 I/O elimination.
          o The intersection between filters: Filter1 AND Filter2 does I/O only on the intersection.
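Partition elimination by filter can be sketched as a lookup against per-extent metadata. The slides only say that metadata for each row range is held in an in-memory structure; storing a minimum and maximum value per extent, as below, is an assumption made for this sketch, and the `Extent` class, column names, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical metadata for one row range (extent) of one column."""
    extent_id: int   # identifies the row range; shared across columns
    min_value: int
    max_value: int


def extents_to_read(extents, low, high):
    """Keep only extents whose [min, max] range overlaps the filter [low, high]."""
    return [e.extent_id for e in extents
            if e.max_value >= low and e.min_value <= high]


def intersect(ids_a, ids_b):
    """Filter1 AND Filter2: only row ranges kept by both filters need I/O."""
    return sorted(set(ids_a) & set(ids_b))


# Made-up metadata for two columns of the same table, four row ranges each.
order_date = [Extent(0, 20130101, 20130331), Extent(1, 20130401, 20130630),
              Extent(2, 20130701, 20130930), Extent(3, 20131001, 20131231)]
quantity = [Extent(0, 1, 50), Extent(1, 10, 500),
            Extent(2, 1, 20), Extent(3, 100, 900)]

date_hits = extents_to_read(order_date, 20130601, 20130815)  # [1, 2]
qty_hits = extents_to_read(quantity, 100, float("inf"))      # [1, 3]
print(intersect(date_hits, qty_hits))                        # [1]
```

In this made-up example the two filters together leave one row range out of four to read, and only for the columns the query actually names.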
  • 14. Column Restriction and Projection
      Diagram labels: Column # Four, Column # Six, Column # Seventeen; Extent # 5, Extent # 27; Filter 1, Filter 2, Filter 3; Projection
      • Automatic Vertical Partitioning + Horizontal Partitioning
      • Just-In-Time Materialization
  • 15. Additional I/O Efficiency
      Techniques to Avoid Unnecessary I/O:
      • Vertical Partitioning: read only the columns required
      • Horizontal Partitioning: focus on the rows required
      • Just-in-time materialization
      Techniques for Efficient I/O:
      • Columnar compression reduces I/O from disk
      • Global data buffer cache can reduce disk I/O (in-memory)
      • Avoidance of Random I/O
  • 17. (My)SQL for Hadoop - Engine=InfiniDB
      InfiniDB uses standard “Engine=InfiniDB” syntax:
          CREATE TABLE `game_warehouse`.`dim_title` (
            `id` INT,
            `name` VARCHAR(45),
            `publisher` VARCHAR(45),
            `release_date` DATE,
            `language` INT,
            `platform_name` VARCHAR(45),
            `version` VARCHAR(45)
          ) ENGINE=InfiniDB;
  • 18. (My)SQL for Hadoop
      • Leverage existing tools that connect to MySQL
      • Expose Structured Data to the Business
      • Familiar User Privilege Administration
      Tools shown: MicroStrategy, JasperSoft, Pentaho
      MySQL ease of use + Hadoop Scale + Columnar Performance
  • 19. Syntax Support
      • Broad MySQL SQL syntax
      • Analytic/windowing functions included with InfiniDB 4
      • No indexing needed. Partitioning is automatic.
      Figure: InfiniDB Supported Syntax
  • 20. When to Use InfiniDB for Hadoop
      Query Size (Vision/Scope) defines workloads.
      Chart: Query Size/Vision/Scope axis with ticks at 1, 100, 10,000, 1,000,000, 100,000,000, and 10,000,000,000; regions labeled OLTP/NoSQL Workloads and ROLAP/Analytic/Reporting Workloads.
      General purpose DBMS missed the target (dated database technology generally not optimal).
  • 21. What is your typical query?
      Chart: Query Vision/Scope axis with ticks at 1, 100, 10,000, 1,000,000, 100,000,000, and 10,000,000,000; regions labeled OLTP/NoSQL Workloads and Analytic Workloads.
      • There is no “average” query.
      • The challenges are at the extremes:
          o The challenge of high concurrency levels with small queries.
          o The challenge of latency for very large queries.
      • Most use cases imply multiple data technologies.
  • 22. Columnar Appropriate Workloads
      Chart: Query Vision/Scope axis with ticks at 1, 100, 10,000, 1,000,000, 100,000,000, and 10,000,000,000.
      • OLTP/NoSQL Workloads: Pure Columnar about 10x worse I/O for single record lookups
      • ROLAP/Analytic/Reporting Workloads: Pure Columnar about 10x better I/O for large data access patterns
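A back-of-the-envelope block-count model shows where an "about 10x" gap in each direction can come from. Everything in it is an assumption chosen for illustration: a table of 10 equal-width columns, 8,000 values per block, a lookup that returns the whole row, and a scan that touches a single column. Real ratios depend on table width, compression, and caching, none of which is modeled here.

```python
import math

COLUMNS = 10              # hypothetical table width
VALUES_PER_BLOCK = 8_000  # hypothetical number of fixed-width values per block


def row_store_blocks(rows, columns=COLUMNS, values_per_block=VALUES_PER_BLOCK):
    """Row store: blocks hold whole rows, so every column is read every time."""
    rows_per_block = values_per_block // columns
    return math.ceil(rows / rows_per_block)


def column_store_blocks(rows, columns_needed, values_per_block=VALUES_PER_BLOCK):
    """Column store: one run of blocks per requested column and nothing else."""
    return columns_needed * math.ceil(rows / values_per_block)


# Single-record lookup that returns the whole row.
lookup_row = row_store_blocks(1)                              # 1 block
lookup_col = column_store_blocks(1, columns_needed=COLUMNS)   # 10 blocks

# Scan of 10 million rows that touches one column.
scan_row = row_store_blocks(10_000_000)                       # 12,500 blocks
scan_col = column_store_blocks(10_000_000, columns_needed=1)  # 1,250 blocks

print(lookup_col / lookup_row)  # 10.0 (columnar reads 10x more)
print(scan_row / scan_col)      # 10.0 (columnar reads 10x less)
```

With these assumptions the factor equals the number of columns in the table, so a wider table or a query touching more columns would shift the ratio accordingly.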
  • 23. Columnar Appropriate Workloads: Data Dimensions and InfiniDB for Hadoop
      Diagram labels: Unstructured Data / Structured; Schema on read / Schema on write; Small Queries / Large Queries; Transform (ETL) / Targeted Extract; Pre-defined queries / Ad-hoc queries
  • 24. InfiniDB Query Performance – Percona Star Schema Benchmark (SSB)
      Chart series: Q1 Series (2 table Joins), Q2 Series (3 table Joins), Q3 Series (4 table Joins), Q5 Series (5 table Joins)
  • 25. 1000 Genomes Data Set – 289 Billion Rows
      • Fast load rate
          o Millions of rows/sec
          o Billions of rows/hour
      • Scalable load rate
      Figure: 1000 Genomes data set on AWS
  • 26. 1000 Genomes Data Set – ~24 trillion base nucleotide values
      Scaling: 4 –> 8 –> 16 Performance Modules
      • Fast Analytics
          o Millions of rows/second
      • Scalable Analytics
      • Automatic parallelism
      Chart axes: Seconds per core; Performance Modules (PMs) Active
      Figure 2 - TATA Binding Protein. Source: http://en.wikipedia.org/wiki/TATA_binding_protein
  • 27. Impala-InfiniDB Benchmark (Piwik Data Set)
      Figure 1 - Piwik Standard Query Performance
      Figure 2 - Piwik Ad-Hoc Query Performance
      • Piwik is an Open Source alternative to Google Analytics
      • Queries 1-6 offered are Piwik production queries
      • Queries 7-9 are additional ad-hoc queries covering all data
      • Amazon 5-node cluster
  • 28. Columnar Appropriate Workloads: Data Dimensions and InfiniDB for Hadoop
      Diagram labels: Structured; Schema on read / Schema on write; InfiniDB; Small Queries / Large Queries; Transform (ETL) / Targeted Extract; Ad-hoc queries
      Figure 2 - Piwik Ad-Hoc Query Performance
  • 29. Download Today
      • InfiniDB and InfiniDB for Hadoop: www.calpont.com
      • InfiniDB for the Cloud: InfiniDB AMI in any AWS Availability Zone/Region
      • Services Inquiries: sales@calpont.com
      • Twitter: @InfiniDB @jtommaney
      © 2013 Calpont Corporation. Calpont, the Calpont logo, InfiniDB, and the InfiniDB logo are trademarks of Calpont Corporation. AWS is a trademark of Amazon.com, Inc., and Apache Hadoop is a trademark of the Apache Software Foundation. Other product names and logos may be trademarks of their respective owners.