SlideShare a Scribd company logo
1 of 17
COMBINING HADOOP &
VERTICA FOR LARGE SCALE
ANALYTICS
Hadoop Summit 2012




Shilpa Lawande
VP Engineering, Vertica, an HP Company
1
    ©2011 Hewlett-Packard Development Company, L.P.
    The information contained herein is subject to change without notice
The Big Data Problem




2
The (popular) Solution




    Vertica’s Real-time Analytics
                  +
    Scale & Flexibility of Hadoop
3
What is Vertica?
                                                                      Speed

    •    SQL Database for Real-time Analytics         Services
          Cloud

    •    Runs on x86 hardware
    •    MPP Columnar Architecture – scales to PBs!
                  Monetize
                                             Real
                                             Time


    •    Extensible analytics capabilities
    •    Easy to setup and use
    •    Elastic - grow/shrink as needed
                   Better
                  Decisions
                                         Statistics



         Mobile                                       Individual
                              Analysis




                                                                   Simplicity
4
What Analytics can Vertica do?




     SQL                   Extended
                           SQL
     •  Window
        functions          •  Sessionization           SDKs
     •  Graph              •  Time series              •  C++
     •  Monte Carlo        •  Pattern                  •  R
                              matching
     •  Statistical
                           •  Event series
     •  Geospatial
                              joins




5   Check out: https://github.com/vertica/Vertica-Extension-Packages
Who uses Vertica?

        600+
    Customers worldwide     “… by partnering with Vertica
                            we’re able to provide operators the
                            tools they need to confidently
                            interpret customer experience…”
                                   Steve Kish, Director, Product Management Empirix




                              “…being able to run social graph
                            analysis on tables with tens of billions
                              of rows with a fast turn around is
                                         amazing…”
                                    Dan McCaffrey, Director of Analytics, Zynga




6
What's different – Hadoop vs Vertica?




    Vertica                                       Hadoop
                                  Both
    •  Designed for           Purpose-built       •  Designed for
       Performance              Scalable             Fault-tolerance
    •  SQL                      Analytics         •  Map-Reduce
    •  Interactive             Platforms          •  Batch Analytics
       Analytics




        Read: http://www.vertica.com/2011/09/21/counting-triangles/
7
Getting the best of both worlds!

                             SQL/                                           Extensions
                                           	
  	
                           In C++, R
                             ODBC/         	
  	
  	
  Ver%ca	
  	
  
                             JDBC          	
  	
  	
  	
  Engine	
  




                                                                        External Tables
                                       Native




                                             User-defined Loads
                      Ver%ca	
  	
  
                      Storage	
  
   8                                       Hadoop/MR Connector
New in Vertica 6
Joint Use Cases

Hadoop for ETL, Vertica for Analytics
    •  Logparsing / tagging / filtering
    •  Convert JSON into relational tuples

HDFS for data storage, Vertica + Hadoop for Analytics
    •  Real-timeanalytics on Vertica (needs speed)
    •  Long-running / exploratory analytics on Hadoop (needs fault tolerance)
    •  Load from HDFS directly to Vertica (needs Vertica 6)
    •  SQL access to HDFS (needs Vertica 6)

Vertica for data storage, Hadoop as a multi-purpose tool
    •  Hadoop as a scheduler / load-balancer
    •  Hadoop to convert to formats for other tools (e.g. STATA)
    •  Hadoop for Backup via Sqoop


9
Customer Stories




10
Accelerating Drug Discovery




                                     The solution
•  Analyzing gene                                                •  Queries went from 5
   variants using SNPs                                              hours to 5 minutes
   and Microarray data   •  Hadoop to find the variants          •  Scale to 100s of TB of
                            between a sample sequence               data
                            and a reference genome               •  More experiments =>
                         •  Vertica to determine oncology           faster discoveries!
                            targets
                         •  Tools: Pipeline Pilot, Spotfire, R
     The problem                                                       The value


11
Digital Consumer Insights


     •  HDFS to store raw    •  Vertica to store &    Faster insights
     input behavioral data   operationalize high     delivered more
     •  Hadoop / MR to       value biz data          consistently with less
     find conversions        •  Reporting &          administrative
     (regexp processing)     analytics via Tableau   overhead, and
                             and R                   cheaper hardware!!
                             •  Custom ETL




12
On a Privacy Assurance Mission


             Collect user              Use MR to                 Use Vertica to
          privacy reporting           process and               analyze stats for
            requests into          structure the data            every 3rd party
               HDFS                into Vertica (ETL)          tag on a website.




     For Consumers:                         For Advertisers:

                                            Provide greater transparency
                                            to end-users    (look for               on an a
     A free browser plugin that
     can tell you who’s tracking            Understand impact of 3rd party tags on
     you!                                   website performance

13
Social Video
      Social Video Analytics                                     Social Video Advertising




▫  Video analytics – 100+ Leading Pubs
                                                    Hadoop for batch processing
                                                    of logs and ETL into Vertica
       ▫ Campaign Measurement – 100+ major brands
                                                    Vertica for ad-hoc analytics
                                                    and interactive dashboards
▫ Industry-Wide Charts


                                                    Redis KV store for serving
                                                    low-latency data needs


                     100s of millions of events collected and processed
                           daily on Petabyte scale infrastructure!


14
Try Vertica for free!



     Community Edition

     Up to 1 TB limit, 3 nodes!

     Check out Vertica
       Extensions on Github!




15
References and Other Info …


 Website:    www.vertica.com

 Community Edition: http://www.vertica.com/community/

 Github:    https://github.com/vertica/Vertica-Extension-Packages

 Questions or Comments: shilpa@vertica.com

 Jobs: resumes@vertica.com     (Awesome new location in Cambridge,
 MA!)

 Follow us on Twitter: @slawande, @verticacorp

16
Sessions will resume at 2:25pm




                             Page 17

More Related Content

What's hot

Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
OW2
 
Delivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated ArchitectureDelivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated Architecture
DataWorks Summit
 
APAC Big Data Strategy RadhaKrishna Hiremane
APAC Big Data  Strategy RadhaKrishna  HiremaneAPAC Big Data  Strategy RadhaKrishna  Hiremane
APAC Big Data Strategy RadhaKrishna Hiremane
IntelAPAC
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
DataWorks Summit
 

What's hot (20)

End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed REnd-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
 
How Salesforce.com uses Hadoop
How Salesforce.com uses HadoopHow Salesforce.com uses Hadoop
How Salesforce.com uses Hadoop
 
Modernizing your Application Architecture with Microservices
Modernizing your Application Architecture with MicroservicesModernizing your Application Architecture with Microservices
Modernizing your Application Architecture with Microservices
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
Cascading User Group Meet
Cascading User Group MeetCascading User Group Meet
Cascading User Group Meet
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
R and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopR and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with Hadoop
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
Delivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated ArchitectureDelivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated Architecture
 
APAC Big Data Strategy RadhaKrishna Hiremane
APAC Big Data  Strategy RadhaKrishna  HiremaneAPAC Big Data  Strategy RadhaKrishna  Hiremane
APAC Big Data Strategy RadhaKrishna Hiremane
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
 

Viewers also liked

Buscadores
Buscadores Buscadores
Buscadores
PauColdR
 
TERRENOS DE PLANTACIONES DE PIÑA
TERRENOS DE PLANTACIONES DE PIÑATERRENOS DE PLANTACIONES DE PIÑA
TERRENOS DE PLANTACIONES DE PIÑA
yensen rioja
 
Expressions pour utiliser en cours de français2
Expressions pour utiliser en cours de français2Expressions pour utiliser en cours de français2
Expressions pour utiliser en cours de français2
Francés Sari
 
Integrated Communications Summer Internship 2016
Integrated Communications Summer Internship 2016Integrated Communications Summer Internship 2016
Integrated Communications Summer Internship 2016
Heather Ellis
 
Lomce datorkigu /Lo que nos viene con la Lomce
Lomce datorkigu /Lo que nos viene con la LomceLomce datorkigu /Lo que nos viene con la Lomce
Lomce datorkigu /Lo que nos viene con la Lomce
eguzsal
 

Viewers also liked (20)

Grammaire intermediare
Grammaire   intermediareGrammaire   intermediare
Grammaire intermediare
 
Big Data in Disease Management
Big Data in Disease ManagementBig Data in Disease Management
Big Data in Disease Management
 
Contrato servicio profesionales pdvsa
Contrato servicio profesionales pdvsaContrato servicio profesionales pdvsa
Contrato servicio profesionales pdvsa
 
updated resume 2016 - QS
updated resume 2016 - QSupdated resume 2016 - QS
updated resume 2016 - QS
 
Real madrid
Real madrid Real madrid
Real madrid
 
Buscadores
Buscadores Buscadores
Buscadores
 
TERRENOS DE PLANTACIONES DE PIÑA
TERRENOS DE PLANTACIONES DE PIÑATERRENOS DE PLANTACIONES DE PIÑA
TERRENOS DE PLANTACIONES DE PIÑA
 
Situations
SituationsSituations
Situations
 
Expressions pour utiliser en cours de français2
Expressions pour utiliser en cours de français2Expressions pour utiliser en cours de français2
Expressions pour utiliser en cours de français2
 
la voix passive - B2 vp eoi
la voix passive - B2 vp eoila voix passive - B2 vp eoi
la voix passive - B2 vp eoi
 
Integrated Communications Summer Internship 2016
Integrated Communications Summer Internship 2016Integrated Communications Summer Internship 2016
Integrated Communications Summer Internship 2016
 
Lomce datorkigu /Lo que nos viene con la Lomce
Lomce datorkigu /Lo que nos viene con la LomceLomce datorkigu /Lo que nos viene con la Lomce
Lomce datorkigu /Lo que nos viene con la Lomce
 
Digital Ethnography: New Ways of Knowing Ourselves and Our Culture
Digital Ethnography: New Ways of Knowing Ourselves and Our CultureDigital Ethnography: New Ways of Knowing Ourselves and Our Culture
Digital Ethnography: New Ways of Knowing Ourselves and Our Culture
 
Digital Strategy Recommendations Written Proposal - Allo Communications
Digital Strategy Recommendations Written Proposal - Allo CommunicationsDigital Strategy Recommendations Written Proposal - Allo Communications
Digital Strategy Recommendations Written Proposal - Allo Communications
 
la voix passive - B2 vp eoi
la voix passive - B2 vp eoila voix passive - B2 vp eoi
la voix passive - B2 vp eoi
 
Argan oil liquid gold with magical benefits
Argan oil  liquid gold with magical benefitsArgan oil  liquid gold with magical benefits
Argan oil liquid gold with magical benefits
 
PLC-SCADA and automation
PLC-SCADA and automationPLC-SCADA and automation
PLC-SCADA and automation
 
PLC and SCADA in Industrial Automation
PLC and SCADA in Industrial AutomationPLC and SCADA in Industrial Automation
PLC and SCADA in Industrial Automation
 
Slideshare.Com Powerpoint
Slideshare.Com PowerpointSlideshare.Com Powerpoint
Slideshare.Com Powerpoint
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Similar to Combining Hadoop RDBMS for Large-Scale Big Data Analytics

Cardinality-HL-Overview
Cardinality-HL-OverviewCardinality-HL-Overview
Cardinality-HL-Overview
Harry Frost
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
Kognitio
 

Similar to Combining Hadoop RDBMS for Large-Scale Big Data Analytics (20)

Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-HadoopHP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 
Cardinality-HL-Overview
Cardinality-HL-OverviewCardinality-HL-Overview
Cardinality-HL-Overview
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
 
4AA6-4492ENW
4AA6-4492ENW4AA6-4492ENW
4AA6-4492ENW
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware Hadoop
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

Combining Hadoop RDBMS for Large-Scale Big Data Analytics

  • 1. COMBINING HADOOP & VERTICA FOR LARGE SCALE ANALYTICS Hadoop Summit 2012 Shilpa Lawande VP Engineering, Vertica, an HP Company 1 ©2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
  • 2. The Big Data Problem 2
  • 3. The (popular) Solution Vertica’s Real-time Analytics + Scale & Flexibility of Hadoop 3
  • 4. What is Vertica? Speed •  SQL Database for Real-time Analytics Services Cloud •  Runs on x86 hardware •  MPP Columnar Architecture – scales to PBs! Monetize Real Time •  Extensible analytics capabilities •  Easy to setup and use •  Elastic - grow/shrink as needed Better Decisions Statistics Mobile Individual Analysis Simplicity 4
  • 5. What Analytics can Vertica do? SQL Extended SQL •  Window functions •  Sessionization SDKs •  Graph •  Time series •  C++ •  Monte Carlo •  Pattern •  R matching •  Statistical •  Event series •  Geospatial joins 5 Check out: https://github.com/vertica/Vertica-Extension-Packages
  • 6. Who uses Vertica? 600+ Customers worldwide “… by partnering with Vertica we’re able to provide operators the tools they need to confidently interpret customer experience…” Steve Kish, Director, Product Management Empirix “…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing…” Dan McCaffrey, Director of Analytics, Zynga 6
  • 7. What's different – Hadoop vs Vertica? Vertica Hadoop Both •  Designed for Purpose-built •  Designed for Performance Scalable Fault-tolerance •  SQL Analytics •  Map-Reduce •  Interactive Platforms •  Batch Analytics Analytics Read: http://www.vertica.com/2011/09/21/counting-triangles/ 7
  • 8. Getting the best of both worlds! SQL/ Extensions     In C++, R ODBC/      Ver%ca     JDBC        Engine   External Tables Native User-defined Loads Ver%ca     Storage   8 Hadoop/MR Connector New in Vertica 6
  • 9. Joint Use Cases Hadoop for ETL, Vertica for Analytics •  Logparsing / tagging / filtering •  Convert JSON into relational tuples HDFS for data storage, Vertica + Hadoop for Analytics •  Real-timeanalytics on Vertica (needs speed) •  Long-running / exploratory analytics on Hadoop (needs fault tolerance) •  Load from HDFS directly to Vertica (needs Vertica 6) •  SQL access to HDFS (needs Vertica 6) Vertica for data storage, Hadoop as a multi-purpose tool •  Hadoop as a scheduler / load-balancer •  Hadoop to convert to formats for other tools (e.g. STATA) •  Hadoop for Backup via Sqoop 9
  • 11. Accelerating Drug Discovery The solution •  Analyzing gene •  Queries went from 5 variants using SNPs hours to 5 minutes and Microarray data •  Hadoop to find the variants •  Scale to 100s of TB of between a sample sequence data and a reference genome •  More experiments => •  Vertica to determine oncology faster discoveries! targets •  Tools: Pipeline Pilot, Spotfire, R The problem The value 11
  • 12. Digital Consumer Insights •  HDFS to store raw •  Vertica to store & Faster insights input behavioral data operationalize high delivered more •  Hadoop / MR to value biz data consistently with less find conversions •  Reporting & administrative (regexp processing) analytics via Tableau overhead, and and R cheaper hardware!! •  Custom ETL 12
  • 13. On a Privacy Assurance Mission Collect user Use MR to Use Vertica to privacy reporting process and analyze stats for requests into structure the data every 3rd party HDFS into Vertica (ETL) tag on a website. For Consumers: For Advertisers: Provide greater transparency to end-users (look for on an a A free browser plugin that can tell you who’s tracking Understand impact of 3rd party tags on you! website performance 13
  • 14. Social Video Social Video Analytics Social Video Advertising ▫  Video analytics – 100+ Leading Pubs Hadoop for batch processing of logs and ETL into Vertica ▫ Campaign Measurement – 100+ major brands Vertica for ad-hoc analytics and interactive dashboards ▫ Industry-Wide Charts Redis KV store for serving low-latency data needs 100s of millions of events collected and processed daily on Petabyte scale infrastructure! 14
  • 15. Try Vertica for free! Community Edition Up to 1 TB limit, 3 nodes! Check out Vertica Extensions on Github! 15
  • 16. References and Other Info … Website: www.vertica.com Community Edition: http://www.vertica.com/community/ Github: https://github.com/vertica/Vertica-Extension-Packages Questions or Comments: shilpa@vertica.com Jobs: resumes@vertica.com (Awesome new location in Cambridge, MA!) Follow us on Twitter: @slawande, @verticacorp 16
  • 17. Sessions will resume at 2:25pm Page 17