Calpont InfiniDB®
Accelerating Data Insights

InfiniDB 3: Speeding Big Data Analytics in Amazon EC2

 Jim Tommaney, CTO Calpont
 June 2012
Today’s Presenter - Jim Tommaney

  • Calpont’s Chief Technologist
  • Architect of InfiniDB
  • 25 years of experience in applied data
    technologies for BI and analytics
  • Drives InfiniDB roadmap and futures
  • Closely engaged in client deployments
    and POCs




InfiniDB® Scalable. Fast. Simple.   2       © 2012 Calpont. All Rights Reserved.
Today’s Discussion

 • Introduction
 • InfiniDB Architecture for Big Data Analytics
 • InfiniDB 3
         o   Provisioned for Amazon EC2
         o   Demo – Creating a small or large cluster
         o   Parallel load options for load scaling
         o   Demo – Cpimport Load for MPP




How Fast is the World’s Big Data Footprint Growing?

  How Big is a Byte?
  • 1 gigabyte = 1,000,000,000 bytes
  • 1,000 gigabytes = 1 terabyte
  • 1 million terabytes = 1 exabyte
  • 1 billion terabytes = 1 zettabyte

  How is Data Growing Daily?
  • 15 petabytes of new information are created each day, 8x more
    information than in all the libraries in the United States
  • Experiments at the CERN laboratory generate 40 TB of data every second

  IT Implications of Big Data
  • Wal-Mart handles one million transactions every hour, feeding
    databases that store 2.5 petabytes, 167 times the books in
    America’s Library of Congress
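
The unit table on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal (powers of 1000) units, as the slide does, and the `UNITS` table and `convert` helper are names introduced here, not part of the presentation.

```python
# Decimal (SI) byte units, matching the slide's "How Big is a Byte?" table.
UNITS = {
    "byte": 1,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two decimal byte units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


# Each row of the slide's table should come out as exactly 1 of the larger unit.
print(convert(1000, "gigabyte", "terabyte"))
print(convert(10**6, "terabyte", "exabyte"))
print(convert(10**9, "terabyte", "zettabyte"))

# The slide's daily-growth figure, expressed in terabytes.
print(convert(15, "petabyte", "terabyte"))
```
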




Evolution of the Analytic Platform

  • 1960s: First DBMS (IDS)
  • 1970s: Relational prototypes
  • 1980s: Relational systems commercialized
  • 1990s: DBMS extensions
  • 2000s: OLAP and MOLAP cubes
  • 2010+: Analytic platforms

What is InfiniDB?



                Simple, Powerful Platform for Big Data Analytics

                       Columnar Performance Efficiency
                            Widely used MySQL Interface
              MPP, MapReduce style Query Execution
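
A minimal sketch of how columnar storage and MapReduce-style execution can fit together. It is a toy model in plain Python, not InfiniDB code: the table, the round-robin row placement, and the helper names (`split_into_slices`, `partial_sum`, `merge`) are assumptions made for illustration. Each simulated node aggregates only the columns the query touches (the map step), and the partial results are then combined (the reduce step).

```python
from collections import defaultdict

# Hypothetical columnar table: every column is stored as its own list.
table = {
    "region": ["east", "west", "east", "north", "west", "east"],
    "sales": [100, 200, 75, 300, 50, 25],
}


def split_into_slices(columns, n_nodes):
    """Assign rows round-robin to n_nodes slices, keeping the column layout."""
    slices = [{name: [] for name in columns} for _ in range(n_nodes)]
    n_rows = len(next(iter(columns.values())))
    for row in range(n_rows):
        target = slices[row % n_nodes]
        for name, values in columns.items():
            target[name].append(values[row])
    return slices


def partial_sum(node_slice, key_col, value_col):
    """Map step: one node sums value_col per key, reading only two columns."""
    totals = defaultdict(int)
    for key, value in zip(node_slice[key_col], node_slice[value_col]):
        totals[key] += value
    return totals


def merge(partials):
    """Reduce step: combine the per-node partial totals into one result."""
    final = defaultdict(int)
    for part in partials:
        for key, value in part.items():
            final[key] += value
    return dict(final)


# Equivalent in spirit to: SELECT region, SUM(sales) ... GROUP BY region
slices = split_into_slices(table, n_nodes=3)
result = merge(partial_sum(s, "region", "sales") for s in slices)
print(result)
```

Because each node works only on its own slice, adding nodes shrinks the work per node, which is the intuition behind the scale-out claims on the following slides.
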

Benefits of InfiniDB



                     Real-time, Consistent Query Performance

                     Linear Scale for Massive Data

                     Removes Limits to Dimensions and Granularity

                     Easy to Deploy and Maintain


How InfiniDB is Used

  • Analytic Needs: Dimensional Analytics, Data Discovery, Predictive Analytics
  • Analytic Data Environment: Data Warehouse, Analytic Data Store
  • Data Integration: ETL, Hadoop, MDM, Direct Load Model
  • Big Data Sources: Transactional, Operational, Legacy RDBMS

Big Data Reference Architecture

  • Analytic Needs: Dimensional Analytics, Data Discovery, Predictive Analytics
  • Analytic Data Environment: (left blank on this slide)
  • Data Integration: ETL, Hadoop, MDM
  • Big Data Sources: Transactional, Operational, Legacy RDBMS

InfiniDB Product Evolution

  InfiniDB 1.0
  • 100% Columnar
  • Full scale-out MPP
  • Fully integrated map reduction operations
  • High speed data load

  InfiniDB 1.5
  • Full parallel sub-query
  • UTF-8 Support
  • Expanded SQL support
  • Added support for additional Linux platforms

  InfiniDB 2.0
  • UDFs for In-database analytics
  • Real-time compression
  • Enhanced partitioning
  • Enhanced parallelization

  InfiniDB 3
  • Parallel Load for Big Data
  • Transparent provisioning and run time operations on Amazon EC2

InfiniDB 3

  Increasing Flexibility while Preserving Simplicity and Speed

  • Unmatched Simplicity and Speed
  • Deployment Flexibility

  Easier to…
  • Take advantage of Cloud deployments
  • Load Massive Data for Distributed HW

InfiniDB 3 - New Capabilities

  • Prepackaged AMI for automatic provisioning of InfiniDB nodes on EC2
  • Transparent support of EC2 virtual storage and data redundancy (EBS) policies

Accessing the InfiniDB AMI Trial

  1. Go to Calpont.com/tryinfiniDB
  2. Select the AMI option
  3. Provide your AWS account number
  4. Calpont will provide access within 24 hrs

Big Data Reference Architecture

  • Analytic Needs: Dimensional Analytics, Data Discovery, Predictive Analytics
  • Analytic Data Environment: User Module, Performance Module
  • Data Integration: ETL, Hadoop, MDM
  • Big Data Sources: Transactional, Operational, Legacy RDBMS

InfiniDB AMI Demo

InfiniDB 3 - New Capabilities

  Parallel Data Load designed for Big Data
  • Simple: the same simple command, with several data load configurations possible
  • Scalable: linear performance as more nodes participate in the loading
  • Fast: no query performance degradation during data load
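
To make the "linear performance" claim concrete, the sketch below works through the arithmetic of an idealized model in which load throughput grows in direct proportion to the number of participating nodes. The row count and per-node rate are hypothetical values chosen for illustration, not measurements from InfiniDB.

```python
def ideal_load_seconds(total_rows, rows_per_second_per_node, n_nodes):
    """Load time if throughput grows in direct proportion to node count."""
    return total_rows / (rows_per_second_per_node * n_nodes)


TOTAL_ROWS = 1_000_000_000   # hypothetical data set size
RATE_PER_NODE = 500_000      # hypothetical rows per second for a single node

# Doubling the node count halves the load time under this idealized model.
for nodes in (1, 2, 4, 8):
    seconds = ideal_load_seconds(TOTAL_ROWS, RATE_PER_NODE, nodes)
    print(f"{nodes} node(s): {seconds:.0f} s")
```

Real deployments would see some overhead from coordination and network transfer, so this is best read as an upper bound on the benefit.
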



InfiniDB 3 - Parallel Data Load Options

  • Bulk Load, Central: 1 data source; 1 single command; auto distribution across S/N
  • Single Bulk Load, Partitioned: n partitioned data sources; 1 single command; n Performance Module nodes
  • Parallel Bulk Load, Partitioned: n partitioned data sources; n bulk load commands; n Performance Modules
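
The three options differ in where the data starts and how many load commands run. The sketch below simulates them in plain Python; it does not call cpimport, and the function names, the round-robin distribution, and the use of threads to stand in for separate load commands are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def central_load(rows, n_modules):
    """Bulk Load, Central: one source and one command, spread over all modules."""
    modules = [[] for _ in range(n_modules)]
    for i, row in enumerate(rows):
        modules[i % n_modules].append(row)  # round-robin is an assumption
    return modules


def load_one(source):
    """Stand-in for loading one pre-partitioned source into one module."""
    return list(source)


def single_partitioned_load(sources):
    """Single Bulk Load, Partitioned: one command walks the n sources in turn."""
    return [load_one(source) for source in sources]


def parallel_partitioned_load(sources):
    """Parallel Bulk Load, Partitioned: n commands, one per source, run concurrently."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        return list(pool.map(load_one, sources))  # map preserves source order


rows = list(range(12))
pre_split = [rows[0:4], rows[4:8], rows[8:12]]  # n = 3 partitioned sources

central = central_load(rows, n_modules=3)
single = single_partitioned_load(pre_split)
parallel = parallel_partitioned_load(pre_split)

for name, layout in (("central", central), ("single", single), ("parallel", parallel)):
    print(name, layout)
```

In all three cases every module ends up with a third of the rows; what changes is whether the split happens during the load or beforehand, and whether the per-module loads run one after another or at the same time.
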

InfiniDB Cpimport Load Demo

InfiniDB 3 – Key Takeaways

  • Scalable with Amazon, but the same platform you are used to on-premise
  • Easier to deploy with AMI
  • Extended load performance to MPP deployments

  “Based on this survey, the data warehouse vendor with the happiest
  customers in 2011 was Teradata, followed by CALPONT, then IBM,
  followed by Kognitio and Kalido”
  Andy Hayler, Information Difference, 2011 DW Landscape Survey




 www.calpont.com
@Calpont, @InfiniDB

Mais conteúdo relacionado

Mais procurados

Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storagehybrid cloud
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsDataWorks Summit
 
Enabling Flexible Governance for All Data Sources
Enabling Flexible Governance for All Data SourcesEnabling Flexible Governance for All Data Sources
Enabling Flexible Governance for All Data SourcesInside Analysis
 
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaHadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaCloudera, Inc.
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsHortonworks
 
Hadoop Summit Japan 2011 Fall - LT by IBM
Hadoop Summit Japan 2011 Fall - LT by IBMHadoop Summit Japan 2011 Fall - LT by IBM
Hadoop Summit Japan 2011 Fall - LT by IBMAtsushi Tsuchiya
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBigDataCloud
 
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, SolrLarge-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, SolrDataWorks Summit
 
Research on big data
Research on big dataResearch on big data
Research on big dataRoby Chen
 
Big Data and HPC
Big Data and HPCBig Data and HPC
Big Data and HPCNetApp
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoopDataWorks Summit
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondTeradata Aster
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQSybase Türkiye
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Eric Baldeschwieler
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetupRoby Chen
 

Mais procurados (19)

Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI Tools
 
Enabling Flexible Governance for All Data Sources
Enabling Flexible Governance for All Data SourcesEnabling Flexible Governance for All Data Sources
Enabling Flexible Governance for All Data Sources
 
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaHadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
 
hadoop @ Ibmbigdata
hadoop @ Ibmbigdatahadoop @ Ibmbigdata
hadoop @ Ibmbigdata
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
 
Hadoop Summit Japan 2011 Fall - LT by IBM
Hadoop Summit Japan 2011 Fall - LT by IBMHadoop Summit Japan 2011 Fall - LT by IBM
Hadoop Summit Japan 2011 Fall - LT by IBM
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
 
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, SolrLarge-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
 
Research on big data
Research on big dataResearch on big data
Research on big data
 
Big Data and HPC
Big Data and HPCBig Data and HPC
Big Data and HPC
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoop
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQ
 
Security data deluge
Security data delugeSecurity data deluge
Security data deluge
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
 

Semelhante a InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2

Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
Research ON Big Data
Research ON Big DataResearch ON Big Data
Research ON Big Datamysqlops
 
IBM Big Data Platform, 2012
IBM Big Data Platform, 2012IBM Big Data Platform, 2012
IBM Big Data Platform, 2012Rob Thomas
 
Big Data 視覺化分析解決方案
Big Data 視覺化分析解決方案Big Data 視覺化分析解決方案
Big Data 視覺化分析解決方案Etu Solution
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendCaserta
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big DecisionsInnoTech
 
From open data to API-driven business
From open data to API-driven businessFrom open data to API-driven business
From open data to API-driven businessOpenDataSoft
 
Streaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionDATAVERSITY
 
The IBM Netezza Data Warehouse Appliance
The IBM Netezza Data Warehouse ApplianceThe IBM Netezza Data Warehouse Appliance
The IBM Netezza Data Warehouse ApplianceIBM Sverige
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop siliconsudipt
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopCloudera, Inc.
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012Amazon Web Services
 
BI Forum 2009 - Principy architektury MPP datového skladu
BI Forum 2009 - Principy architektury MPP datového skladuBI Forum 2009 - Principy architektury MPP datového skladu
BI Forum 2009 - Principy architektury MPP datového skladuOKsystem
 

Semelhante a InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2 (20)

Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
 
Research ON Big Data
Research ON Big DataResearch ON Big Data
Research ON Big Data
 
IBM Big Data Platform, 2012
IBM Big Data Platform, 2012IBM Big Data Platform, 2012
IBM Big Data Platform, 2012
 
Big Data 視覺化分析解決方案
Big Data 視覺化分析解決方案Big Data 視覺化分析解決方案
Big Data 視覺化分析解決方案
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big Decisions
 
From open data to API-driven business
From open data to API-driven businessFrom open data to API-driven business
From open data to API-driven business
 
Streaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise Adoption
 
Accelerate Return on Data
Accelerate Return on DataAccelerate Return on Data
Accelerate Return on Data
 
The IBM Netezza Data Warehouse Appliance
The IBM Netezza Data Warehouse ApplianceThe IBM Netezza Data Warehouse Appliance
The IBM Netezza Data Warehouse Appliance
 
The New Enterprise Data Platform
The New Enterprise Data PlatformThe New Enterprise Data Platform
The New Enterprise Data Platform
 
Yahoo & Hadoop
Yahoo & HadoopYahoo & Hadoop
Yahoo & Hadoop
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
 
ESGYN Overview
ESGYN OverviewESGYN Overview
ESGYN Overview
 
BI Forum 2009 - Principy architektury MPP datového skladu
BI Forum 2009 - Principy architektury MPP datového skladuBI Forum 2009 - Principy architektury MPP datového skladu
BI Forum 2009 - Principy architektury MPP datového skladu
 
Big Data
Big DataBig Data
Big Data
 

Último

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Último (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost

InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2

  • 1. Calpont InfiniDB®: Accelerating Data Insights. InfiniDB 3: Speeding Big Data Analytics in Amazon EC2. Jim Tommaney, CTO, Calpont. June 2012
  • 2. Today’s Presenter - Jim Tommaney • Calpont’s Chief Technologist • Architect of InfiniDB • 25 years of experience in applied data technologies for BI and analytics • Drives InfiniDB roadmap and futures • Closely engaged in client deployments and POCs InfiniDB® Scalable. Fast. Simple. 2 © 2012 Calpont. All Rights Reserved.
  • 3. Today’s Discussion • Introduction • InfiniDB Architecture for Big Data Analytics • InfiniDB 3 o Provisioned for Amazon EC2 o Demo – Creating a small or large cluster o Parallel load options for load scaling o Demo – Cpimport Load for MPP
  • 4. How Fast is the World’s Big Data Footprint Growing?
      • How Big is a Byte?
          o 1 gigabyte = 1,000,000,000 bytes
          o 1,000 gigabytes = 1 terabyte
          o 1 million terabytes = 1 exabyte
          o 1 billion terabytes = 1 zettabyte
      • How is Data Growing Daily?
          o 15 petabytes of new information is created each day, 8x more information than in all the libraries in the United States
          o Experiments at the CERN laboratory generate 40 TB of data every second
      • IT Implications of Big Data
          o Wal-Mart: one million transactions every hour, feeding databases that store 2.5 petabytes, 167 times the books in America’s Library of Congress
  • 5. Evolution of the Analytic Platform [Timeline figure spanning the 1960s, 1970s, 1980s, 1990s, 2000s and 2010+. Stages shown: First DBMS (IDS); Relational Prototypes; Relational Systems Commercialize; OLAP Extensions; MOLAP Cubes; Analytic DBMS Platforms]
  • 6. What is InfiniDB? Simple, Powerful Platform for Big Data Analytics • Columnar Performance Efficiency • Widely used MySQL Interface • MPP, MapReduce style Query Execution
  • 7. Benefits of InfiniDB • Real-time, Consistent Query Performance • Linear Scale for Massive Data • Removes Limits to Dimensions and Granularity • Easy to Deploy and Maintain
  • 8. How InfiniDB is Used [Diagram with four columns: Analytic Needs, Analytic Data Environment, Data Integration, Big Data Sources. Labels shown: Data Warehouse; ETL; Transactional; Dimensional Analytics; Hadoop; Operational; Analytic Data Store; MDM; Data Discovery; Legacy RDBMS; Direct Load; Model; Predictive Analytics]
  • 9. Big Data Reference Architecture [Diagram with four columns: Analytic Needs, Analytic Data Environment, Data Integration, Big Data Sources. Labels shown: ETL; Transactional; Dimensional Analytics; Hadoop; Operational; Data Discovery; MDM; Legacy RDBMS; Predictive Analytics]
  • 10. InfiniDB Product Evolution
      • InfiniDB 1.0
          o 100% Columnar
          o Full scale-out MPP
          o Fully integrated map reduction operations
          o High speed data load
      • InfiniDB 1.5 and InfiniDB 2.0
          o Real-time compression
          o Full parallel sub-query
          o Enhanced parallelization
          o UDFs for In-database analytics
          o Enhanced partitioning
          o UTF-8 Support
          o Expanded SQL support
      • InfiniDB 3
          o Parallel Load for Big Data
          o Transparent provisioning and run time operations on Amazon EC2
          o Added support for additional Linux platforms
  • 11. InfiniDB 3: Increasing Flexibility while Preserving Simplicity and Speed • Unmatched Simplicity and Speed • Deployment Flexibility • Easier to… o Take advantage of Cloud deployments o Load Massive Data for Distributed HW
  • 12. InfiniDB 3 - New Capabilities • Prepackaged AMI for automatic provisioning of InfiniDB nodes on EC2 • Transparent support of EC2 virtual storage and data redundancy (EBS) policies
  • 13. Accessing the InfiniDB AMI Trial 1. Calpont.com/tryinfiniDB 2. Select the AMI option 3. Provide AWS # 4. Calpont will provide access within 24 hrs
  • 14. Big Data Reference Architecture [Diagram with four columns: Analytic Needs, Analytic Data Environment, Data Integration, Big Data Sources. Labels shown: ETL; Transactional; Dimensional Analytics; Hadoop; Operational; User Module; Data Discovery; Performance Module; MDM; Legacy RDBMS; Predictive Analytics]
  • 15. InfiniDB AMI DEMO
  • 16. InfiniDB 3 - New Capabilities • Parallel Data Load designed for Big Data o SIMPLE: Same simple command; several data load configurations possible o SCALABLE: Linear performance as more nodes participate in the loading o FAST: No query performance degradation during data load
  • 17. InfiniDB 3 - Parallel Data Load Options
      • Bulk Load, Central
          o 1 data source
          o 1 single command
          o Auto distribution across S/N nodes
      • Single Bulk Load, Partitioned
          o n partitioned data sources
          o 1 single command
          o n Performance Modules
      • Parallel Bulk Load, Partitioned
          o n partitioned data sources
          o n bulk load commands
          o n Performance Modules
  • 19. InfiniDB 3 – Key Takeaways • Scalable with Amazon but same platform you are used to on-premise • Easier to deploy with AMI • Extended load performance to MPP deployments • “Based on this survey, the data warehouse vendor with the happiest customers in 2011 was Teradata, followed by CALPONT, then IBM, followed by Kognitio and Kalido” (Andy Hayler, Information Difference, 2011 DW Landscape Survey)
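
The three load options on slide 17 differ in how many data sources and how many load commands are involved. The Python sketch below is one way to picture that difference, under stated assumptions: Performance Modules are modeled as plain lists, the round-robin distribution in `central_load` is an illustrative choice rather than InfiniDB's documented behavior, and every function name is hypothetical. It does not invoke cpimport or reproduce any of its options.

```python
from concurrent.futures import ThreadPoolExecutor


def central_load(rows, n_modules):
    """Bulk Load, Central: one data source, one command.

    Rows are spread across the modules automatically. Round-robin is
    an assumption made for this sketch only.
    """
    modules = [[] for _ in range(n_modules)]
    for i, row in enumerate(rows):
        modules[i % n_modules].append(row)
    return modules


def load_partition(partition):
    """Hypothetical stand-in for a bulk load that runs locally on one module."""
    return list(partition)


def single_command_partitioned(partitions):
    """Single Bulk Load, Partitioned: n pre-partitioned sources, one command.

    One coordinating call triggers every module's local load. The loads
    are dispatched in turn here only to keep the sketch simple.
    """
    return [load_partition(p) for p in partitions]


def parallel_partitioned(partitions):
    """Parallel Bulk Load, Partitioned: n pre-partitioned sources, n commands.

    Each module's load is issued independently, modeled as one task per
    partition in a thread pool.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        return list(pool.map(load_partition, partitions))


if __name__ == "__main__":
    print(central_load(list(range(10)), 3))
    parts = [[0, 1, 2], [3, 4], [5]]
    print(single_command_partitioned(parts))
    print(parallel_partitioned(parts))
```

In this sketch only the third option issues its loads concurrently. The slide's claim of linear load performance as nodes are added concerns the real system and is not something this toy model demonstrates.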