MapR's Hadoop Distribution on
Google Compute Engine
Who am I?

     http://www.mapr.com/company/events/speaking/devfest-dc-9-28-12
•     Keys Botzum
•     kbotzum@maprtech.com
•     Senior Principal Technologist, MapR Technologies
•     MapR Federal and Eastern Region
MapR’s Experience with Google Compute Engine


 •  Fast
    –    Virtualized public cloud
    –    Performance rivals on-premises physical hardware

 •  Easy
     –  Provision 1,000s of servers in
        minutes

 •  Cost effective
     –  Pay only for what you use
gcutil is your friend

•    Command line tool that runs on your client machines to manage your
     instances in your cloud
•    Remarkably easy to use
      –  New server/instance: gcutil addinstance
      –  Connect to a server/instance: gcutil ssh
•    Can create your own custom images using Google’s tools
      –  Using custom images is as easy as addinstance --image <image name>
      –  MapR is creating custom images for MapR clusters
MapReduce: A Paradigm Shift
•    Distributed computing platform
      •  Large clusters
      •  Commodity hardware
•    Pioneered at Google
      •  BigTable, MapReduce and Google File System
•    Commercially available as Hadoop
MapR Technologies
•  Open, enterprise-grade distribution for
   Hadoop
   –    Easy, dependable and fast
   –    Open source with standards-based extensions

•  Hadoop
   –    Big data analytics
   –    Inspired by MapReduce paper published by Google
        scientists Jeffrey Dean and Sanjay Ghemawat in
        2004

•  MapR is recognized as a technology
   leader

•  MapR Hadoop Cloud Service now
   available on Google Compute Engine
MapR Partners
MapR’s Complete Distribution for Apache Hadoop
•    Integrated, tested, hardened and supported
•    Integrated with Accumulo
•    Runs on commodity hardware
•    Open source with standards-based extensions for:
      •  Security
      •  File-based access
      •  Most SQL-based access
      •  Easiest integration
•    High availability
•    Best performance

[Diagram: components of the distribution, by layer]
•    MapR Control System: MapR Heatmap™; LDAP, NIS Integration; Quotas, Alerts, Alarms; CLI, REST API
•    Hive, Pig, Oozie, Sqoop, HBase, Whirr
•    Accumulo, Mahout, Cascading, Nagios Integration, Ganglia Integration, Flume, ZooKeeper
•    Direct Access NFS, Real-Time Streaming, Volumes, Mirrors, Snapshots, Data Placement
•    No NameNode Architecture, High Performance Direct Shuffle, Stateful Failover and Self Healing
•    MapR’s Storage Services™
                                                                2.7	
  
Overview of Starting a Cluster
•    Google’s gcutil is your friend
      •  Very easy tool for spinning up instances
•    MapR is creating a tool and infrastructure to spin up a fully functional MapR
     cluster composed of many nodes
      •  ./mapr-start-cluster.sh --machine-type <…> -masters <#> -slaves <#>
      •  …wait a few minutes
      •  gcutil ssh <node running admin server> and set admin’s password
      •  gcutil listinstances (to find your cluster’s IP addresses)
      •  … use the cluster, it’s fully functional
      •  ./mapr-stop-cluster.sh
      •  …billing for cluster stops



* Note that this is not the final interface, but rather is representative of what will be released. Some details omitted for clarity.
Demo


Let’s run a large sort
Run TeraSort on a 1250-node MapR Hadoop cluster on
Google Compute Engine

      (10 billion records, 1TB of data)
How Does This Compare to TeraSort Records?

            MapR on Google Compute Engine    Record on physical hardware
Hardware    Virtual/Cloud                    Physical
Cores       5024                             11680
Disks       1256                             5840
Servers     1256                             1460
Time        1:20 min                         1:02 min
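The comparison above can be turned into rough throughput figures. The sketch below is illustrative only: it assumes 100-byte TeraSort records (consistent with the 10 billion records and 1TB quoted for the demo), assumes both runs sorted the same 1TB, and reads 1:20 and 1:02 as 80 and 62 seconds. The `Run` class and its method names are made up for this example.

```python
from dataclasses import dataclass

RECORD_BYTES = 100        # assumed size of one TeraSort record
RECORDS = 10_000_000_000  # 10 billion records, as stated for the demo


@dataclass(frozen=True)
class Run:
    """One sort run, using the figures from the comparison table."""
    name: str
    cores: int
    disks: int
    servers: int
    seconds: int

    def bytes_per_second(self) -> float:
        # Total data sorted divided by wall-clock time.
        return RECORDS * RECORD_BYTES / self.seconds

    def mb_per_second_per_core(self) -> float:
        # Aggregate throughput spread evenly over all cores (decimal MB).
        return self.bytes_per_second() / self.cores / 1e6


gce = Run("MapR on Google Compute Engine", 5024, 1256, 1256, 80)
record = Run("Record on physical hardware", 11680, 5840, 1460, 62)

for run in (gce, record):
    print(f"{run.name}: {run.bytes_per_second() / 1e9:.1f} GB/s total, "
          f"{run.mb_per_second_per_core():.2f} MB/s per core")
```

Dividing by cores is only one way to normalize; the same class could divide by disks or servers, which gives a different picture because the two clusters have very different disk counts.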
Deployment Comparison

            Current Record             MapR on Google Compute Engine
Servers     1460 physical servers      1256 instances
Steps       Prepare datacenter         Invoke gcutil command
            Rack and stack servers
            Maintain hardware
Time        Months                     Minutes
Cost Comparison

               Current Record         MapR on Google Compute Engine
Calculation    1460 1U servers x      1256 n1-standard-4-d x
               $4K/server             $.58/instance hour x
                                      80 seconds
Total          $5,840,000             $16
                                      ($728/hour)
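The arithmetic behind these two totals can be checked with a short sketch. It assumes the cloud price is prorated by the second, which is how the slide's own calculation treats the 80-second run; actual billing granularity may differ. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def purchase_cost(servers: int, price_per_server: float) -> float:
    """Up-front hardware cost: number of servers times unit price."""
    return servers * price_per_server


def cloud_cost(instances: int, price_per_instance_hour: float, seconds: int) -> float:
    """Cost of running all instances for `seconds`, prorated per second (assumption)."""
    hourly_rate = instances * price_per_instance_hour
    return hourly_rate * seconds / SECONDS_PER_HOUR


physical = purchase_cost(1460, 4000)   # 1460 1U servers at $4K each
hourly = 1256 * 0.58                   # whole cluster, per hour
run = cloud_cost(1256, 0.58, 80)       # one 80-second sort

print(f"Physical servers: ${physical:,.0f}")
print(f"Cloud cluster:    ${hourly:,.2f}/hour, ${run:,.2f} for the run")
```

Note that the comparison sets a one-time hardware purchase against the cost of a single run, so it leaves out power, staff and the cost of any further cluster hours.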
Easy Management at Scale



•  Health Monitoring
•  Cluster Administration
•  Application Resource Provisioning
Direct Access NFS™
•  File Browsers
     –  Access directly
     –  "Drag & Drop"
•  Standard Linux Commands & Tools
     –  grep, sed, sort, tar
•  Applications
     –  Random read
     –  Random write
     –  Log directly
Multi-tenancy
§  Consider a large cluster with lots of storage and
    numerous jobs supporting multiple
    organizations
§  Volumes
     §  Control storage usage
           §  quotas on volumes
           §  quotas on cluster storage by user or
               group
     §  Control data placement
           §  ensure that data is stored in the
               locations you want
     §  Control mirroring and snapshotting
§  Job management
     §  Control where jobs run
           §  ensure that jobs run where you want
     §  Historical view of metrics collected from
         jobs
           §  ease troubleshooting of job issues
§  Security/Protection
     §  Fine grained permissions on volume and
         cluster management, including delegation
MapR: Lights Out Data Center Ready

Reliable Compute
 •  Automated stateful failover
 •  Automated re-replication
 •  Self-healing from HW and SW failures
 •  Load balancing
 •  Rolling upgrades
 •  No lost jobs or data
 •  99999’s of uptime

Dependable Storage
 §  Business continuity with snapshots and mirrors
 §  Recover to a point in time
 §  End-to-end checksumming
 §  Strong consistency
 §  Built-in compression
 §  Mirror across sites to meet Recovery Time Objectives
MapR Mirroring/COOP Requirements

[Diagram]
  Production (Datacenter 1)  -- WAN --  Research (Datacenter 1)
  Production                 -- WAN --  Cloud (Compute Engine)

Business Continuity and Efficiency

Efficient design
 §  Differential deltas are updated
 §  Compressed and check-summed

Easy to manage
 §  Scheduled or on-demand
 §  WAN, Remote Seeding
 §  Consistent point-in-time
MapR Drives Hardware Performance

[Chart: % performance vs. Apache/CDH (y-axis, 0% to 450%) for typical Hadoop commodity hardware, by configuration]

Hardware configurations on the x-axis:
 •  400MBPS: <6 drives, 1 NIC, 6 cores, 24G DRAM
 •  1200MBPS: 12*5400RPM drives, >1 NIC or 10GbE, 8 cores, 32G DRAM
 •  1800MBPS: 12*7200RPM drives, >1 NIC or 10GbE, 12 cores, 48G DRAM
 •  SSD: 2*10GbE, 12+ cores, 64G DRAM




Why is MapR faster and more efficient?
 §  No redundant layers (not a file system over a file system)
 §  C/C++ vs. Java (higher performance and no garbage collection freezes)
 §  Distributed metadata
 §  Native compression
 §  Optimized shuffle
 §  Advanced cache manager
 §  Port scaling (multi-NIC support) and high-speed RPC
Designed for Performance and Scale
                           MapR                    Apache/CDH
     Terasort w/ 1x replication (no compression)
     Total (minutes)       24 min 34 sec           49 min 33 sec
     Map                   9 min 54 sec            28 min 12 sec
     Shuffle               9 min 8 sec             27 min 0 sec
     Terasort w/ 3x replication (no compression)
     Total                 47 min 4 sec            73 min 42 sec
     Map                   11 min 2 sec            30 min 8 sec
     Shuffle               9 min 17 sec            28 min 40 sec
     DFSIO/local write
     Throughput/node       870 MB/s                240 MB/s
     YCSB (HBase benchmark, 50% read, 50% update)
     Throughput            33102 ops/sec           7904 ops/sec
     Latency (r/u)         2.9-4 ms/0.4 ms         7-30 ms/0-5 ms
     YCSB (HBase benchmark, 95% read, 5% update)
     Throughput            18K ops/sec             8500 ops/sec
     Latency (r/u)         5.5-5.7 ms/0.6 ms       12-30 ms/1 ms

     HW: 10 servers, 2 x 4 cores (2.4 GHz), 11 x 2TB, 32 GB
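The ratios implied by the table can be worked out with a small sketch. It only restates the figures above; "18K ops/sec" is read as 18000, and the `Result` type and `advantage` helper are hypothetical names introduced for the example.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One benchmark row: the MapR figure, the Apache/CDH figure, and its direction."""
    name: str
    mapr: float
    apache_cdh: float
    higher_is_better: bool


def seconds(minutes: int, secs: int) -> int:
    """Convert a 'M min S sec' table entry to seconds."""
    return minutes * 60 + secs


def advantage(result: Result) -> float:
    """Ratio in MapR's favor: throughput ratio, or inverse time ratio."""
    if result.higher_is_better:
        return result.mapr / result.apache_cdh
    return result.apache_cdh / result.mapr


RESULTS = [
    Result("Terasort 1x replication, total time", seconds(24, 34), seconds(49, 33), False),
    Result("Terasort 3x replication, total time", seconds(47, 4), seconds(73, 42), False),
    Result("DFSIO throughput per node (MB/s)", 870, 240, True),
    Result("YCSB 50% read / 50% update (ops/sec)", 33102, 7904, True),
    Result("YCSB 95% read / 5% update (ops/sec)", 18000, 8500, True),  # 18K read as 18000
]

for result in RESULTS:
    print(f"{result.name}: {advantage(result):.2f}x")
```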
Customer Support

•    24x7x365 “Follow-The-Sun” coverage
      •  Critical customer issues are worked on
         around the clock
•    Dedicated team of Hadoop engineering
     experts
•    Contacting MapR support
      •  Email: support@mapr.com
         (automatically opens a case)
      •  Phone: 1.855.669.6277
      •  Self Service options:
           §  http://answers.mapr.com/
           §  Web Portal: http://mapr.com/support
Two MapR Editions – M3 and M5


M3                          M5
§    Control System         §    Control System
§    NFS Access             §    NFS Access
§    Performance            §    Performance
§    Unlimited Nodes        §    High Availability
§    Free                   §    Snapshots & Mirroring
                            §    24 X 7 Support
                            §    Annual Subscription

Also Available through: Compute Engine
Try MapR on Google Compute Engine
www.mapr.com/google
Apache Drill
 Interactive Analysis of Large-Scale Datasets
Latency Matters

•    Ad-hoc analysis with interactive tools

•    Real-time dashboards

•    Event/trend detection and analysis
      •  Network intrusion analysis on the fly
      •  Fraud
      •  Failure detection and analysis
Big Data Processing

                      Batch processing    Interactive analysis       Stream processing
Query runtime         Minutes to hours    Milliseconds to minutes    Never-ending
Data volume           TBs to PBs          GBs to PBs                 Continuous stream
Programming model     MapReduce           Queries                    DAG
Users                 Developers          Analysts and developers    Developers
Google project        MapReduce           Dremel
Open source project   Hadoop MapReduce                               Storm and S4




          Introducing Apache Drill…
Innovations
•  MapReduce
    •    Scalable IO and compute trumps efficiency with today's commodity hardware
    •    With large datasets, schemas and indexes are too limiting
    •    Flexibility is more important than efficiency
    •    An easy-to-use, scalable, fault-tolerant execution framework is key for large
         clusters
•  Dremel
    •    Columnar storage provides significant performance benefits at scale
    •    Columnar storage with nesting preserves structure and can be very efficient
    •    Avoiding final record assembly as long as possible improves efficiency
    •    Optimizing for the query use case can avoid the full generality of MR and thus
         significantly reduce latency. No need to start JVMs, just push compact queries to
         running agents.
•  Apache Drill
    •  Open source project based upon Dremel’s ideas
    •  More flexibility and openness
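To illustrate the columnar-storage point in general terms, here is a toy sketch of the same records held row-wise and column-wise. It is not Dremel's or Drill's actual format: it handles flat records only (no nesting), and the field names and sample values are invented for the example.

```python
from typing import Any, Dict, List

# Row-oriented layout: each record is stored whole (sample data, made up).
rows: List[Dict[str, Any]] = [
    {"user": "a", "country": "US", "bytes": 120},
    {"user": "b", "country": "DE", "bytes": 300},
    {"user": "c", "country": "US", "bytes": 80},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot flat records into one list of values per field."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# A query over one field has to walk every full record in the row layout...
total_from_rows = sum(record["bytes"] for record in rows)
# ...but reads a single contiguous list in the column layout.
total_from_columns = sum(columns["bytes"])

print(total_from_rows, total_from_columns)
```

Records are only reassembled when a query needs whole rows, which is the idea behind deferring final record assembly as long as possible.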
More Reading on Apache Drill
•    MapR and Apache Drill
      •  http://www.mapr.com/drill
•    Apache Drill project page
      •  http://incubator.apache.org/projects/drill.html
•    Google’s Dremel
      •  http://research.google.com/pubs/pub36632.html
•    Google’s BigQuery
      •  https://developers.google.com/bigquery/docs/query-reference
•    MIT’s C-Store – a columnar database
      •  http://db.csail.mit.edu/projects/cstore/
•    Microsoft’s Dryad
      •  Distributed execution engine
      •  http://research.microsoft.com/en-us/projects/dryad/
•    Google’s Protobufs
      •  https://developers.google.com/protocol-buffers/docs/proto

OSCON 2012 OpenStack Automation and DevOps Best PracticesOSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best Practices
 
Achieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with ChefAchieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with Chef
 
Accumulo Nutch/GORA, Storm, and Pig
Accumulo Nutch/GORA, Storm, and PigAccumulo Nutch/GORA, Storm, and Pig
Accumulo Nutch/GORA, Storm, and Pig
 
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita IvanovGridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
 
Feb 2013 HUG: A Visual Workbench for Big Data Analytics on Hadoop
Feb 2013 HUG: A Visual Workbench for Big Data Analytics on HadoopFeb 2013 HUG: A Visual Workbench for Big Data Analytics on Hadoop
Feb 2013 HUG: A Visual Workbench for Big Data Analytics on Hadoop
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
Webappmanager Overview
Webappmanager OverviewWebappmanager Overview
Webappmanager Overview
 
How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and Fast
 
Spark volume requirements 2018
Spark volume requirements 2018Spark volume requirements 2018
Spark volume requirements 2018
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network Issues
 
Virtualizing Sharepoint for Performance and Availability
Virtualizing Sharepoint for Performance and AvailabilityVirtualizing Sharepoint for Performance and Availability
Virtualizing Sharepoint for Performance and Availability
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Deploying OpenStack using Crowbar
Deploying OpenStack using CrowbarDeploying OpenStack using Crowbar
Deploying OpenStack using Crowbar
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qcon
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
Architecting a Private Cloud - Cloud Expo
Architecting a Private Cloud - Cloud ExpoArchitecting a Private Cloud - Cloud Expo
Architecting a Private Cloud - Cloud Expo
 
London Hashicorp Meetup #22 - Congruent infrastructure @zopa by Ben Coughlan
London Hashicorp Meetup #22 - Congruent infrastructure @zopa by Ben CoughlanLondon Hashicorp Meetup #22 - Congruent infrastructure @zopa by Ben Coughlan
London Hashicorp Meetup #22 - Congruent infrastructure @zopa by Ben Coughlan
 

Mais de MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

Mais de MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Último

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Google Compute and MapR

  • 1. MapR's Hadoop Distribution on Google Compute Engine
  • 2. Who am I?
      http://www.mapr.com/company/events/speaking/devfest-dc-9-28-12
      •  Keys Botzum
      •  kbotzum@maprtech.com
      •  Senior Principal Technologist, MapR Technologies
      •  MapR Federal and Eastern Region
  • 3. MapR’s Experience with Google Compute Engine
      •  Fast
          –  Virtualized public cloud
          –  Rivals on-premise physical
      •  Easy
          –  Provision 1,000s of servers in minutes
      •  Cost effective
          –  Pay only for what you use
  • 4. gcutil is your friend
      •  Command line tool that runs on your client machines to manage your instances in your cloud
      •  Remarkably easy to use
          –  New server/instance: gcutil addinstance
          –  Connect to a server/instance: gcutil ssh
      •  Can create your own custom images using Google’s tools
          –  Using custom images is as easy as addinstance --image <image name>
          –  MapR is creating custom images for MapR clusters
  • 5. MapReduce: A Paradigm Shift
      •  Distributed computing platform
          –  Large clusters
          –  Commodity hardware
      •  Pioneered at Google
          –  BigTable, MapReduce and Google File System
      •  Commercially available as Hadoop
  • 6. MapR Technologies
      •  Open, enterprise-grade distribution for Hadoop
          –  Easy, dependable and fast
          –  Open source with standards-based extensions
      •  Hadoop
          –  Big data analytics
          –  Inspired by the MapReduce paper published by Google scientists Jeffrey Dean and Sanjay Ghemawat in 2004
      •  MapR is recognized as a technology leader
      •  MapR Hadoop Cloud Service now available on Google Compute Engine
  • 8. MapR’s Complete Distribution for Apache Hadoop
      •  Integrated, tested, hardened and supported
      •  Integrated with Accumulo
      •  Runs on commodity hardware
      •  Open source with standards-based extensions for:
          –  Security
          –  File-based access
          –  Most SQL-based access
          –  Easiest integration
          –  High availability
          –  Best performance
      [Diagram: the MapR stack, top to bottom]
      •  MapR Control System: MapR Heatmap™; LDAP, NIS Integration; Quotas, Alerts, Alarms; CLI, REST API
      •  Components: Hive, Pig, Oozie, Sqoop, HBase, Whirr, Accumulo, Mahout, Cascading, Nagios Integration, Ganglia Integration, Flume, ZooKeeper
      •  MapR’s Storage Services™: Direct Access NFS, Real-Time Streaming, Volumes, Mirrors, Snapshots, Data Placement
      •  Architecture: No NameNode Architecture, High Performance Direct Shuffle, Stateful Failover and Self Healing
  • 9. Overview of Starting a Cluster
      •  Google’s gcutil is your friend
      •  Very easy tool for spinning up instances
      •  MapR is creating a tool and infrastructure to spin up a fully functional MapR cluster composed of many nodes
      •  ./mapr-start-cluster.sh –machine-type <…> -masters <#> -slaves <#>
      •  …wait a few minutes
      •  gcutil ssh <node running admin server> and set admin’s password
      •  gcutil listinstances (to find your cluster’s IP addresses)
      •  … use the cluster, it’s fully functional
      •  ./mapr-stop-cluster.sh
      •  …billing for cluster stops
      * Note that this is not the final interface, but rather is representative of what will be released. Some details omitted for clarity.
  • 10. Demo
      Let’s run a large sort
      Run TeraSort on a 1250-node MapR Hadoop cluster on Google Compute Engine (10 billion records, 1TB of data)
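The demo's figures are consistent with the standard TeraSort record size of 100 bytes: 10 billion records x 100 bytes = 1 TB. As a rough illustration of how a TeraSort-style job can sort across many nodes, the Python sketch below samples keys to pick split points, range-partitions the keys, sorts each partition independently, and concatenates the results. The function names and the in-memory lists standing in for cluster nodes are assumptions made for illustration; this is not MapR's or Hadoop's implementation.

```python
import random
from bisect import bisect_right

RECORD_BYTES = 100               # standard TeraSort record size
NUM_RECORDS = 10_000_000_000     # from the slide: 10 billion records
TOTAL_BYTES = NUM_RECORDS * RECORD_BYTES  # 10**12 bytes, i.e. 1 TB (decimal)


def sample_split_points(keys, num_partitions, sample_size=1000, seed=0):
    """Pick num_partitions - 1 cut points from a sorted random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def range_partition(keys, split_points):
    """Send each key to the partition whose key range contains it."""
    parts = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        parts[bisect_right(split_points, key)].append(key)
    return parts


def distributed_sort(keys, num_partitions):
    """Sort by range-partitioning, then sorting each partition on its own."""
    if not keys:
        return []
    splits = sample_split_points(keys, num_partitions)
    parts = range_partition(keys, splits)
    # Each list stands in for one node; nodes sort independently.
    sorted_parts = [sorted(part) for part in parts]
    # Partition i only holds keys below partition i + 1, so simple
    # concatenation yields a globally sorted result.
    return [key for part in sorted_parts for key in part]
```

Because the partitions cover disjoint, ordered key ranges, no merge step is needed after the local sorts, which is what lets this kind of job scale out to many nodes.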
  • 11. How does this Compare to Terasort Records?
                      MapR on Google Compute Engine    Record on physical hardware
      Hardware        Virtual/Cloud                    Physical
      Cores           5024                             11680
      Disks           1256                             5840
      Servers         1256                             1460
      Time            1:20 min                         1:02 min
  • 12. Deployment Comparison
      Current Record              MapR on Google Compute Engine
      1460 physical servers       1256 instances
      Prepare datacenter          Invoke gcutil command
      Rack and stack servers
      Maintain hardware
      Months                      Minutes
  • 13. Cost Comparison
      Current Record:                   1460 1U servers x $4K/server = $5,840,000
      MapR on Google Compute Engine:    1256 n1-standard-4-d x $.58/instance hour x 80 seconds = $16 ($728/hour)
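The arithmetic behind the cost comparison can be checked with a few lines of Python. The inputs are the figures quoted on the slide; the function and variable names are illustrative only. The cloud side works out to about $728 per hour and about $16 for the 80-second run, against $5,840,000 of hardware for the physical record.

```python
def cloud_run_cost(instances, price_per_instance_hour, run_seconds):
    """Return (hourly cost of the whole cluster, cost of one run)."""
    hourly = instances * price_per_instance_hour
    return hourly, hourly * run_seconds / 3600


def hardware_cost(servers, price_per_server):
    """Up-front purchase cost of a physical cluster."""
    return servers * price_per_server


# Figures from the slide.
hourly, run_cost = cloud_run_cost(
    instances=1256, price_per_instance_hour=0.58, run_seconds=80
)
physical = hardware_cost(servers=1460, price_per_server=4000)

print(f"Cloud cluster per hour: ${hourly:,.2f}")
print(f"Cloud cost for the run: ${run_cost:,.2f}")
print(f"Physical hardware:      ${physical:,}")
```

The comparison covers only the items on the slide: server purchase price on one side and instance-hours on the other.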
  • 14. Easy Management at Scale
      •  Health Monitoring
      •  Cluster Administration
      •  Application Resource Provisioning
  • 15. Direct Access NFS™
      •  File browsers: access directly, “drag & drop”
      •  Standard Linux commands & tools: grep, sed, sort, tar
      •  Applications: random read, random write, log directly
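Since the slide describes the cluster as reachable through ordinary file access (random read, random write, direct logging), an application needs nothing beyond standard file APIs. The Python sketch below shows the three operations using only the standard library. The helper names are hypothetical, and a temporary directory stands in for the NFS mount point so the example runs anywhere; on a real cluster the path would be wherever the cluster is mounted.

```python
import os
import tempfile


def append_log_line(path, message):
    """Log directly: append one line to a file with a plain open()."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(message + "\n")


def write_at(path, offset, data):
    """Random write: overwrite bytes in place at the given offset."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_at(path, offset, length):
    """Random read: fetch `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    # Assumption: a temp directory stands in for the NFS-mounted cluster.
    mount_point = tempfile.mkdtemp()
    log_path = os.path.join(mount_point, "app.log")

    append_log_line(log_path, "first")
    append_log_line(log_path, "second")
    write_at(log_path, 0, b"FIRST")
    print(read_at(log_path, 0, 5))
    print(read_at(log_path, 6, 6))
```

The same file would be equally visible to grep, sed, sort, or tar, which is the point the slide makes about standard tools.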
  • 16. Multi-tenancy
      •  Consider a large cluster with lots of storage and numerous jobs supporting multiple organizations
      •  Volumes
          –  Control storage usage
              •  quotas on volumes
              •  quotas on cluster storage by user or group
          –  Control data placement
              •  ensure that data is stored in the locations you want
          –  Control mirroring and snapshotting
      •  Job management
          –  Control where jobs run
              •  ensure that jobs run where you want
          –  Historical view of metrics collected from jobs
              •  ease troubleshooting of job issues
      •  Security/Protection
          –  Fine grained permissions on volume and cluster management, including delegation
  • 17. MapR: Lights Out Data Center Ready
      •  Reliable Compute
          –  Automated stateful failover
          –  Automated re-replication
          –  Self-healing from HW and SW failures
          –  Load balancing
          –  Rolling upgrades
          –  No lost jobs or data
          –  99999’s of uptime
      •  Dependable Storage
          –  Business continuity with snapshots and mirrors
          –  Recover to a point in time
          –  End-to-end check summing
          –  Strong consistency
          –  Built in compression
          –  Mirror across sites to meet Recovery Time Objectives
  • 18. MapR Mirroring/COOP Requirements
      Business Continuity and Efficiency
      •  Efficient design
          –  Differential deltas are updated
          –  Compressed and check-summed
      •  Easy to manage
          –  Scheduled or on-demand
          –  WAN, Remote Seeding
          –  Consistent point-in-time
      [Diagram: Production mirrored over WAN to Research (both boxes labeled Datacenter 1); Production mirrored over WAN to Cloud (Compute Engine)]
  • 19. MapR Drives Hardware Performance
      [Chart: % Performance vs. Apache/CDH, y-axis 0% to 450%, across hardware configurations ranging from typical Hadoop commodity hardware upward]
      •  400MBPS: <6 Drives, 1NIC, 6 Cores, 24G DRAM
      •  1200MBPS: 12*5400RPM Drives, >1NIC or 10GbE, 8 Cores, 32G DRAM
      •  1800MBPS: 12*7200RPM Drives, >1NIC or 10GbE, 12 Cores, 48G DRAM
      •  SSD: 2*10GbE, 12+ Cores, 64G DRAM
      Why is MapR faster and more efficient?
      •  No redundant layers (not a file system over a file system)
      •  C/C++ vs. Java (higher performance and no garbage collection freezes)
      •  Distributed metadata
      •  Native compression
      •  Optimized shuffle
      •  Advanced cache manager
      •  Port scaling (multi-NIC support) and high-speed RPC
  • 20. Designed for Performance and Scale
                                MapR                 Apache/CDH
      Terasort w/ 1x replication (no compression)
        Total                   24 min 34 sec        49 min 33 sec
        Map                     9 min 54 sec         28 min 12 sec
        Shuffle                 9 min 8 sec          27 min 0 sec
      Terasort w/ 3x replication (no compression)
        Total                   47 min 4 sec         73 min 42 sec
        Map                     11 min 2 sec         30 min 8 sec
        Shuffle                 9 min 17 sec         28 min 40 sec
      DFSIO/local write
        Throughput/node         870 MB/s             240 MB/s
      YCSB (HBase benchmark, 50% read, 50% update)
        Throughput              33102 ops/sec        7904 ops/sec
        Latency (r/u)           2.9-4 ms/0.4 ms      7-30 ms/0-5 ms
      YCSB (HBase benchmark, 95% read, 5% update)
        Throughput              18K ops/sec          8500 ops/sec
        Latency (r/u)           5.5-5.7 ms/0.6 ms    12-30 ms/1 ms
      HW: 10 servers, 2 x 4 cores (2.4 GHz), 11 x 2TB, 32 GB
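To make the benchmark figures easier to compare, the short Python sketch below converts the slide's totals into MapR-to-Apache/CDH ratios. The numbers are copied from the slide (with 18K read as 18,000); the dictionary keys and function names are illustrative. The ratios come out at roughly 2.0x and 1.6x for the two TeraSort totals, about 3.6x for DFSIO write throughput, and about 4.2x and 2.1x for the two YCSB throughput figures.

```python
def to_seconds(minutes, seconds):
    """Convert a 'M min S sec' timing into seconds."""
    return minutes * 60 + seconds


# (MapR, Apache/CDH) pairs from the slide. Lower is better for times.
TIMES = {
    "TeraSort 1x total": (to_seconds(24, 34), to_seconds(49, 33)),
    "TeraSort 3x total": (to_seconds(47, 4), to_seconds(73, 42)),
}

# (MapR, Apache/CDH) pairs from the slide. Higher is better for throughput.
THROUGHPUT = {
    "DFSIO write MB/s per node": (870, 240),
    "YCSB 50/50 ops/sec": (33102, 7904),
    "YCSB 95/5 ops/sec": (18000, 8500),
}


def speedups():
    """Return how many times faster MapR is on each benchmark."""
    ratios = {}
    for name, (mapr, other) in TIMES.items():
        ratios[name] = other / mapr
    for name, (mapr, other) in THROUGHPUT.items():
        ratios[name] = mapr / other
    return ratios


if __name__ == "__main__":
    for name, ratio in speedups().items():
        print(f"{name}: {ratio:.1f}x")
```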
  • 21. Customer Support
      •  24x7x365 “Follow-The-Sun” coverage
      •  Critical customer issues are worked on around the clock
      •  Dedicated team of Hadoop engineering experts
      •  Contacting MapR support
      •  Email: support@mapr.com (automatically opens a case)
      •  Phone: 1.855.669.6277
      •  Self Service options:
          –  http://answers.mapr.com/
          –  Web Portal: http://mapr.com/support
  • 22. Two MapR Editions – M3 and M5
      •  M3
          –  Control System
          –  NFS Access
          –  Performance
          –  Unlimited Nodes
          –  Free
      •  M5
          –  Control System
          –  NFS Access
          –  Performance
          –  High Availability
          –  Snapshots & Mirroring
          –  24 X 7 Support
          –  Annual Subscription
      Also available through: Compute Engine
  • 23. Try MapR on Google Compute Engine www.mapr.com/google
  • 24. Apache Drill Interactive Analysis of Large-Scale Datasets
  • 25. Latency Matters
     •  Ad-hoc analysis with interactive tools
     •  Real-time dashboards
     •  Event/trend detection and analysis
     •  Network intrusion analysis on the fly
     •  Fraud
     •  Failure detection and analysis
  • 26. Big Data Processing
                             Batch processing     Interactive analysis       Stream processing
     Query runtime           Minutes to hours     Milliseconds to minutes    Never-ending
     Data volume             TBs to PBs           GBs to PBs                 Continuous stream
     Programming model       MapReduce            Queries                    DAG
     Users                   Developers           Analysts and developers    Developers
     Google project          MapReduce            Dremel                     -
     Open source project     Hadoop MapReduce     -                          Storm and S4
     Introducing Apache Drill…
  • 27. Innovations
     •  MapReduce
        –  Scalable I/O and compute trump efficiency on today's commodity hardware
        –  With large datasets, schemas and indexes are too limiting
        –  Flexibility is more important than efficiency
        –  An easy-to-use, scalable, fault-tolerant execution framework is key for large clusters
     •  Dremel
        –  Columnar storage provides significant performance benefits at scale
        –  Columnar storage with nesting preserves structure and can be very efficient
        –  Avoiding final record assembly as long as possible improves efficiency
        –  Optimizing for the query use case can avoid the full generality of MR and thus significantly reduce latency: no need to start JVMs, just push compact queries to running agents
     •  Apache Drill
        –  Open source project based upon Dremel’s ideas
        –  More flexibility and openness
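The Dremel points on slide 27 about columnar storage and deferring record assembly can be sketched with a toy example. This is not Dremel's or Drill's actual storage format: it uses flat records only (no nesting, no repetition/definition levels), and the sample data, field names, and function names are hypothetical. It contrasts answering an aggregate query record by record with answering it from just the two columns the query needs.

```python
# Hypothetical sample records, for illustration only.
rows = [
    {"user": "a", "country": "US", "bytes": 120},
    {"user": "b", "country": "DE", "bytes": 300},
    {"user": "c", "country": "US", "bytes": 80},
]

def to_columns(records):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns

columns = to_columns(rows)

def sum_bytes_rows(records, country):
    """Row-oriented: visit every whole record, then pick two fields from it."""
    return sum(r["bytes"] for r in records if r["country"] == country)

def sum_bytes_columns(cols, country):
    """Column-oriented: read only the 'country' and 'bytes' columns.

    The 'user' column is never touched and no record is reassembled.
    """
    return sum(
        b for c, b in zip(cols["country"], cols["bytes"]) if c == country
    )

print(sum_bytes_rows(rows, "US"))
print(sum_bytes_columns(columns, "US"))
```

Both functions return the same total. In a real columnar store the gain comes from reading far fewer bytes from disk when a query touches a few fields out of many, which a small in-memory example like this cannot measure.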
  • 28. More Reading on Apache Drill
     •  MapR and Apache Drill
        –  http://www.mapr.com/drill
     •  Apache Drill project page
        –  http://incubator.apache.org/projects/drill.html
     •  Google’s Dremel
        –  http://research.google.com/pubs/pub36632.html
     •  Google’s BigQuery
        –  https://developers.google.com/bigquery/docs/query-reference
     •  MIT’s C-Store – a columnar database
        –  http://db.csail.mit.edu/projects/cstore/
     •  Microsoft’s Dryad
        –  Distributed execution engine
        –  http://research.microsoft.com/en-us/projects/dryad/
     •  Google’s Protobufs
        –  https://developers.google.com/protocol-buffers/docs/proto