SlideShare uma empresa Scribd logo
1 de 12
Hadoop Now, Next & Beyond
Eric Baldeschwieler, Hortonworks CTO
June 13, 2012
Hadoop Summit Is BIG!

 10x growth                  2012 Summit
 in 5 years!                 2200+ people


              2011 Summit
              1600+ people

2008: First Summit
       200+ people
Timeline: Apache Hadoop 1.0 & 2.0

                                                                        1.0: The most stable release
                0.20.1     DEV        QA    beta
                                                                             3 years of stabilization & key features
HADOOP 1.0




                             DEV       QA          beta
                  0.20.2

                                       Security
                                         DEV         QA              beta
                           0.20.1xx
                                                  Security, MR multi tenancy

                                      0.20.2xx        DEV        QA        beta                                Hadoop 1.0
                                                                      Append                                      GA
                                                               1.0       DEV        QA         beta




                                   New Append
HADOOP 2.0




                                                      DEV
                              0.21
                                                          Security
                                                                        DEV                             QA
                                                   0.22
                                                                            Federation, YARN                           Hadoop 2.0
                                                                     0.23            DEV              QA       alpha
                                                                                           HA, Wire Compatibility
                                                                                                                         alpha
                                                                 QA                                   DEV                    beta
             2.0: Next-gen MapReduce & HDFS       2.0
                  Exciting community innovations under development
               2008                2009                         2010                      2011                 2012
Hadoop 1.0 Key Features
• Flush / Sync for HBase
  – 1.0 is the first Apache Hadoop release to support HBase
  – This work began in 0.18 in 2008!
  – Benefit: Interactive apps – Web site personalization

• Security – Strong authentication via Kerberos
  – Benefit: Audit compliance, multi-tenancy

• MapReduce limits
  – Solve whack-a-mole like bad user job problem
  – Benefit: Reliability, multi-tenancy
Hadoop 2.0 Innovations
•  Focus on Scale and Community Innovation
   –  YARN and Federation designed to support 10,000+ computer clusters

•  YARN: Scalable, Pluggable Execution Frameworks
   –  Improves MapReduce performance
   –  Will support community development of new frameworks
   –  Near real-time, Machine learning & Analytics use cases

•  Federation: Scalable, Pluggable Storage
   –  Isolation via multiple volumes / Name Nodes
   –  Shared block pool w/ pluggable volume managers

•  Always On: No Cluster Downtime
   –  Wire compatible APIs (protobufs)
   –  HDFS hot standby HA
   –  Rolling upgrades
   –  Log & checkpoint management
Balancing Innovation & Stability


                     INNOVATION                                         STABILITY




Source: The above graphic based on concepts from Geoffrey Moore’s book – Crossing the Chasm
Hortonworks Data Platform (HDP) 1.0
HDP 1.0 Highlights

1   Pure Apache Hadoop 1.0 code line, 100% open source


2   Open source Management & Monitoring via Ambari


3   Common Metadata Services via HCatalog


4   Enterprise Data Integration with Talend Open Studio


5   Multi-tenant Protections via Capacity Scheduler


6   Full Stack HA via proven 3rd party products
Management & Monitoring Services -> Ambari

•  Powerful monitoring and alerting dashboards
   –  View topology, health & utilization of cluster
   –  Detailed view of cluster operations, server & storage
      utilization, job status, and performance levels
   –  Get alerts to critical events

•  Simple installation & provisioning
   –  Easy configuration process
   –  One-click deployment for clusters of all sizes
   –  Analyzes/recommends optimal services configuration
   –  Automatically configures mount points in the cluster
Full Stack High Availability
         Proven HA solutions with proven Hadoop 1.0
                                   Failover and restart for
                                       •  NameNode
                                       •  JobTracker
HA Cluster                             •  Other services to come…


                                   Open API allows use of Proven HA
                                   from multiple vendors


                                   Minimized changes to clients and
                                   configuration


                                   Auto-detects failures:
                                   •  Services, OS & Hardware
               HA Cluster

                                   Complementary to 2.0 HA efforts
The Road Ahead
• Ambari
  – REST APIs & general hardening
  – Integrations w/ enterprise & cloud management solutions

• HCatalog
  – ODBC / JDBC, security, relaxed schemas (AVRO, JSON…)
  – More REST APIs and Integrations with 3rd party data stores

• Full Stack HA
  – Continued work with virtualization & operating system vendors

• Native Windows support
  – Integrations with broader Windows ecosystem of systems/tools
Welcome to the Hadoop Summit!


            Enjoy

   Help the grow ecosystem!

Mais conteúdo relacionado

Mais procurados

Anil nair rac_internals_sangam_2016
Anil nair rac_internals_sangam_2016Anil nair rac_internals_sangam_2016
Anil nair rac_internals_sangam_2016Anil Nair
 
Running SAP on SUSE Cloud 2.0
Running SAP on SUSE Cloud 2.0Running SAP on SUSE Cloud 2.0
Running SAP on SUSE Cloud 2.0Dirk Oppenkowski
 
Open stack design 2012 applications targeting openstack-final
Open stack design 2012   applications targeting openstack-finalOpen stack design 2012   applications targeting openstack-final
Open stack design 2012 applications targeting openstack-finalrhirschfeld
 
Cloud Foundry Diego, Lattice, Docker and more
Cloud Foundry Diego, Lattice, Docker and moreCloud Foundry Diego, Lattice, Docker and more
Cloud Foundry Diego, Lattice, Docker and morecornelia davis
 
Apache Ambari Stack Extensibility
Apache Ambari Stack ExtensibilityApache Ambari Stack Extensibility
Apache Ambari Stack ExtensibilityJayush Luniya
 
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red HatAccelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red HatNetApp
 
What is the PaaS?
What is the PaaS?What is the PaaS?
What is the PaaS?CloudBees
 
My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12Mat Keep
 
Oracle RAC - New Generation
Oracle RAC - New GenerationOracle RAC - New Generation
Oracle RAC - New GenerationAnil Nair
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessCloudera, Inc.
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemDataWorks Summit
 
OpenStack and MySQL
OpenStack and MySQLOpenStack and MySQL
OpenStack and MySQLMatt Lord
 
Solaris 11.2 What's New
Solaris 11.2 What's NewSolaris 11.2 What's New
Solaris 11.2 What's NewOrgad Kimchi
 
Scaling paypal workloads with oracle rac ss
Scaling paypal workloads with oracle rac ssScaling paypal workloads with oracle rac ss
Scaling paypal workloads with oracle rac ssAnil Nair
 
Apache Web Server -- Ready for the Enterprise
Apache Web Server -- Ready for the EnterpriseApache Web Server -- Ready for the Enterprise
Apache Web Server -- Ready for the Enterprisewebhostingguy
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
 
1 architecture & design
1   architecture & design1   architecture & design
1 architecture & designMark Swarbrick
 
Rac 12c rel2_operational_best_practices_sangam_2017_as_pdf
Rac 12c rel2_operational_best_practices_sangam_2017_as_pdfRac 12c rel2_operational_best_practices_sangam_2017_as_pdf
Rac 12c rel2_operational_best_practices_sangam_2017_as_pdfAnil Nair
 
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2Jeroen Burgers
 
Ten Real-World Customer Configurations on Oracle Database Appliance
Ten Real-World Customer Configurations on Oracle Database Appliance Ten Real-World Customer Configurations on Oracle Database Appliance
Ten Real-World Customer Configurations on Oracle Database Appliance Simon Haslam
 

Mais procurados (20)

Anil nair rac_internals_sangam_2016
Anil nair rac_internals_sangam_2016Anil nair rac_internals_sangam_2016
Anil nair rac_internals_sangam_2016
 
Running SAP on SUSE Cloud 2.0
Running SAP on SUSE Cloud 2.0Running SAP on SUSE Cloud 2.0
Running SAP on SUSE Cloud 2.0
 
Open stack design 2012 applications targeting openstack-final
Open stack design 2012   applications targeting openstack-finalOpen stack design 2012   applications targeting openstack-final
Open stack design 2012 applications targeting openstack-final
 
Cloud Foundry Diego, Lattice, Docker and more
Cloud Foundry Diego, Lattice, Docker and moreCloud Foundry Diego, Lattice, Docker and more
Cloud Foundry Diego, Lattice, Docker and more
 
Apache Ambari Stack Extensibility
Apache Ambari Stack ExtensibilityApache Ambari Stack Extensibility
Apache Ambari Stack Extensibility
 
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red HatAccelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
 
What is the PaaS?
What is the PaaS?What is the PaaS?
What is the PaaS?
 
My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12
 
Oracle RAC - New Generation
Oracle RAC - New GenerationOracle RAC - New Generation
Oracle RAC - New Generation
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data Success
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystem
 
OpenStack and MySQL
OpenStack and MySQLOpenStack and MySQL
OpenStack and MySQL
 
Solaris 11.2 What's New
Solaris 11.2 What's NewSolaris 11.2 What's New
Solaris 11.2 What's New
 
Scaling paypal workloads with oracle rac ss
Scaling paypal workloads with oracle rac ssScaling paypal workloads with oracle rac ss
Scaling paypal workloads with oracle rac ss
 
Apache Web Server -- Ready for the Enterprise
Apache Web Server -- Ready for the EnterpriseApache Web Server -- Ready for the Enterprise
Apache Web Server -- Ready for the Enterprise
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
1 architecture & design
1   architecture & design1   architecture & design
1 architecture & design
 
Rac 12c rel2_operational_best_practices_sangam_2017_as_pdf
Rac 12c rel2_operational_best_practices_sangam_2017_as_pdfRac 12c rel2_operational_best_practices_sangam_2017_as_pdf
Rac 12c rel2_operational_best_practices_sangam_2017_as_pdf
 
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
 
Ten Real-World Customer Configurations on Oracle Database Appliance
Ten Real-World Customer Configurations on Oracle Database Appliance Ten Real-World Customer Configurations on Oracle Database Appliance
Ten Real-World Customer Configurations on Oracle Database Appliance
 

Destaque

Scrapy talk at DataPhilly
Scrapy talk at DataPhillyScrapy talk at DataPhilly
Scrapy talk at DataPhillyobdit
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceobdit
 
Jan 2013 HUG: Cloud-Friendly Hadoop and Hive
Jan 2013 HUG: Cloud-Friendly Hadoop and HiveJan 2013 HUG: Cloud-Friendly Hadoop and Hive
Jan 2013 HUG: Cloud-Friendly Hadoop and HiveYahoo Developer Network
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 

Destaque (6)

Scrapy talk at DataPhilly
Scrapy talk at DataPhillyScrapy talk at DataPhilly
Scrapy talk at DataPhilly
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduce
 
Jan 2013 HUG: Cloud-Friendly Hadoop and Hive
Jan 2013 HUG: Cloud-Friendly Hadoop and HiveJan 2013 HUG: Cloud-Friendly Hadoop and Hive
Jan 2013 HUG: Cloud-Friendly Hadoop and Hive
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
Hive sq lfor-hadoop
Hive sq lfor-hadoopHive sq lfor-hadoop
Hive sq lfor-hadoop
 
SQL in Hadoop
SQL in HadoopSQL in Hadoop
SQL in Hadoop
 

Semelhante a Hadoop Now, Next & Beyond

Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hortonworks
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondDataWorks Summit
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrowSteve Loughran
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 featuresanand murari
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 featuresanand murari
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroCloudera, Inc.
 
Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments buildacloud
 
Deploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata EnvironmentsDeploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata EnvironmentsPuppet
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera, Inc.
 
Cloud Foundry Open Tour - London
Cloud Foundry Open Tour - LondonCloud Foundry Open Tour - London
Cloud Foundry Open Tour - Londonmarklucovsky
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Sumeet Singh
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYWangda Tan
 
Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Big Data Joe™ Rossi
 
Hw09 Clouderas Distribution For Hadoop
Hw09   Clouderas Distribution For HadoopHw09   Clouderas Distribution For Hadoop
Hw09 Clouderas Distribution For HadoopCloudera, Inc.
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 

Semelhante a Hadoop Now, Next & Beyond (20)

Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
 
Hadoop Versioning
Hadoop VersioningHadoop Versioning
Hadoop Versioning
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and Beyond
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 features
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 features
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
 
Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments
 
Deploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata EnvironmentsDeploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata Environments
 
1.0 vs2.0
1.0 vs2.01.0 vs2.0
1.0 vs2.0
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7
 
Hadoop Platform at Yahoo
Hadoop Platform at YahooHadoop Platform at Yahoo
Hadoop Platform at Yahoo
 
Cloud Foundry Open Tour - London
Cloud Foundry Open Tour - LondonCloud Foundry Open Tour - London
Cloud Foundry Open Tour - London
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
 
Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2
 
OpenStack at PayPal
OpenStack at PayPalOpenStack at PayPal
OpenStack at PayPal
 
Hw09 Clouderas Distribution For Hadoop
Hw09   Clouderas Distribution For HadoopHw09   Clouderas Distribution For Hadoop
Hw09 Clouderas Distribution For Hadoop
 
The Future of Hbase
The Future of HbaseThe Future of Hbase
The Future of Hbase
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

Último (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Hadoop Now, Next & Beyond

  • 1. Hadoop Now, Next & Beyond Eric Baldeschwieler, Hortonworks CTO June 13, 2012
  • 2. Hadoop Summit Is BIG! 10x growth 2012 Summit in 5 years! 2200+ people 2011 Summit 1600+ people 2008: First Summit 200+ people
  • 3. Timeline: Apache Hadoop 1.0 & 2.0 1.0: The most stable release 0.20.1 DEV QA beta 3 years of stabilization & key features HADOOP 1.0 DEV QA beta 0.20.2 Security DEV QA beta 0.20.1xx Security, MR multi tenancy 0.20.2xx DEV QA beta Hadoop 1.0 Append GA 1.0 DEV QA beta New Append HADOOP 2.0 DEV 0.21 Security DEV QA 0.22 Federation, YARN Hadoop 2.0 0.23 DEV QA alpha HA, Wire Compatibility alpha QA DEV beta 2.0: Next-gen MapReduce & HDFS 2.0 Exciting community innovations under development 2008 2009 2010 2011 2012
  • 4. Hadoop 1.0 Key Features • Flush / Sync for HBase – 1.0 is the first Apache Hadoop release to support HBase – This work began in 0.18 in 2008! – Benefit: Interactive apps – Web site personalization • Security – Strong authentication via Kerberos – Benefit: Audit compliance, multi-tenancy • MapReduce limits – Solve whack-a-mole like bad user job problem – Benefit: Reliability, multi-tenancy
  • 5. Hadoop 2.0 Innovations •  Focus on Scale and Community Innovation –  YARN and Federation designed to support 10,000+ computer clusters •  YARN: Scalable, Pluggable Execution Frameworks –  Improves MapReduce performance –  Will support community development of new frameworks –  Near real-time, Machine learning & Analytics use cases •  Federation: Scalable, Pluggable Storage –  Isolation via multiple volumes / Name Nodes –  Shared block pool w/ pluggable volume managers •  Always On: No Cluster Downtime –  Wire compatible APIs (protobufs) –  HDFS hot standby HA –  Rolling upgrades –  Log & checkpoint management
  • 6. Balancing Innovation & Stability INNOVATION STABILITY Source: The above graphic based on concepts from Geoffrey Moore’s book – Crossing the Chasm
  • 8. HDP 1.0 Highlights 1 Pure Apache Hadoop 1.0 code line, 100% open source 2 Open source Management & Monitoring via Ambari 3 Common Metadata Services via HCatalog 4 Enterprise Data Integration with Talend Open Studio 5 Multi-tenant Protections via Capacity Scheduler 6 Full Stack HA via proven 3rd party products
  • 9. Management & Monitoring Services -> Ambari •  Powerful monitoring and alerting dashboards –  View topology, health & utilization of cluster –  Detailed view of cluster operations, server & storage utilization, job status, and performance levels –  Get alerts to critical events •  Simple installation & provisioning –  Easy configuration process –  One-click deployment for clusters of all sizes –  Analyzes/recommends optimal services configuration –  Automatically configures mount points in the cluster
  • 10. Full Stack High Availability Proven HA solutions with proven Hadoop 1.0 Failover and restart for •  NameNode •  JobTracker HA Cluster •  Other services to come… Open API allows use of Proven HA from multiple vendors Minimized changes to clients and configuration Auto-detects failures: •  Services, OS & Hardware HA Cluster Complementary to 2.0 HA efforts
  • 11. The Road Ahead • Ambari – REST APIs & general hardening – Integrations w/ enterprise & cloud management solutions • HCatalog – ODBC / JDBC, security, relaxed schemas (AVRO, JSON…) – More REST APIs and Integrations with 3rd party data stores • Full Stack HA – Continued work with virtualization & operating system vendors • Native Windows support – Integrations with broader Windows ecosystem of systems/tools
  • 12. Welcome to the Hadoop Summit! Enjoy Help the grow ecosystem!