Hadoop Now, Next & Beyond
Eric Baldeschwieler, Hortonworks CTO
June 13, 2012
Hadoop Summit Is BIG!

10x growth in 5 years!

2008: First Summit, 200+ people
2011 Summit: 1600+ people
2012 Summit: 2200+ people
Timeline: Apache Hadoop 1.0 & 2.0
[Figure: release timeline, 2008 to 2012; each release line moves through DEV, QA, and alpha/beta stages]

• HADOOP 1.0: The most stable release
  – 3 years of stabilization & key features
  – 0.20.1
  – 0.20.2
  – 0.20.1xx: Security
  – 0.20.2xx: Security, MR multi tenancy
  – 1.0: Append (Hadoop 1.0 GA)

• HADOOP 2.0: Next-gen MapReduce & HDFS
  – Exciting community innovations under development
  – 0.21: New Append
  – 0.22: Security
  – 0.23: Federation, YARN
  – 2.0: HA, Wire Compatibility (Hadoop 2.0 alpha)
Hadoop 1.0 Key Features
• Flush / Sync for HBase
  – 1.0 is the first Apache Hadoop release to support HBase
  – This work began in 0.18 in 2008!
  – Benefit: Interactive apps – Web site personalization

• Security – Strong authentication via Kerberos
  – Benefit: Audit compliance, multi-tenancy

• MapReduce limits
  – Solve the whack-a-mole problem of badly behaved user jobs
  – Benefit: Reliability, multi-tenancy
Hadoop 2.0 Innovations
•  Focus on Scale and Community Innovation
   –  YARN and Federation designed to support clusters of 10,000+ computers

•  YARN: Scalable, Pluggable Execution Frameworks
   –  Improves MapReduce performance
   –  Will support community development of new frameworks
   –  Near real-time, Machine learning & Analytics use cases

•  Federation: Scalable, Pluggable Storage
   –  Isolation via multiple volumes / Name Nodes
   –  Shared block pool w/ pluggable volume managers

•  Always On: No Cluster Downtime
   –  Wire compatible APIs (protobufs)
   –  HDFS hot standby HA
   –  Rolling upgrades
   –  Log & checkpoint management
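
The Federation bullets can be illustrated with a minimal Python sketch of a client-side mount table that routes each path to the NameNode owning that namespace volume. This is a sketch under stated assumptions, not HDFS code: the class name `MountTable`, the path prefixes, and the NameNode addresses are all hypothetical, and the slide does not say how clients locate a volume.

```python
class MountTable:
    """Toy client-side mount table: longest-prefix match from path to NameNode."""

    def __init__(self, mounts):
        # mounts: dict of path prefix -> NameNode address (hypothetical values)
        self._mounts = {self._norm(p): nn for p, nn in mounts.items()}

    @staticmethod
    def _norm(path):
        # Normalize to a leading slash with no trailing slash ("/" stays "/").
        return "/" + path.strip("/")

    def resolve(self, path):
        """Return the NameNode that owns the namespace volume containing path."""
        path = self._norm(path)
        best = None
        for prefix, namenode in self._mounts.items():
            # Match whole path components only, so "/user" does not claim "/userdata".
            matches = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if matches and (best is None or len(prefix) > len(best[0])):
                best = (prefix, namenode)
        if best is None:
            raise KeyError(f"no namespace volume mounted for {path}")
        return best[1]


# Two independent namespace volumes, each served by its own NameNode.
table = MountTable({
    "/user": "nn1.example.com:8020",
    "/hbase": "nn2.example.com:8020",
})
user_nn = table.resolve("/user/alice/logs")
hbase_nn = table.resolve("/hbase/table1")
```

Because each prefix maps to a separate NameNode, heavy metadata load under one prefix does not touch the NameNode serving the other, which is the isolation benefit the slide names.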
Balancing Innovation & Stability


[Figure: graphic contrasting INNOVATION and STABILITY]

Source: graphic based on concepts from Geoffrey Moore’s book, Crossing the Chasm
Hortonworks Data Platform (HDP) 1.0
HDP 1.0 Highlights

1   Pure Apache Hadoop 1.0 code line, 100% open source


2   Open source Management & Monitoring via Ambari


3   Common Metadata Services via HCatalog


4   Enterprise Data Integration with Talend Open Studio


5   Multi-tenant Protections via Capacity Scheduler


6   Full Stack HA via proven 3rd party products
Management & Monitoring Services -> Ambari

•  Powerful monitoring and alerting dashboards
   –  View topology, health & utilization of cluster
   –  Detailed view of cluster operations, server & storage
      utilization, job status, and performance levels
   –  Get alerts on critical events

•  Simple installation & provisioning
   –  Easy configuration process
   –  One-click deployment for clusters of all sizes
   –  Analyzes/recommends optimal services configuration
   –  Automatically configures mount points in the cluster
Full Stack High Availability
Proven HA solutions with proven Hadoop 1.0
[Figure: HA Cluster]

•  Failover and restart for
   –  NameNode
   –  JobTracker
   –  Other services to come…

•  Open API allows use of proven HA from multiple vendors

•  Minimized changes to clients and configuration

•  Auto-detects failures:
   –  Services, OS & Hardware

•  Complementary to 2.0 HA efforts
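
The "auto-detects failures" and "failover and restart" points can be sketched as a small state machine. The slide does not describe the detection mechanism, so the rule used here (fail over after a fixed number of consecutive missed health probes) is an assumption, and `ServiceState`, `on_health_check`, and the host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    name: str
    active_host: str
    standby_host: str
    missed_checks: int = 0


def on_health_check(state, healthy, max_missed=3):
    """Update state after one health probe and return the action taken.

    Returns "ok", "waiting", or "failover".
    """
    if healthy:
        state.missed_checks = 0
        return "ok"
    state.missed_checks += 1
    if state.missed_checks < max_missed:
        return "waiting"
    # Threshold reached: swap roles so the service restarts on the standby host.
    state.active_host, state.standby_host = state.standby_host, state.active_host
    state.missed_checks = 0
    return "failover"


# One healthy probe followed by three consecutive misses triggers a failover.
nn = ServiceState("NameNode", "host-a", "host-b")
actions = [on_health_check(nn, h) for h in (True, False, False, False)]
```

Requiring several consecutive misses before acting is one common way to avoid failing over on a single dropped probe; a real HA product would also fence the old active host, which this sketch leaves out.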
The Road Ahead
• Ambari
  – REST APIs & general hardening
  – Integrations w/ enterprise & cloud management solutions

• HCatalog
  – ODBC / JDBC, security, relaxed schemas (AVRO, JSON…)
  – More REST APIs and Integrations with 3rd party data stores

• Full Stack HA
  – Continued work with virtualization & operating system vendors

• Native Windows support
  – Integrations with broader Windows ecosystem of systems/tools
Welcome to the Hadoop Summit!


            Enjoy

   Help grow the ecosystem!
