4. Why Hadoop World?
Time to Upgrade Your Data Management Strategy
▪ Hadoop isn’t just for Web Companies anymore
▪ Terabytes are commonplace
▪ Enables consumption of all enterprise data
▪ Wide adoption across verticals
▪ Hadoop is driven by the Community
▪ Most registrants are new to Hadoop
▪ Sharing experience is critical - and incredibly valuable
▪ Users and Developers exchanging needs and ideas
6. Growing Up with Hadoop
You’ve come a long way baby...
▪ Early Days
▪ 2004: Google Publishes MapReduce/GFS
▪ 2005: Hadoop Prototype
▪ Doug Cutting and Mike Cafarella
▪ 2006: Hadoop Running on 20 nodes
▪ Internet Archive and UW
Doug Cutting
Photo Credit: New York Times
7. Growing Up with Hadoop
You’ve come a long way baby...
▪ Formative Years
▪ 2006: Yahoo! Begins Major Investment
▪ 2007: Yahoo! Runs Hadoop on 2000 nodes
▪ 2008: Yahoo! uses Hadoop to claim Terasort Benchmark
8. Growing Up with Hadoop
You’ve come a long way baby...
▪ 5 Major Releases for Hadoop in the last year
▪ More Reliable
▪ More Scalable
▪ More Manageable
9. Growing Up with Hadoop
You’ve come a long way baby...
▪ New Sub-Projects Embrace New Users
▪ Hive: SQL Data Warehouse for Hadoop
▪ Pig: Data Analysis Language
10. Growing Up with Hadoop
You’ve come a long way baby...
▪ Sqoop: Database import for Hadoop
▪ Developed by Aaron Kimball, Cloudera
▪ Works over JDBC
▪ Extensible for better performance
11. Growing Up with Hadoop
You’ve come a long way baby...
▪ RDBMS Vendors Embrace Hadoop
▪ MapReduce is great for Analytics
▪ Hadoop is the MapReduce Standard
▪ integrates directly with Hadoop
12. Growing Up with Hadoop
You’ve come a long way baby...
▪ Adoption Spanning Globe
▪ Hadoop User Groups (HUGs) outside the US
▪ Over 10x Companies “PoweredBy”
▪ Not Just for Web Companies Anymore
15. Cloudera’s Distribution for Hadoop
Delivering Hadoop to a Larger Community
Latest Stable Hadoop Release
Stable Upcoming Features Distribution for Hadoop
(by customer request)
Hadoop Community
16. Cloudera’s Distribution for Hadoop
Delivering Hadoop to a Larger Community
Source Code Powering Y!
Latest Stable Hadoop Release
Improvements for EC2 and S3
Stable Upcoming Features Distribution for Hadoop
(by customer request)
New Features from Cloudera
Hadoop Community
17. Cloudera’s Distribution for Hadoop
Delivering Hadoop to a Larger Community
Source Code Powering Y!
Latest Stable Hadoop Release
Improvements for EC2 and S3
Stable Upcoming Features Distribution for Hadoop
(by customer request)
New Features from Cloudera
Cloudera Enhancements
Bug Fixes
Hadoop Community Contributed to Apache
18. Cloudera’s Distribution for Hadoop
Delivering Hadoop to a Larger Community
Distribution for Hadoop
Cross-Platform Packaging,
Integration and Testing
Hive, Pig, Sqoop, ...
Support
19. Cloudera’s Distribution for Hadoop
Delivering Hadoop to a Larger Community
Private Cloud
Distribution for Hadoop
Cross-Platform Packaging,
Integration and Testing
Hive, Pig, Sqoop, ...
Support
Packages
20. Cloudera’s Distribution for Hadoop
Delivering Hadoop to a Larger Community
Private Cloud Public Cloud
Distribution for Hadoop
Cross-Platform Packaging,
Integration and Testing
Hive, Pig, Sqoop, ...
Support
Packages Images
21. Comparing Growth Rates since March 2009
Standard Packaging Drives Adoption
▪ Consistent Downloads from Apache
▪ Cloudera Packages Drive New Usage
▪ Enables New Hadoop Applications
[Chart: Cloudera Downloads vs. Apache Downloads. X-axis: March 2009, May 2009, July 09, Aug 09, Sept 09. Data labels on the rising series (Cloudera Downloads): 100%, 133%, 238%, 384%, 762%, 1,026%, 1,392%, 1,835%. Data labels on the flat series (Apache Downloads): 100%, 96%, 95%, 93%, 97%, 95%.]
Normalized by unique users accessing hadoop.apache.org/core/releases.html and Cloudera Package Repositories in March 2009
22. Cloudera’s Business to Date
Support, Training and Professional Services
▪ Dozens of Support Customers
▪ Using Hadoop for real enterprise workloads
▪ Training and Certification
▪ Hundreds of engineers trained
▪ Sysadmin and Manager programs launched at Hadoop World
▪ Professional Services