Webinar: Inside Cloudera's Distribution including Apache Hadoop v3
1. Welcome to Inside Cloudera’s Distribution including Apache Hadoop. Audio/Telephone: +1 (314) 627-1519; Access Code: 380-729-510; Audio PIN: shown after joining the webinar. Presenter: Charles Zedlewski, Cloudera VP of Product
2. Housekeeping: Ask questions at any time using the Questions panel. Problems? Use the Chat panel. Slides and recording will be available. Copyright 2011 Cloudera Inc. All rights reserved
3. What Cloudera set out to do with CDH3: (1) Give organizations an integrated, complete data management system that is 100% Apache open source; (2) Provide a platform that the rest of the enterprise IT ecosystem could integrate with; (3) Continue to make Apache Hadoop even easier to adopt; (4) Provide a level of release predictability that organizations can plan their maintenance and upgrade cycles on
4. An integrated data management system – what did Google do? [Diagram: Dremel, Evenflow, Sawzall, Bigtable, MySQL Gateway, MapReduce / GFS, Chubby]
8. CDH3 assembled the best of the Apache Hadoop ecosystem into an integrated system so you don’t have to. [Diagram: Cloudera’s Distribution Including Apache Hadoop: Hue, Oozie, Hive / Pig, HBase, Sqoop, Flume, Zookeeper]
9. How CDH3 got created. Inputs: enhancements written and contributed to Apache projects; customer and partner requirements. Prioritization based on customer value, cost and (for CDH) community readiness. Steps: releases selected or cut; integration, testing, & backporting; beta cycle, more backporting; stable release! Components in the release: Hadoop 0.20.2 +923; HBase 0.90.1 +15; Hive 0.7 +22; Pig 0.8 +20; Flume 0.9.3 +17; Oozie 20.2 +31; Hue 1.2.0 +0; Sqoop 1.2 +24; Zookeeper 3.3.3 +12. [Diagram labels: HDFS, HBase, Flume, etc.]
10. CDH2 to CDH3
12. Example 1 – clickstream sessions. Flume: reliably collect logs. HDFS: store in the filesystem. MapReduce: process into sessions. Hive: store table metadata. Sqoop: export to EDW for BI reporting.
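The "process into sessions" step in Example 1 can be illustrated with a small sketch. This is not Cloudera's implementation: the `(user_id, timestamp, url)` record layout, the `sessionize` function name, and the 30-minute inactivity cutoff are all assumptions made for illustration. The grouping by user stands in for what a MapReduce shuffle would do, and the gap check stands in for the reducer logic.

```python
from collections import defaultdict
from datetime import timedelta

# Assumption: a session ends after 30 minutes of inactivity.
# The slides do not specify a cutoff; this value is illustrative only.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Group (user_id, timestamp, url) click events into per-user sessions.

    Returns a list of (user_id, [(timestamp, url), ...]) tuples,
    one entry per session.
    """
    # "Map/shuffle" stage: collect each user's clicks under one key.
    by_user = defaultdict(list)
    for user_id, ts, url in events:
        by_user[user_id].append((ts, url))

    # "Reduce" stage: sort each user's clicks by time and start a new
    # session whenever the gap between consecutive clicks exceeds `gap`.
    sessions = []
    for user_id, clicks in by_user.items():
        clicks.sort()
        current = [clicks[0]]
        for prev, cur in zip(clicks, clicks[1:]):
            if cur[0] - prev[0] > gap:
                sessions.append((user_id, current))
                current = []
            current.append(cur)
        sessions.append((user_id, current))
    return sessions
```

In a pipeline like the one on the slide, the input records would come from logs that Flume had landed in HDFS, and the resulting sessions would be what Hive describes as a table and Sqoop exports to the EDW.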
13. Example 2 – fraud analysis. Components: Sqoop, Hive, HBase, MapReduce, HDFS. Steps: import of fact data into the filesystem; import regularly changing dimension data; analytics performed using HQL.
15. Investing in interfacing with the Enterprise IT ecosystem. [Diagram: Cloudera’s Distribution Including Apache Hadoop, with integration work labeled: drivers, language enhancements, testing; Sqoop framework, adapters; packaging, testing; more coming…]
17. Ease of adoption: making CDH a more enterprise-quality artifact. Regular, non-disruptive updates.
18. There are new features for each component too (partial list)