O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Cloudera's Original Pitch Deck from 2008

4.234 visualizações

Publicada em

This is a selection of slides from Cloudera's 2008 pitch deck to raise a $5 million Series A. Accel wound up winning the deal and became the initial investor in the company.

Publicada em: Tecnologia
  • Seja o primeiro a comentar

Cloudera's Original Pitch Deck from 2008

  1. 1. Cloudera:Cloudera: Hadoop for the EnterpriseHadoop for the Enterprise September 2008September 2008
  2. 2. Data Growing Much Faster thanData Growing Much Faster than Moore’s LawMoore’s Law 04/21/17 Cloudera ConfidentialCloudera Confidential 22 Source: Richard Winter, Why Are Data Warehouses Growing so Fast?, April 2008
  3. 3. Uniprocessor PerformanceUniprocessor Performance 04/21/17 33Cloudera ConfidentialCloudera Confidential
  4. 4. Founding TeamFounding Team • Mike Olson, CEOMike Olson, CEO – CEO SleepycatCEO Sleepycat – Britton Lee, Illustra,Britton Lee, Illustra, Informix, OracleInformix, Oracle – BA, MS CS, BerkeleyBA, MS CS, Berkeley • Amr Awadallah, CTO, VPAmr Awadallah, CTO, VP EngineeringEngineering – Founder Aptivia/VivaSmartFounder Aptivia/VivaSmart – 8 years at Yahoo! running8 years at Yahoo! running BI infrastructure, includingBI infrastructure, including HadoopHadoop – PhD EE, StanfordPhD EE, Stanford • Christophe Bisciglia, VPChristophe Bisciglia, VP TechnologyTechnology – Created Google/NSFCreated Google/NSF Hadoop cluster andHadoop cluster and programprogram – BA CS, U WashingtonBA CS, U Washington • Jeff Hammerbacher, VPJeff Hammerbacher, VP ProductProduct – Ran world’s largestRan world’s largest operational BI supportoperational BI support system on Hadoop, atsystem on Hadoop, at FacebookFacebook – BA Mathematics, HarvardBA Mathematics, Harvard 04/21/17 44Cloudera ConfidentialCloudera Confidential
  5. 5. What Is Hadoop?What Is Hadoop? • Core engine:Core engine: – Open source implementation of Google’sOpen source implementation of Google’s MapReduce and GFSMapReduce and GFS – Hundreds or thousands of serversHundreds or thousands of servers parallelize a data analysis taskparallelize a data analysis task • Interfaces built on top of MapReduceInterfaces built on top of MapReduce • Storage layer beneath (HDFS)Storage layer beneath (HDFS) • Doug Cutting, Mike Cafarella areDoug Cutting, Mike Cafarella are advisorsadvisors 04/21/17 55Cloudera ConfidentialCloudera Confidential
  6. 6. Hadoop is Open SourceHadoop is Open Source • Hadoop is distributed under the Apache License:Hadoop is distributed under the Apache License: – Reduces concern about lock-inReduces concern about lock-in – Low-cost, effective distribution strategyLow-cost, effective distribution strategy – Allows innovation by partners, customersAllows innovation by partners, customers – Third-party inspection of source code providesThird-party inspection of source code provides assurances on security, product qualityassurances on security, product quality • Business-friendly license encourages commercialBusiness-friendly license encourages commercial developmentdevelopment – ““Open core” licensingOpen core” licensing – Closed-source components, applicationsClosed-source components, applications 04/21/17 66Cloudera ConfidentialCloudera Confidential
  7. 7. Hadoop UsersHadoop Users 04/21/17 77Cloudera ConfidentialCloudera Confidential
  8. 8. Momentum: Google TrendsMomentum: Google Trends 04/21/17 88Cloudera ConfidentialCloudera Confidential Netezza: $127M in FY08, $79M in FY07 Teradata: $830M in 1H08, $1.7B in FY07
  9. 9. Worldwide PhenomenonWorldwide Phenomenon 04/21/17 99Cloudera ConfidentialCloudera Confidential Source: Google Insights world map for searches on “hadoop”, Sept 2008.
  10. 10. Why is Hadoop Successful?Why is Hadoop Successful? • BringsBrings computation closer to datacomputation closer to data allowing both IO and computeallowing both IO and compute scalability.scalability. • Map-ReduceMap-Reduce forces developers toforces developers to thinkthink in a parallel wayin a parallel way • Operates onOperates on unstructured dataunstructured data , and, and structured datastructured data (HBASE, HIVE)(HBASE, HIVE) • Prescriptive developmentPrescriptive development , grows with, grows with you without needing to re-architectyou without needing to re-architect • Procedural languageProcedural language offers poweroffers power 04/21/17 1010Cloudera ConfidentialCloudera Confidential
  11. 11. Current Systems Isolate Users fromCurrent Systems Isolate Users from the Event Level Raw Datathe Event Level Raw Data File Server Farm for Warehouse (File Server Farm for Warehouse (non-queryablenon-queryable)) Warehouse Pre-ProcessingWarehouse Pre-Processing InstrumentationInstrumentation Log CollectionLog Collection Datamart DatabaseDatamart Database BI ReportingBI Reporting MySQLMySQL MemCachedMemCached Live Web SiteLive Web SiteData MiningData Mining R, Weka,R, Weka, SAS, SPSSSAS, SPSS ETLETL ETLETL ETLETL ETLETL ETLETL ETLETL Non-Consumption Expensive ETL Grids Expensive ETL Grids 04/21/17 1111Cloudera ConfidentialCloudera Confidential
  12. 12. Solution: “Smart” Storage ServiceSolution: “Smart” Storage Service Smart Storage: Grid For File Storage & Data ProcessingSmart Storage: Grid For File Storage & Data Processing Warehouse Pre-ProcessingWarehouse Pre-Processing InstrumentationInstrumentation Log CollectionLog Collection Datamart DatabaseDatamart Database BI ReportingBI Reporting MySQLMySQL MemCachedMemCached Live Web SiteLive Web SiteData MiningData Mining R, Weka,R, Weka, SAS, SPSSSAS, SPSS Enable Consumption Eliminate Expensive ETL Grids Eliminate Expensive ETL Grids 04/21/17 1212Cloudera ConfidentialCloudera Confidential
  13. 13. BDP versus OLAP/OLTPBDP versus OLAP/OLTP Schema Complexity Processing Freedom Table Join Complexity Concurrent Jobs Responsiveness Per Job Data Volume Data Update Pattern 100TB Unstructured 100TB 1PB Append OnlyRead/Write 100PB Total Data Volume Structured SQL Generic Data Processing Batch Interactive 1000 100 Tables 10PB 1PB 10PB 100PB OLAP/OLTP Batch Data Processing 04/21/17 1313Cloudera ConfidentialCloudera Confidential
  14. 14. 04/21/17 Cloudera ConfidentialCloudera Confidential 1414 Source: Merrill Lynch Industry Overview, May 7, 2008
  15. 15. Cloudera DifferentiatorsCloudera Differentiators • Enabling Hadoop as an elastic platform withEnabling Hadoop as an elastic platform with statistical multiplexing over many customersstatistical multiplexing over many customers • Multi-Tenant Support:Multi-Tenant Support: Concurrency, Priority, NamespaceConcurrency, Priority, Namespace Isolation, Performance Isolation.Isolation, Performance Isolation. • Monitoring, Reliability, and AvailabilityMonitoring, Reliability, and Availability • Resilience and Fast RecoveryResilience and Fast Recovery : A: A non-sexy problemnon-sexy problem that isthat is critical to enterprisescritical to enterprises , no time to restart ETL job, no time to restart ETL job from scratch, otherwise misses SLA.from scratch, otherwise misses SLA. • IDEIDE to easilyto easily debug, deploy, and tune.debug, deploy, and tune. • Integration withIntegration with data mining and analysisdata mining and analysis functionality (R,functionality (R, Weka, SAS, SPSS)Weka, SAS, SPSS) • Connector certificationConnector certification : another non-sexy problem that is: another non-sexy problem that is ignored by community, make sure system is compatible withignored by community, make sure system is compatible with other enterprise systems.other enterprise systems. 04/21/17 1515Cloudera ConfidentialCloudera Confidential