Ventana Research Presents: Best Practices with Hadoop - Real World Data

Ventana Research shares their findings from the first large-scale, effective research on Hadoop and Information Management.

Published in: Technology, Business

  1. David Menninger of Ventana Research Presents: Best Practices with Hadoop - Real World Data
     • Audio/Telephone: +1 (909) 259-0012
     • Access Code: 622-064-673
     • Audio PIN: Shown after joining the webinar
     • Hosts: Rich Guth, CMO, Karmasphere; Charles Zedlewski, VP Product, Cloudera
  2. Housekeeping
     • Ask questions at any time using the Questions panel
     • Twitter: #HadoopTrends
     • Problems? Use the Chat panel
     • Slides and recording will be available
  3. Speaker: David Menninger, Vice President, Ventana Research
     • Covers analytics, business intelligence and information management for Ventana Research. David brings over two decades of experience, during which he has marketed and brought to market some of the leading-edge technologies for helping organizations analyze data to support a range of action-taking and decision-making processes.
     • Prior to joining Ventana Research, David was VP of Marketing and Product Management at Vertica Systems, Oracle, Applix, InforSense and IRI Software. He helped create over half a billion dollars of shareholder value while serving in these roles.
     • Email: david.menninger@ventanaresearch.com
     • Twitter: @dmenningervr
  4. Who We Are (Cloudera)
     Mission: To help organizations to profit from all of their data
     How We Do It: We deliver relevant products and services.
     • A distribution of Apache Hadoop that is tested, certified and supported
     • Comprehensive support and professional service offerings
     • A suite of management software for Hadoop operations
     • Training and certification programs for developers, administrators, managers and data scientists
     Credentials: The Apache Hadoop experts.
     • Number 1 commercial, open source distribution of Apache Hadoop
     • Largest contributor to the open source Hadoop ecosystem
     • More than 100 customers across a wide variety of industries
     • Strong growth in revenue and new accounts
     Technical Team: Unmatched knowledge and experience.
     • Founders, committers and contributors to Hadoop
     • A wealth of experience in the design and delivery of production software
     • Breadth and depth in a team of open source committers and contributors
     Leadership: Strong executive team with proven abilities.
     • Mike Olson, CEO
     • Kirk Dunn, COO
     • Charles Zedlewski, VP, Product
     • Mary Rorabaugh, CFO
     • Jeff Hammerbacher, Chief Scientist
     • Amr Awadalla, VP Engineering
     • Doug Cutting, Chief Architect
     • Omer Trajman, VP, Customer Solutions
     ©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  5. What We Do (Cloudera; diagram, labels only)
     • Services: Consulting Services, Cloudera University, Cloudera Partners
     • Users: Operators, Engineers, Analysts, Business Users
     • Cloudera Enterprise: Cloudera Management Suite, Cloudera Support
     • Tools: Enterprise Management, IDEs, BI / Analytics Tools, Enterprise Reporting
     • Platform: Cloudera's Distribution Including Apache Hadoop (CDH), SCM Express, Adapters
     • Connected systems: Enterprise Data Warehouse, Relational Databases
     • Big data sources: Web + Application Logs, Files, Web Data
  6. Karmasphere: Opening Up the Data in Hadoop for the Enterprise
     © Karmasphere 2011 All rights reserved
  7. Karmasphere Big Data Intelligence Product Suite
     • For Data and Business Analysts: Graphical environment where Big Data on Hadoop, even unstructured data, can be accessed, discovered, and analyzed via familiar SQL and visualized in Excel and other visualization tools
     • FREE for Developers New to Hadoop: Graphical development environment that facilitates learning how to prototype, develop and test MapReduce jobs for Hadoop
     • For Developers Going into Production: Graphical development environment for the complete Hadoop application development lifecycle, adding debugging, packaging and profiling to the capabilities of Community Edition
  8. Hadoop and Information Management Benchmark Research Project: Preliminary Findings
     June 23, 2011
     ©2011, Ventana Research, Inc.
  9. Agenda
     • Why did we undertake this research?
     • What did our research examine?
     • What did we find?
     • How should you use this information?
     • Where do you get more information?
  10. Ventana Research – Overview
     Ventana Research is the leading benchmark research and strategic advisory services firm. Our unparalleled analytic insights and best practices guidance are based on our rigorous research-based benchmarking, business, technology and best practices services.
     Unique Combination of Capabilities
     • Members (85,000) and reach to professionals (3 million)
     • Research and reach across all line-of-business functions and IT
     • Expertise across business and technology
     • Conduct and deliver benchmark research
     • Understand business domain and processes
     • Develop analytic and best practice assessments
     • Formalized research coverage of technology vendors
     • Deliver research on technology impact to business
  11. Rising Popularity
  12. Popularity Measured by Job Postings
  13. Research Objectives
     • Gauge both the adoption rate and intentions to use Hadoop
     • Determine which elements of the Hadoop ecosystem are the most popular, including which distributions, which components and which third-party products
     • Examine the infrastructures and strategies being used to deploy Hadoop
     • Clarify the role of the cloud in enterprise Hadoop deployments
     • Elucidate the components of the business case for Hadoop
     • Detail use of Hadoop across industries
     • Determine the barriers and obstacles to further adoption of Hadoop
  14. Respondent Demographics
     Participation by Region
     • North America: 69%
     • Asia Pacific: 16%
     • Europe: 7%
     • Central and South America, Africa, Middle East: 3%, 2%, 3%
     Company Size by Employee Count
     • Very Large: 35%
     • Large: 27%
     • Midsize: 24%
     • Small: 14%
     Total qualified responses: 163
  15. Touching Over Half The Big Data Audience
     Hadoop Usage
     • Currently in production: 22%
     • Plan to use within 12 months: 12%
     • Plan to use in 12-24 months: 3%
     • Still evaluating: 17%
     • Subtotal of the four categories above: 54%
     • No plans to use: 46%
  16. Hadoop Is Generally Additive
     Is your Hadoop deployment replacing another technology?
     • Yes: 37%
     • No: 63%
     Hadoop is supplementing other established technologies, with RDBMSs still the dominant technology, being used or planned to be used by more than nine out of ten organizations.
  17. Hadoop Is Additive In More Than One Way
     Are there things you're able to do or plan to do with large-scale data technologies that you couldn't do before deployment?
     • Hadoop: 87%
     • Other: 52%
  18. Hadoop Is Additive In More Than One Way
     What are you able to do or what do you plan to do with large-scale data technologies that you couldn't do before deployment?
     • Analyze data at a greater level of detail: Hadoop 94%, Non-Hadoop 93%
     • Perform types of analytics that couldn't be done on large volumes of data before: Hadoop 88%, Non-Hadoop 71%
     • Keep more historical data (post-process): Hadoop 88%, Non-Hadoop 60%
     • Capture all of the source data that we are collecting (pre-process): Hadoop 82%, Non-Hadoop 47%
  19. What Types of Data?
     What types of large-scale data does your organization analyze?
     Hadoop is much more likely to be used for log and event data; much less likely to be used for transaction data. It's also more likely to be used for text and multimedia.
     Most Common - Hadoop
     • Application logs
     • Other types of event data
     • Other log files
     • Web logs
     • Transaction data
     • Network monitoring/traffic
     Most Common - Others
     • Customer/member data
     • Transaction data
     • Application logs
     • Online retail transactions
     • Network monitoring/traffic
     • Call detail records
  20. What Types of Data?
     Q28: What types of large-scale data does your organization analyze?
     • Customer/member data: Hadoop 59%, Non-Hadoop 68%
     • Transactional data from applications (for…: Hadoop 44%, Non-Hadoop 68%
     • Application logs: Hadoop 69%, Non-Hadoop 37%
     • Other types of event data: Hadoop 64%, Non-Hadoop 23%
     • Network monitoring/network traffic: Hadoop 41%, Non-Hadoop 33%
     • Online retail transactions: Hadoop 33%, Non-Hadoop 34%
     • Other log files: Hadoop 51%, Non-Hadoop 26%
     • Call Detail Records: Hadoop 28%, Non-Hadoop 32%
     • Web logs: Hadoop 46%, Non-Hadoop 21%
     • Text data from social media and online…: Hadoop 36%, Non-Hadoop 15%
     • Search logs: Hadoop 36%, Non-Hadoop 11%
     • Trade/quote data: Hadoop 18%, Non-Hadoop 15%
     • Intelligence/defense data: Hadoop 18%, Non-Hadoop 11%
     • Multimedia (audio/video/images): Hadoop 21%, Non-Hadoop 9%
     • Weather: Hadoop 8%, Non-Hadoop 3%
     • Smartmeter data: Hadoop 3%, Non-Hadoop 6%
     • Other (please specify): Hadoop 3%, Non-Hadoop 5%
  21. What Types of Applications?
     What types of large-scale data applications is your organization running?
     • Query and reporting: Hadoop 60%, Non-Hadoop 89%
     • Consolidation of multiple data sources for analysis: Hadoop 63%, Non-Hadoop 71%
     • Custom/production application: Hadoop 65%, Non-Hadoop 68%
     • Data preparation: Hadoop 56%, Non-Hadoop 60%
     • Advanced analyses: Hadoop 69%, Non-Hadoop 47%
     • Analysis or indexing of unstructured data: Hadoop 46%, Non-Hadoop 32%
     • Data sandbox/data experimentation: Hadoop 44%, Non-Hadoop 32%
     Hadoop is most often used for advanced analyses and is more likely to be used to analyze unstructured data and for data sandboxing than other technologies. It is less likely to be used for query and reporting.
  22. Where Sourced?
     From which source(s) did you access Hadoop software?
     • Apache: 63%
     • Cloudera: 55%
     • Amazon: 11%
     • IBM: 8%
     • Yahoo: 8%
     • Facebook: 5%
     • Other (please specify): 5%
     • Don't know: 5%
     The Apache Hadoop distribution is the most prevalent, followed closely by Cloudera. Nearly half the organizations are using more than one distribution.
  23. Which Components?
     Which Hadoop-related projects do you use or plan to use?
     • Hadoop Distributed File System…: 79%
     • MapReduce: 76%
     • HBase: 61%
     • Hive: 53%
     • ZooKeeper: 45%
     • Pig: 45%
     • Flume: 34%
     • Sqoop: 26%
     • Oozie: 18%
     • Avro: 16%
     • Don't know: 11%
  24. Hadoop Organizations are More Confident
     How confident are you in your organization's ability to manage large-scale data?
     • Hadoop: Very confident 43%, Confident 37%, Somewhat confident 18%, Not very confident 2%
     • Non-Hadoop: Very confident 23%, Confident 32%, Somewhat confident 35%, Not very confident 11%
  25. Report Higher Levels of Benefits
     Q27: What are the primary benefits of using your current technologies for analyzing large-scale data sets?
     • Allow us to retain and analyze more data: Hadoop 79%, Non-Hadoop 71%
     • Increase the speed of analysis: Hadoop 85%, Non-Hadoop 63%
     • Produce more accurate results: Hadoop 51%, Non-Hadoop 66%
     • Reduce or eliminate manual processes: Hadoop 64%, Non-Hadoop 56%
     • Cost savings - reduced implementation time/fees: Hadoop 62%, Non-Hadoop 53%
     • Reduce the time required for data collection and preparation: Hadoop 67%, Non-Hadoop 49%
     • Higher customer retention from better analysis of customer data: Hadoop 54%, Non-Hadoop 54%
     • Utilize computing resources more efficiently: Hadoop 72%, Non-Hadoop 46%
     • Cost savings - license fees: Hadoop 82%, Non-Hadoop 40%
     • Reduce effort/staff required: Hadoop 49%, Non-Hadoop 49%
     • We are able to create new products/services: Hadoop 67%, Non-Hadoop 32%
     • Improved margins resulting from better algorithms: Hadoop 41%, Non-Hadoop 30%
     • Improved clickthrough, cross-selling or upselling: Hadoop 26%, Non-Hadoop 30%
  26. Research Can Help Answer Your Questions
     • Is Hadoop a fad or here to stay?
     • Which distributions/components are being used? Apache? Cloudera? Other?
     • Are your peers using Hadoop and for what purpose?
     • Identify and avoid some of the obstacles to successful deployments.
  27. What Should You Do?
     Already using Hadoop?
     • Compare your usage with others
     • Are you using all the components you should be?
     • Have you considered all application areas?
     • Is your usage tactical (cost saving) or strategic (new capabilities)?
     Not using or evaluating Hadoop?
     • Consider whether you should be
     • Did your organization need some "proof"?
  28. Where to Get More Information
     Free webinar and report:
     • Ventana Research will host a webinar with the final results and analysis.
     • Report of our findings will be distributed by the sponsors and will be available on our website: www.VentanaResearch.com/HIM
     Contact us with questions:
     • Ventana Research
     • 925-474-0060
     • info@ventanaresearch.com
     • www.ventanaresearch.com
  29. Q&A
     • Ask questions using the Questions panel
     • Tweet: #HadoopTrends, @dmenningervr, @Cloudera, @Karmasphere
     Thank you for participating!
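
The Hadoop usage figures on slide 15 can be checked with a little arithmetic. The sketch below is not part of the original deck. It assumes that the 54% shown on the slide is the sum of the four categories other than "No plans to use", and that all 163 qualified respondents from slide 14 answered this question, so the respondent counts it derives are rough approximations only. The variable names are illustrative.

```python
# Percentages as printed on slide 15 ("Hadoop Usage").
usage_pct = {
    "Currently in production": 22,
    "Plan to use within 12 months": 12,
    "Plan to use in 12-24 months": 3,
    "Still evaluating": 17,
}
no_plans_pct = 46

# "Total qualified responses" from slide 14.
# Assumption: every qualified respondent answered the usage question.
total_responses = 163

# Assumption: the slide's 54% is the sum of the four categories above.
engaged_pct = sum(usage_pct.values())

# Approximate head counts; rounding means these need not add up exactly.
approx_counts = {
    label: round(total_responses * pct / 100)
    for label, pct in usage_pct.items()
}

if __name__ == "__main__":
    print(f"Using, planning or evaluating: {engaged_pct}%")
    print(f"All categories combined: {engaged_pct + no_plans_pct}%")
    for label, count in approx_counts.items():
        print(f"{label}: about {count} respondents")
```

Under those assumptions the four categories add up to the 54% on the slide, and together with the 46% reporting no plans they account for all responses.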