Big Data Market Overview
Jo Maitland, Research Director, GigaOM

• 15+ years in technology research and
  journalism with focus on emerging
  infrastructure technologies including next
  generation storage, networking,
  virtualization, and cloud computing
    – Forrester Research (Analyst)
    – The 451 Group (Analyst)
    – TechTarget (Executive Editor)
    – UBM Tech (LightReading.com, Senior
      Editor)
    – Computerwire (Senior Writer)
    – PC Week (Reporter)
Agenda

•   Data growth, it’s big
•   Oh the mess we are in…
•   Let’s turn off all the computers
•   Don’t be daft!
•   There are new technologies to help store and analyze all this data
•   Enter Hadoop, NoSQL and hype
•   It’s the apps, stupid
•   Emerging trends
•   Questions to consider
How Big?
Data growth at Facebook
Data growth at Twitter
Growth of machine generated data
Data growth worldwide
Data growth in the enterprise is staggering


• Walmart handles more than 1 million customer transactions per hour
• There are about 90 trillion emails per year
• Google processes some 24 petabytes of data per day
• AT&T transfers 30PB of data per day
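To put the figures on this slide on a common footing, the short sketch below converts each one to a per-second rate. It is a back-of-the-envelope calculation only: it assumes decimal units (1 PB = 10**15 bytes), a 365-day year, and perfectly even traffic, none of which the slide states. The variable and function names are made up for illustration.

```python
# Convert the slide's hourly, daily, and yearly figures to per-second rates.
# Assumptions (not from the slide): decimal units, 365-day year, even load.
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

BYTES_PER_PB = 10**15
BYTES_PER_GB = 10**9


def per_second(total, seconds):
    """Average rate per second for a total spread evenly over a period."""
    return total / seconds


# Walmart: more than 1 million customer transactions per hour.
walmart_tx_per_s = per_second(1_000_000, SECONDS_PER_HOUR)

# Email: about 90 trillion messages per year.
emails_per_s = per_second(90e12, SECONDS_PER_YEAR)

# Google: some 24 PB processed per day, expressed in GB per second.
google_gb_per_s = per_second(24 * BYTES_PER_PB, SECONDS_PER_DAY) / BYTES_PER_GB

# AT&T: 30 PB transferred per day, expressed in GB per second.
att_gb_per_s = per_second(30 * BYTES_PER_PB, SECONDS_PER_DAY) / BYTES_PER_GB

print(f"Walmart: ~{walmart_tx_per_s:.0f} transactions/s")
print(f"Email:   ~{emails_per_s / 1e6:.2f} million messages/s")
print(f"Google:  ~{google_gb_per_s:.0f} GB/s")
print(f"AT&T:    ~{att_gb_per_s:.0f} GB/s")
```

Seen this way, the daily petabyte figures correspond to sustained rates of a few hundred gigabytes every second.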
Business decision-makers are screwed, basically
The Answer?
What to do…

• Turn off all the computers?
• Turn off some of the computers?
• Stop storing everything and
  classify your data?
• All attempts to stem the tide
  of big data will fail.
Two new technologies have come to our rescue




• Hadoop
• NoSQL
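For readers who have not met Hadoop's processing model, the sketch below mimics the MapReduce pattern (map, group by key, reduce) that Hadoop popularized for batch jobs. It is plain single-process Python, not Hadoop's API, and the function names and sample log lines are hypothetical; a real cluster would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (token, 1) pair for every token in one input line."""
    return [(token.lower(), 1) for token in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def count_tokens(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical web-log lines, used only to exercise the sketch.
sample_log = ["GET /home", "GET /cart", "POST /cart"]
counts = count_tokens(sample_log)
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across machines without coordination between the pieces, which is what lets this style of job scale to very large inputs.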
Commercial solutions enter the fray

• Hadoop distribution companies
   –   Cloudera
   –   HortonWorks
   –   MapR
   –   ++
• NoSQL database companies
   –   10gen (MongoDB)
   –   DataStax (Cassandra)
   –   Basho (Riak)
   –   ++
Hadoop + big data apps = useful
Big data applications are key

• Operational intelligence
   – Splunk, Sumo Logic
• Sales and marketing
   – GoodData, Media Science, Bloomreach
• Visualization
   – Tableau Software, QlikTech, Palantir
• Business Intelligence
   – Platfora, Domo, WibiData
• Online advertising
   – Collective, DataXu, RocketFuel, Turn
• Data as a service
   – FICO, DataSift, Bluekai
What’s next?
Emerging trends

•   More data
•   Focus on applications
•   Data democratization and trust
•   A shift to real time
data
Emerging trends

•   More data
•   Applications
•   Data democratization and trust
•   A shift to real time
Applications
Square
PredPol
23andMe
Emerging trends

•   More data
•   Applications
•   Data democratization and trust
•   A shift to real time
Data democratization and trust
Emerging trends

•   More data
•   Applications
•   Data democratization and trust
•   A shift to real time
Shift to real time
Questions to consider
Investors

• Is the company in an area that is already well funded or over-funded?
    – Infrastructure
• What are the emerging sub-categories?
    – Cloud-based services
• What’s the new angle?
    – ?
Customers

• Are there existing big data apps you could use instead of building a
  custom app?
    – Log file analysis
• What is your 3 year big data roadmap?
    – Just as companies have measured their ROI on technology
      investments, they should also measure the value they receive from
      information.
Thank you

Jo.Maitland@GigaOM.com
     #JoMaitlandSF

Data Science Day New York: GigaOM Big Data Market Overview


Editor's Notes

  1. Facebook hit ONE BILLION users in October this year. To keep up with this growth the company had to build its own technologies for storage and analytics. Its employees rely on this infrastructure to analyze user engagement. Now other companies want the same storage and analytic capabilities as Facebook. This is how innovative technologies, especially big data technologies, are moving from consumer companies into the enterprise.
  2. Twitter is one of the most fascinating big data companies as its data is growing exponentially. It has one of the most interesting repositories of human-generated data in the form of tweets, which can reveal all kinds of insights, from financial market predictions to sentiment analysis in war-torn regions of the earth. We think that Twitter has only just begun to scratch the surface of what’s possible with all the data it is collecting.
  3. While humans are posting large numbers of status updates and uploading millions of photos every day, machine generated data is the fastest growing source of data.
  4. Then there’s overall data growth. By 2015 we’re looking at 7.9 Exabytes of data. One Exabyte equals ONE BILLION Gigabytes. These numbers are so large that we can’t even wrap our heads around them anymore.
  5. And for any gamers out there, World of Warcraft uses 1.3 PETABYTES of storage to maintain game information. Clearly, the amount of data enterprises have to deal with, both internal and external, is growing at a relentless rate. And the impact?
  6. Between all the tweets, clickstreams, web logs, page views, streaming videos, social graphs, ad events, downloads, and more, business decision makers are completely overwhelmed by the volume of data and the difficulty of accessing it.
  7. Hadoop is an open source programming framework, inspired by papers Google published on MapReduce and the Google File System, that stores and processes HUGE amounts of data at very low cost. It’s been widely adopted by consumer and enterprise companies. NoSQL is a type of database that was created to store very large amounts of complex data in a simple, flexible way. The roots of both these technologies are in open source. But similar to what happened with Linux, when companies reach significant scale in their Hadoop deployments, they require commercial support. While Hadoop is the foundation of hundreds of big data initiatives, it is hard to configure, manage and maintain. And so several companies sprang up offering Hadoop distributions with management and other tools wrapped around the open source software, much the way Red Hat does for Linux. Cloudera, HortonWorks and MapR are all in this category.
  8. The infrastructure layer in big data has been heavily funded. There are 30-40 companies in the NoSQL and NewSQL market and dozens of companies packaging Hadoop into infrastructure products and services. They are all hoping to outcompete existing structured database products from vendors like Oracle, IBM, Microsoft, Teradata and others.
  9. But for all this Hadoop and NoSQL infrastructure to be useful we actually need applications on top of it that make use of the data. We think the application space presents the next major opportunity in big data.
  10. Some examples: Splunk is an example of a successful big data app. It has a market cap of around $2.8 billion. The company captures and analyzes machine log file data. And with machine data growing faster than any other form of data, Splunk is an attractive opportunity for investors in this space. In sales and marketing, companies like GoodData gather and analyze data across diverse marketing campaigns that run across Facebook, Google, Twitter, and other sources. They provide consolidated reports that make it easier for marketers to figure out how their campaigns fared. And as access to data becomes more democratized, meaning everyone in the company wants their slice of data delivered in the way they need it, visualization tools are becoming more important. Tableau is one of the most well known names in this space and is on its way to an IPO. In the BI space, Platfora is all about transforming Hadoop datasets into enterprise dashboards with multidimensional layouts, drill-down capabilities and predictive analytics. Online advertising big data apps are about optimizing ad delivery. And data as a service is an emerging category of companies that package certain kinds of data and sell it to other companies. So FICO sells financial data and DataSift works with Twitter’s streaming data.
  11. The ability to store vastly more data at a low cost is driving people to store even MORE data, which might present a challenge in the future if the cost of storage doesn’t go down as rapidly as it has in the past. But for now, companies are benefiting from cheap storage and analytics, allowing them to store and analyze data at a much more granular level and much faster than they could have in the past.
  12. We’re seeing a shift in focus to applications. Companies like Splunk and others are just beginning to scratch the surface of what can be done at the application layer. A number of other categories, both verticals and business functions, are ripe for disruption via big data apps.
  13. Payments is an interesting area in big data apps. Payments company Square can turn any smartphone into a point-of-sale device, which means it can capture an immense amount of previously unavailable transaction data. Building on this, the company is providing advanced merchant analytics in addition to developing its own insights.
  14. And outside of the tech industry many interesting applications are popping up as well. PredPol takes large quantities of historical crime data, analyzes it, and uses predictive analytics to predict where crime will happen. For cities facing budget cuts, this means police can patrol specific areas at specific times, which has been shown to reduce the number of crimes that occur.
  15. In healthcare, there are literally hundreds of apps coming online that make use of big data, from the Nike FuelBand, which monitors your daily movement, to 23andMe, which uses your genomic data to help you track your ancestry. The growth in apps on top of big data infrastructure is going to be huge.
  16. The days of siloed departments in which only certain people get access to data are over. Now everyone wants access to data to run their own analysis to make business decisions. The next step here is making it easier to share data across different departments. The cultural challenge that will remain is how to remove bias from human decisions and have them be truly informed by data. This is a tough nut to crack. How many times have you overridden your GPS in the car, assuming you know a better route, and then had to turn the GPS back on again when you land up lost? We’ve got to trust data and this is a hard one.
  17. Most of today’s big data infrastructure provides batch-orientated processing, returning queries in minutes, hours, sometimes days. But we are seeing increasing demand for more real-time data processing, delivering insights instantaneously. Projects like Cloudera’s Impala and Apache Drill by MapR are working on pushing Hadoop towards a more real-time system.
  18. The big data infrastructure space is well funded or over-funded… What are the new sub-categories, new angles?
  19. Can you tie your big data investment back to earnings per share? By making the right investments and measuring them appropriately, companies stand to gain significant competitive advantage by leveraging big data.
  20. Please feel free to email me or hit me up on Twitter if you’d like to follow-up on this talk.