2. Jo Maitland, Research Director, GigaOM
• 15+ years in technology research and journalism, with a focus on emerging infrastructure technologies including next-generation storage, networking, virtualization, and cloud computing
– Forrester Research (Analyst)
– The 451 Group (Analyst)
– TechTarget (Executive Editor)
– UBM Tech (LightReading.com, Senior Editor)
– Computerwire (Senior Writer)
– PC Week (Reporter)
3. Agenda
• Data growth, it’s big
• Oh, the mess we are in…
• Let’s turn off all the computers
• Don’t be daft!
• There are new technologies to help store and analyze all this data
• Enter Hadoop, NoSQL and hype
• It’s the apps, stupid
• Emerging trends
• Questions to consider
9. Data growth in the enterprise is staggering
• Walmart handles more than 1 million customer transactions per hour
• There are about 90 trillion emails per year
• Google processes some 24 petabytes of data per day
• AT&T transfers 30 PB of data per day
12. What to do…
• Turn off all the computers?
• Turn off some of the computers?
• Stop storing everything and classify your data?
• All attempts to stem the tide of big data will fail.
30. Investors
• Is the company in an area that is already well funded or over-funded?
– Infrastructure
• What are the emerging sub-categories?
– Cloud-based services
• What’s the new angle?
– ?
31. Customers
• Are there existing big data apps you could use instead of building a custom app?
– Log file analysis
• What is your 3-year big data roadmap?
– Just as companies have measured their ROI on technology investments, they should also measure the value they receive from information.
Facebook hit ONE BILLION users in October 2012. To keep up with this growth, the company had to build its own technologies for storage and analytics, and its employees rely on this infrastructure to analyze user engagement. Now other companies want the same storage and analytic capabilities as Facebook. This is how innovative technologies, especially big data technologies, are moving from consumer companies into the enterprise.
Twitter is one of the most fascinating big data companies because its data is growing exponentially. It has one of the most interesting repositories of human-generated data in the form of tweets, which can reveal all kinds of insights, from financial market predictions to sentiment analysis in war-torn regions of the world. We think that Twitter has only just begun to scratch the surface of what’s possible with all the data it is collecting.
While humans are posting large numbers of status updates and uploading millions of photos every day, machine-generated data is the fastest-growing source of data.
Then there’s overall data growth. By 2015 we’re looking at 7.9 exabytes of data. One exabyte equals ONE BILLION gigabytes. These numbers are so large that we can’t even wrap our heads around them anymore.
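Worked out explicitly, assuming decimal (SI) prefixes where each step up is a factor of 1,000 (binary units, where each step is 1,024, give slightly larger figures):

```latex
\begin{aligned}
1\ \text{EB} &= 10^{18}\ \text{bytes} = \frac{10^{18}}{10^{9}}\ \text{GB} = 10^{9}\ \text{GB} \\
7.9\ \text{EB} &= 7.9 \times 10^{9}\ \text{GB} = 7.9\ \text{billion GB}
\end{aligned}
```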
And for any gamers out there, World of Warcraft uses 1.3 PETABYTES of storage to maintain game information. Clearly, the amount of data enterprises have to deal with, both internal and external, is growing at a relentless rate. And the impact?
Between all the tweets, clickstreams, web logs, page views, streaming videos, social graphs, ad events, downloads, and more, business decision makers are completely overwhelmed by the volume of data and the difficulty of accessing it.
Hadoop is an open source framework for storing and processing HUGE amounts of data at very low cost. It was inspired by papers Google published on its MapReduce and distributed file system technologies, and much of its early development happened at Yahoo. It’s been widely adopted by consumer and enterprise companies. NoSQL refers to a class of databases created to store very large amounts of complex data in a simple, flexible way. The roots of both these technologies are in open source. But similar to what happened with Linux, when companies reach significant scale in their Hadoop deployments, they require commercial support. While Hadoop is the foundation of hundreds of big data initiatives, it is hard to configure, manage and maintain. So several companies sprang up offering Hadoop distributions with management and other tools wrapped around the open source software, much the way Red Hat does for Linux. Cloudera, Hortonworks and MapR are all in this category.
The infrastructure layer in big data has been heavily funded. There are 30-40 companies in the NoSQL and NewSQL markets and dozens of companies packaging Hadoop into infrastructure products and services. They are all hoping to outcompete established relational database products from vendors like Oracle, IBM, Microsoft, Teradata and others.
But for all this Hadoop and NoSQL infrastructure to be useful we actually need applications on top of it that make use of the data. We think the application space presents the next major opportunity in big data.
Some examples: Splunk is a successful big data app, with a market cap of around $2.8 billion. The company captures and analyzes machine log file data. And with machine data growing faster than any other form of data, Splunk is an attractive opportunity for investors in this space.
In sales and marketing, companies like GoodData gather and analyze data from diverse marketing campaigns that run across Facebook, Google, Twitter, and other sources. They provide consolidated reports that make it easier for marketers to figure out how their campaigns fared.
As access to data becomes more democratized, meaning everyone in the company wants their slice of data delivered in the way they need it, visualization tools are becoming more important. Tableau is one of the best-known names in this space and is on its way to an IPO.
In the BI space, Platfora is all about transforming Hadoop datasets into enterprise dashboards with multidimensional layouts, drill-down capabilities and predictive analytics. In online advertising, big data apps are about optimizing ad delivery.
And data as a service is an emerging category of companies that package certain kinds of data and sell it to other companies. For example, FICO sells financial data and DataSift works with Twitter’s streaming data.
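To make "capturing and analyzing machine log file data" concrete at the smallest possible scale, the Python sketch below tallies log lines by host and severity level. The log line format and the `count_levels_by_host` function are hypothetical, chosen only for illustration; this is not how Splunk works internally, and a real system would do this continuously over far larger volumes of logs.

```python
from collections import Counter


def count_levels_by_host(lines):
    """Count log lines per (host, level) pair.

    Assumes a hypothetical line format: "<timestamp> <host> <LEVEL> <message>".
    Lines with fewer than four whitespace-separated fields are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 4:
            continue  # skip malformed lines instead of failing
        _, host, level, _ = parts
        counts[(host, level)] += 1
    return counts


sample = [
    "2012-10-01T12:00:01 web01 INFO request served",
    "2012-10-01T12:00:02 web02 ERROR upstream timeout",
    "2012-10-01T12:00:03 web02 ERROR upstream timeout",
    "garbage",
]

# The most frequent (host, level) pair points at the noisiest problem source.
print(count_levels_by_host(sample).most_common(1))  # [(('web02', 'ERROR'), 2)]
```

Even this toy version shows the basic value: raw, repetitive machine output becomes a ranked answer ("web02 is throwing errors") that a person can act on.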
The ability to store vastly more data at a low cost is driving people to store even MORE data, which might present a challenge in the future if the cost of storage doesn’t go down as rapidly as it has in the past. But for now, companies are benefiting from cheap storage and analytics, allowing them to store and analyze data at a much more granular level and much faster than they could have in the past.
We’re seeing a shift in focus to applications. Splunk and others are just beginning to scratch the surface of what can be done at the application layer. A number of other categories, both verticals and business functions, are ripe for disruption via big data apps.
Payments is an interesting area in big data apps. Payments company Square can turn any smartphone into a point-of-sale device, which means it can capture an immense amount of previously unavailable transaction data. Building on this, the company is providing advanced merchant analytics in addition to developing its own insights.
And outside of the tech industry, many interesting applications are popping up as well. PredPol takes large quantities of historical crime data, analyzes it, and uses predictive analytics to forecast where crime is likely to happen. For cities facing budget cuts, this means police can patrol specific areas at specific times, which has been shown to reduce the number of crimes that occur.
In healthcare, there are literally hundreds of apps coming online that make use of big data, from the Nike+ FuelBand, which monitors your daily movement, to 23andMe, which uses your genomic data to help you track your ancestry. The growth in apps on top of big data infrastructure is going to be huge.
The days of siloed departments in which only certain people get access to data are over. Now everyone wants access to data so they can run their own analysis and make business decisions. The next step is making it easier to share data across departments. The cultural challenge that will remain is how to remove bias from human decisions and make sure they are truly informed by data. This is a tough nut to crack. How many times have you overridden your car’s GPS, assuming you know a better route, and then had to turn the GPS back on when you ended up lost? We’ve got to learn to trust data, and that is hard.
Most of today’s big data infrastructure provides batch-oriented processing, returning query results in minutes, hours, sometimes days. But we are seeing increasing demand for real-time data processing that delivers insights almost instantly. Projects like Cloudera’s Impala and Apache Drill, championed by MapR, are working on pushing Hadoop towards a more real-time system.
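The general difference between batch and real-time processing can be sketched in a few lines of Python: a batch job recomputes its answer by scanning the full event history whenever a result is needed, while an incremental aggregator updates its answer as each event arrives. The event names and the `batch_counts` and `RunningCounts` names are hypothetical. This illustrates the idea only; Impala and Drill are SQL query engines that cut query latency, not event-by-event counters like this one.

```python
from collections import Counter


def batch_counts(events):
    """Batch style: scan the entire event history each time a result is needed."""
    return Counter(events)


class RunningCounts:
    """Incremental style: keep the result up to date as each event arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event] += 1
        return self.counts[event]  # current count, available immediately


events = ["click", "view", "click", "purchase", "click"]

live = RunningCounts()
for e in events:
    live.on_event(e)

# Both approaches agree on the totals; they differ in when the answer is ready.
print(batch_counts(events) == live.counts)  # True
```

The batch version pays the cost of a full scan on every query, which is why large batch jobs take minutes or hours; the incremental version has a current answer after every single event, at the price of maintaining state as data streams in.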
The big data infrastructure space is already well funded, or even over-funded. What are the new sub-categories and new angles?
Can you tie your big data investment back to earnings per share? By making the right investments and measuring them appropriately, companies stand to gain a significant competitive advantage by leveraging big data.
Please feel free to email me or hit me up on Twitter if you’d like to follow-up on this talk.