Big Data a big deal?
- 5. Why Big Data Now?
VOLUME
1. All online digital activity creates artifacts or metadata;
at terabyte to petabyte volumes or more, this is being called
BIG DATA
2. Unstructured metadata collection occurs whenever digital
activity occurs
3. Digital metadata volume has exploded with growing internet
usage and has accelerated with recent smartphone & iPad
usage driving global mobile and social activity
5 © 2009/2012 Pythian © All Rights Reserved
- 6. Why Big Data Now?
HUMAN VOLUME
1. In 1998 Google provided 3.6 Million searches in the year
2. In 2011 Google ran 1,722,071,000,000 searches per year
3. In August 2008 there were 100 Million Facebook users
4. In December 2012 there will be over 1 Billion Facebook users
5. In August 2012 Twitter reached over 500,000,000 users
Digital volume of user on-line metadata has exploded with growing
internet, mobile and social use.
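To put the figures above in proportion, here is a minimal sketch of the growth arithmetic. The numbers are copied from the slide as stated; the constant and function names are illustrative assumptions, and the 13-year span is simply 2011 minus 1998.

```python
# Figures as quoted on the slide; names are illustrative only.
GOOGLE_SEARCHES_1998 = 3_600_000
GOOGLE_SEARCHES_2011 = 1_722_071_000_000
FACEBOOK_USERS_AUG_2008 = 100_000_000
FACEBOOK_USERS_DEC_2012 = 1_000_000_000


def growth_multiple(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def compound_annual_growth(start: float, end: float, years: float) -> float:
    """Average yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    searches = growth_multiple(GOOGLE_SEARCHES_1998, GOOGLE_SEARCHES_2011)
    yearly = compound_annual_growth(GOOGLE_SEARCHES_1998, GOOGLE_SEARCHES_2011, 13)
    users = growth_multiple(FACEBOOK_USERS_AUG_2008, FACEBOOK_USERS_DEC_2012)
    print(f"Google searches grew about {searches:,.0f}x")
    print(f"That is roughly {yearly:.0%} compound growth per year")
    print(f"Facebook users grew {users:.0f}x")
```

A multiple of several hundred thousand over 13 years is what "exploded" means here in concrete terms.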
- 7. Why Big Data Now?
DEVICE VOLUME
1. In 2005 there were 1.5 Billion RFID tags
2. In 2012 there are 30 Billion RFID tags
3. 350 Billion Smart Meter Transactions per year
4. 1 Billion smart phones by 2015 with location
sensors
Digital sensor data volume has exploded with growing
machine usage of sensor and measurement reporting
- 8. Why Big Data Now?
ZEITGEIST
1. Data-driven decision making is mainstream
thinking. Think Moneyball by Michael Lewis
2. Google demonstrated the value and importance
of mining “Big Data” for Search, Ad Placement,
Language Translation and a myriad of other
computing challenges with economic benefit.
3. Data trumps smarter algorithms. It is
the dawning of the Age of Real Time & Near Real
Time BIG Impact Analytics.
- 9. Why Big Data Now?
ECONOMICS
1. Collection & Analysis of large volumes of metadata is now
relatively simple, low cost and potentially highly valuable
2. Storage & computing power is relatively low cost enabling the
mining of massive metadata volumes in real time, near real
time or later
3. The economic benefit or value of the insights can
far exceed the costs of acquiring & storing the data
4. Big Data infrastructure tools have become simpler and more accessible
- 10. Purpose of Data Analysis
Data analysis is required to understand
(a) why consumers purchase a particular product,
(b) how consumers purchase the product,
(c) the demographics and psychographics of the purchaser of the
product and
(d) the ultimate user of the product.
- 11. An Alternative Perspective
“Big Data is just the new rallying cry
for the same old stuff BI companies
have been producing all along”
- Stephen Few, Perceptual Edge
AVOID CONFUSING ABUNDANCE WITH INSIGHT
This seems obvious, but almost no attention is being given
to building the skills and technologies that help us glean
insights from data more effectively. As Richards J. Heuer,
Jr. argued in the Psychology of Intelligence Analysis
(1999), the primary failures of analysis are less due to
insufficient data than to flawed thinking. To succeed
analytically, we must invest a great deal more of our
resources in training people to think effectively and we
must equip them with tools that support that effort.
Heuer spent 45 years supporting the work of the CIA.
Identifying a potential terrorist plot requires an analyst
to sift through a lot of data (perhaps Big Data), but more
importantly, it relies on their ability to connect the dots.
Contrary to Heuer’s emphasis on thinking skills, big data
is merely about more, more, more; not smarter or
better.
- 12. Is Big Data really new?
NO
What is new is that access to insights now comes at a
cost, and with tools, available to almost anyone today
Saving all data is now economically viable for everyone.
Large public and private sector (Global 2000) enterprises
have always generated, stored, processed and analyzed
large volumes and a variety of structured and
unstructured data:
1. Particle Physics Research - Large Hadron Collider generates 1 Petabyte per second.
2. Oil Exploration - Seismic sensor data
3. Bioinformatics - Human Genome Project
- 13. BIG DATA VS TRADITIONAL DATA
BIG DATA
• Petabytes at 1/10th Cost of Pre-Engineered Storage
• Semi-structured
• Variety of Sources
• Store Everything
• Raw Data
• No Data Model/Schema
• Parallelize to handle volume
• Simplicity at Design/Architecture stage
• Complexity at Insight stage
TRADITIONAL DATA
• Gigabytes to Terabytes
• SQL Structured
• Engineered Systems
• Data Model/Schema
• Selected Data Stored
• Complexity at Design/Architecture stage
• Simplicity at Usage stage
• Majority of $$ Investment up front
- 14. Big Data is BI at Scale
PHASE 1: Capture & Store
• Petabyte scale
• 300 Terabytes/Rack
• MAP-R
PHASE 2: Speculate and Investigate
• Data Science
• Analytics
PHASE 3: Exploit Insights
• Real Time
• NRT Decisions
- 15. Big Data Phase 1- Capture &
Store
Is the value of potential insights much greater
than the cost of searching for them?
BUSINESS QUESTIONS
• What types of semi-structured data do you plan to store, and how?
• What questions are you attempting to answer?
• What data analysis is currently being done?
• What are people asking questions about?
• What disaster recovery (DR)? What compression? What storage is possible? Flash vs
disk? What capacity, and how fast must access be?
• How many people can access the data simultaneously?
• KNOW THE DATA: WHAT IS ITS SOURCE? ITS RATE OF GENERATION?
- 16. Big Data Phase 1- Capture &
Store
Is the value of potential insights much greater
than the cost of searching for them?
STORAGE REQUIREMENTS
• Be scalable
• Provide tiered storage
• Be self managing
• Ensure content is highly available
• Ensure content is widely accessible
• Support both analytical and content applications
• Support workflow automation
• Integrate with legacy applications
• Enable integration with public, private and hybrid cloud ecosystems
• Be self healing
- 17. Big Data Phase 2- Speculate and
Investigate
Is the value of potential insights much greater
than the cost of searching for them?
BUSINESS QUESTIONS
• What type of semi-structured data do I have?
• What type of questions am I trying to answer?
• Statistical? Correlation? Causal? Patterns?
• How do I need to manipulate, translate, transform, cleanse,
organize, visualize the data?
• How much time do I have for analysis?
• What tools do I have to perform transformation and analysis?
- 18. Big Data Phase 3- Exploit
Insights
Is the value of potential insights much greater
than the cost of searching for them?
BUSINESS QUESTIONS
• Are discovered patterns/insights available in real time, near real
time or further out?
• How do I systematically find patterns/insights going forward?
• How do I integrate them into business-impacting decision processes?
- 19. Top 10 Reasons Why all the Hype
around Big Data now?
1. At Tera & Peta bytes it really does get interesting.
2. All the Cool Kids are doing it.
Once the Four Digerati Horsemen (Google, Facebook, Twitter, Amazon) say it’s important, then it really is.
3. BI Folks needed a new marketing moniker.
4. ‘CLOUD’ hype was already annoying and slowing.
5. Gartner says it’s near its peak!
6. The term went viral!
7. People thought you said Big Deal!
8. Voluminous data could not be pronounced
9. User Data mining is next to Voyeurism
10. It’s Google’s Vault!
- 20. What is considered Big Data?
VOLUME & VARIETY
1. Any data stored digitally and at scale (terabytes+)
with potential for providing practical, useful
insights, potentially with economic benefits
2. Very large volume of unstructured
information/data
3. Big Data is characterized by the volume, velocity
and variety of large data sets
Every “connected” person or “connected”
device is potentially a data generator
- 21. What is considered Big Data?
DIFFICULT & TIMELY
1. Big Data by the nature of the volume hides or
obscures valuable insights. A lot of noise but with
critical and potentially valuable signals buried
within
2. Often the signal value perishes rapidly requiring
real time or near real time analysis and action
Big Data is the quintessential signal vs noise
problem
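The signal-versus-noise framing above can be illustrated with a minimal sketch: flag readings that deviate strongly from their recent history. This is one simple technique (a trailing-window z-score) chosen for illustration, not a method the slides prescribe; the function name, window size, threshold and sample data are all assumptions.

```python
from statistics import mean, stdev


def find_signals(readings, window=20, threshold=4.0):
    """Return indexes whose value deviates strongly from the trailing window.

    Each reading is compared with the mean and standard deviation of the
    `window` readings before it; a large z-score is treated as a signal.
    """
    signals = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # a perfectly flat history gives no scale to judge by
        z_score = (readings[i] - mu) / sigma
        if abs(z_score) >= threshold:
            signals.append(i)
    return signals


if __name__ == "__main__":
    # Hypothetical sensor feed: small repeating noise with one buried spike.
    feed = [100 + (i % 5) for i in range(60)]
    feed[45] = 160
    print(find_signals(feed))
```

Because the check only needs the trailing window, it can run as data arrives, which matters when the value of a signal perishes quickly.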
- 22. Examples of Big Data?
• Local/regional weather information
• WEB Traffic information
• User search behavior
• Social information – who connected to whom, who
poked who etc.
• Mobile User information – preferences, likes,
habits
• Application usage information
• E-commerce transaction information
• Physical retail customer transaction data
- 23. Who are the Top 15 ‘Big Data’ ‘Players’?
1. Google
2. Amazon
3. Apple
4. Yahoo
5. Facebook
6. Salesforce
7. Twitter
8. Cloudera
9. LinkedIn
10. Netflix
11. Microsoft
12. IBM
13. Hortonworks
14. Zynga
15. eBay
- 24.
1. www.kaggle.com
2. www.indeed.com
3. www.recordedfuture.com
4. www.datamarket.com
5. www.climate.com
6. www.manybills.com
7. www.electrion.twitter.com
8. www.consensu.gov
9. www.coursera.com
10. www.data.gov
- 25. What is the size of the BIG DATA Market?
Deloitte pegs the size of the big data market at
about $1.3-$1.5 billion in 2012
In March, IDC released a forecast that
predicted the worldwide big data technology
and services market would reach $16.9 billion in 2015.
The 2012 global BI software market is $35 billion
- 26. Where do BI and Big Data coexist?
PREDICTIVE ANALYTICS
- 27. How do Machine Learning and Big Data
relate?
PREDICTIVE ANALYTICS
- 28. When is Big Data valuable?
1. When better business decisions result from practical
insights provided by data that were unavailable to,
or unnoticed by, expert judgment
2. When time-to-insight results in big returns or benefit,
e.g. real-time book recommendations
3. Where precision of analysis results in specific
alternative decisions
4. Where patterns from heterogeneous or seemingly
disparate data sources provide material competitive
insights/advantage versus competition
- 29. What is unique about Big Data
Technology?
MASSIVE PARALLELISM
AFFORDABLE HARDWARE
LOCAL PROCESSING
1. The tools do not require the data to be first
structured in a particular schema as is required in
relational databases
2. Data is analyzed in native format closest to where
it is stored, dramatically reducing the time and
effort for retrieval and restore.
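The three ideas on this slide (no schema imposed up front, processing close to where the data sits, and parallel work across partitions) can be sketched in a few lines. This is a toy illustration, not how any particular Big Data product works: the log format, the sample lines and every function name are assumptions, and a thread pool merely stands in for separate cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw log lines, partitioned as if each list lived on its own node.
PARTITIONS = [
    ["2012-10-01 GET /home 200", "2012-10-01 GET /cart 500", "malformed line"],
    ["2012-10-02 GET /home 200", "2012-10-02 POST /cart 200"],
    ["2012-10-02 GET /home 404"],
]


def parse(line):
    """Schema-on-read: structure is imposed only when a raw line is read."""
    parts = line.split()
    if len(parts) != 4 or not parts[3].isdigit():
        return None  # raw data may not fit the expected shape; skip it
    date, method, path, status = parts
    return {"date": date, "method": method, "path": path, "status": int(status)}


def map_partition(lines):
    """Map step: count requests per path using only this partition's data."""
    counts = Counter()
    for line in lines:
        record = parse(line)
        if record is not None:
            counts[record["path"]] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the small per-partition summaries into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_paths(partitions, workers=3):
    """Process every partition independently, then combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_counts(partials)


if __name__ == "__main__":
    print(count_paths(PARTITIONS))
```

Only the compact per-partition counts travel to the reduce step; the raw lines never leave the place they are stored, which is the point of local processing.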
- 31. What skills do I need in my organization
for Big Data?
1. Data scientists –
• Identify what analysis makes sense in context. Typical background in math and
statistics, as well as artificial intelligence and natural language processing.
2. Data architects –
• Create the data model and identify required data sources and analytical tools
3. Data visualizers –
• Use visualizations to explore what the data means and present how it will
impact the company
4. Data change agents –
• Good communicators with a Six Sigma background who understand how to apply
statistics.
- 32. What skills do I need in my organization
for Big Data?
5. Data engineer/operators –
• Big Data infrastructure operations. Develop architecture that helps analyze and
supply data in the way the business needs, and make sure systems are
performing smoothly
6. Data stewards –
• Ensure that data sources are properly accounted for, and may also maintain a
centralized repository as part of a Master Data Management approach, in which
there is one “gold copy” of enterprise data to be referenced.
7. Data virtualization/cloud specialists –
• Build and maintain a virtualized data service layer that can draw data from any
source and make it available across organizations in a consistent, easy-to-access
manner
8. Systems Administrators
- 33. Six Steps to Big Data alchemy?
1. Select the right data sets
• Identify rich data sources which may contain insights into a particular problem you are trying to
solve or insight you are trying to gain. Social media data is providing incredible insights into
changes in brand positioning and new product introductions
2. Join the various sets of data
• Combine rich, unstructured and sometimes incomplete data into a new set for manipulation and analysis
3. Clean the new large data set
• Once the data is clean, begin to discover important and relevant patterns, signatures, anomalies,
correlations and outliers using advanced analytic models
4. Create models
• These models predict outcomes using the data. Iterate on your hypothesis and keep experimenting
5. Use visualization tools
• Visualization may assist in discovery or presentation of key insights from the data
6. Iterate
• Keep varying your models and data sets to assist future planning or decision making
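The six steps above can be sketched end to end on a toy scale. Everything here is an assumption made for illustration: the two hand-made data sets, the field names, the choice of a straight-line fit as the "model" and a text bar chart as the "visualization". A real project would use far larger data and proper tooling.

```python
# Step 1: select data sets (hypothetical, tiny, hand-made samples).
AD_SPEND = [
    {"week": 1, "spend": 10.0},
    {"week": 2, "spend": 20.0},
    {"week": 3, "spend": 30.0},
    {"week": 4, "spend": None},  # an incomplete record
    {"week": 5, "spend": 50.0},
]
SALES = [
    {"week": 1, "sales": 25.0},
    {"week": 2, "sales": 45.0},
    {"week": 3, "sales": 65.0},
    {"week": 4, "sales": 85.0},
    {"week": 5, "sales": 105.0},
]


def join_on_week(left, right):
    """Step 2: join the two data sets on their shared 'week' key."""
    right_by_week = {row["week"]: row for row in right}
    return [
        {**row, **right_by_week[row["week"]]}
        for row in left
        if row["week"] in right_by_week
    ]


def clean(rows):
    """Step 3: drop rows with missing values."""
    return [r for r in rows if r["spend"] is not None and r["sales"] is not None]


def fit_line(rows):
    """Step 4: least-squares line predicting sales from spend."""
    n = len(rows)
    mean_x = sum(r["spend"] for r in rows) / n
    mean_y = sum(r["sales"] for r in rows) / n
    sxx = sum((r["spend"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["spend"] - mean_x) * (r["sales"] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bar_chart(rows):
    """Step 5: a text 'visualization', one '#' per 10 units of sales."""
    return "\n".join(
        f"week {r['week']}: " + "#" * int(r["sales"] // 10) for r in rows
    )


def run_pipeline():
    rows = clean(join_on_week(AD_SPEND, SALES))
    slope, intercept = fit_line(rows)
    return rows, slope, intercept


if __name__ == "__main__":
    rows, slope, intercept = run_pipeline()
    print(bar_chart(rows))
    print(f"sales ~ {slope:.2f} * spend + {intercept:.2f}")
    # Step 6: iterate, here by refitting on a different slice of the data.
    print(fit_line(rows[:3]))
```

Each step is a separate function so that a data set or model can be swapped without touching the rest, which is what makes the final "iterate" step cheap.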
- 34. How is Big Data providing Value
today?
• Online media and social sites mine user-behavior Big Data for what
interests whom, when, why and how. Web-surfing Big Data gives
sites insight into what people are interested in, whom they
share that information with, and how long they stay engaged
online.
• Online retailers mine Big Data to predict consumers’ buying
behavior and purchase preferences, and to make high-impact offers that drive up
total spend per session.
• Insurance companies mining Big Data can improve their overall
performance by facilitating greater pricing accuracy, deeper
relationships with customers, and more effective and efficient loss
prevention.
- 35. How can Pythian help you with Big
Data?
1. First, get informed.
2. Second, get started.
Recognize an opportunity for competitive advantage within your company.
3. Third, get the right team of people involved.
Organize an internal task force to drive the Big Data initiative. Don’t forget
to find the critical Data Scientist: the person who will understand the data
sources and know what questions to pose.
4. Fourth, identify the key sources of Big Data,
both external and internal.
5. Fifth, with Pythian’s assistance evaluate the
tools and technology that will help your Big
Data program.
- 36. Key Questions for Executives
• What does the data say?
• Where did the data come from?
• Has the data been sufficiently cleaned?
• How was the data analyzed?
• How confident can we be in our analysis?
• Can we distinguish correlation from causality?
• How much will the data influence the key
decision makers?
- 37. A compelling balanced perspective on Big
Data
Stephen Few- Perceptual Edge
- 39. Big Data Start-ups
• WeatherBill (which compiles large amounts of weather data from a
variety of sources, then sells insurance based on statistical
analysis),
• Klout (a controversial startup that processes large amounts of data
to create every user’s social influence score) or
• Wonga (which crunches data to grant financial loans) are some
early examples of startups with big data as their core DNA.
• Tokutek Inc., a Lexington company founded in 2006 that makes
databases run faster; John Partridge is its president and CEO.
• Trifacta raised $4.3 million from Accel’s Big Data fund for a
solution that doesn’t just visualize insight, but also the analytics
tools that produce it.
• Platfora is a software company based in San Mateo, California, building a revolutionary BI and analytics platform
that democratizes and simplifies use of big data and Hadoop. The company was founded by Ben Werther, former product
head of Greenplum, an analytical database company acquired by EMC. Platfora is assembling a superb team of data and
distributed systems architects/engineers, UI and UX developers, and data scientists.
- 40. Big Data Start-ups
• About MapR Technologies
MapR delivers on the promise of Hadoop, making managing and analyzing Big Data a reality for
more business users. MapR enables customers to harness the power of Big Data analytics.
Leading companies including Amazon, Cisco, EMC and Google partner with MapR to deliver an
enterprise-grade Hadoop solution. Investors include Lightspeed Venture Partners, NEA and
Redpoint Ventures.
• Alteryx provides indispensable analytic solutions for enterprise and SMB companies making
critical decisions about how to expand and grow. Our product, Alteryx Strategic Analytics, is a
desktop-to-cloud Agile BI and analytics solution designed for data artisans and business leaders
that brings together the market knowledge, location insight, and business intelligence today’s
organizations require. For more than a decade, Alteryx has enabled strategic planning
executives to identify and seize market opportunities, outsmart their competitors, and drive
more revenue.