An overview on big data, big data analytics, data products and data scientist. My contribution at WAYDAY11 (a Bit Bang Web Analytics Day, Milan, Italy, 11.11.11)
2. Let’s talk about …
What: Big Data! Reality Beyond Hype
Why: Competing on (Big) Analytics
How: Data Products & Leadership
@CosimoAccoto Are you ready for the era of “Big Data”?
4. Sorting Reality from the Hype
✓ Big Data: a top tech trend for 2012 (Forrester Research)
✓ Big Data: a new game-changing asset (The Economist)
✓ Big Data: a scientific revolution (Harvard Business Review)
5. Science Paradigms Evolution
- Empirical Science
describing natural phenomena
- Theoretical Modeling
using models and generalizations
- Computational Simulations
simulating complex phenomena
- Data-Intensive Computing
unifying theory, experiment, and simulation at scale
source: Gray J., The Fourth Paradigm. Data-Intensive Scientific Discovery, 2009, p. xviii
6. The “Fourth” Paradigm
The techniques and technologies
for such data-intensive science
are so different that it is worth
distinguishing data-intensive
science from computational
science as a new, fourth paradigm
for scientific exploration
source: Gray J., The Fourth Paradigm. Data-Intensive Scientific Discovery, 2009, p. xix
7. “Big Data!!!”
“…Say what?”
source: McKinsey, “Big Data: The Next Frontier for Innovation, Competition, and Productivity”, May 2011, p. 1
8. “Big Data!!!”
“…Say what?”
source: Loukides, “Big Data Now”, O’Reilly Media, 2011, p. 8
9. The Attack of the
Exponentials
Over the past five
decades, the cost of
storage, CPU, and
bandwidth has
been exponentially
dropping, while network
access has exponentially
increased*
source: Plattner and Zeier, “In-Memory Data Management”, 2011, pp. 15-16; * Driscoll, “Big Data Now”
10. 1.8 ZB (2011), 7 ZB (2015)
source: IDC, “2011 Digital Universe Study”, June 2011; Image: Wikibon, 2011
11. Big Data is not just “big”
The 3 Vs of Big Data
source: TDWI Research, “Big Data Analytics”, Fourth Quarter, 2011
12. The Data Deluge: Volume
Boeing jet engines can produce
10 terabytes of operational information
for every 30 minutes they run.
A four-engine jumbo jet can create 640 terabytes of data
on just one Atlantic crossing; multiply that by the more than 25,000
flights flown each day, and you get an understanding of the impact
that sensor- and machine-produced data can have on a BI
environment.
source: Rogers, “Big Data is Scaling BI and Analytics”, Information Management Magazine, 10/2011
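A quick arithmetic check of the figures above, as a minimal sketch. It assumes the 10 TB per 30 minutes applies to each engine (the slide does not say so explicitly); the constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the slide's volume figures.
# Assumption: the 10 TB / 30 min rate is per engine.
TB_PER_ENGINE_PER_HALF_HOUR = 10
ENGINES = 4
CROSSING_TB = 640
FLIGHTS_PER_DAY = 25_000

# Data rate for the whole aircraft, in terabytes per hour.
tb_per_hour = TB_PER_ENGINE_PER_HALF_HOUR * 2 * ENGINES

# Flight duration implied by the 640 TB figure.
implied_flight_hours = CROSSING_TB / tb_per_hour


def crossing_terabytes(hours: float) -> float:
    """Terabytes produced by a four-engine jet flying for `hours`."""
    return tb_per_hour * hours


# Naive upper bound: treats every daily flight as a four-engine
# Atlantic crossing, which overstates the real total.
naive_daily_tb = CROSSING_TB * FLIGHTS_PER_DAY
naive_daily_eb = naive_daily_tb / 1_000_000  # 1 EB = 1,000,000 TB (decimal units)

print(tb_per_hour, implied_flight_hours, naive_daily_eb)
```

Under that per-engine assumption, the 640 TB figure corresponds to an eight-hour crossing, which is consistent with a transatlantic flight.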
13. Streaming Real-Time Data: Velocity
Online Ad Serving – 40 milliseconds to respond
with the decision (deliver the right ad to the right user profile)
Financial Services – about 1 millisecond
to calculate customer scoring probabilities
There are many examples of data that might demand analysis in real time or near real time, or at least in less than a day. RFID sensor data and GPS spatial data show up in time-sensitive transportation logistics. Fast-moving financial trading data feeds fraud-detection and risk assessments.
source: TDWI Research, “Big Data Analytics”, Fourth Quarter, 2011 (image)
14. Data outside of Databases: Variety
Wal-Mart, the world's largest retailer, is logging one million customer transactions per hour and feeding information into databases estimated at 2.5 petabytes.
Old & New Data Sources: RFIDs, sensors, mobile payment, in-vehicle tracking, …
[Figure: Sources of Retail Data. Labels: Channel, Reseller, Retailer, DC, Store, Online; Advertising, promotion lift; Brand, Product, SKU, Serial Number, RFID; library, web-to-store, POP; Price, margin, elasticity; Sell-in, Sell-thru (and again), Sell-out; Channel/Trade; CRM, Loyalty, personalized programs, discounts, coupons, rebates]
source: Craig and Craig, “Retail Lesson Learned..”, Strata Conference, 2011
15. Variety & Variability
If you just have high volume or
velocity, then big data may not
be appropriate. As characteristics
accumulate, however, big data
becomes attractive by way of
cost. The two main drivers are
volume and velocity, while variety
and variability shift the curve.
source: Evelson and Hopkins, “Expand Your Digital Horizon with Big Data”, Forrester Research, 2011
17. Not all industries
are created equal
source: McKinsey Quarterly, “Are you ready for the era of ‘Big Data’?”, October 2011
18. Not all retail subsectors
are created equal
source: McKinsey, “Big Data: The Next Frontier for Innovation, Competition, and Productivity”, May 2011, p. 82
19. source: IBM CMO C-suite studies, “From Stretched to Strengthened”, 2011, p. 16
20. Growing Pains
Storing, securing and
reconciling data are the
most fundamental aspects
of any data management
strategy
But the heavy lifting starts
when companies begin
extracting meaningful
insights from the data and
disseminating them
throughout the organization
source: Economist Intelligence Unit, “Big Data: Harnessing a game-changing asset”, 2011, p. 17
21. From Stretched to Strengthened
source: Economist Intelligence Unit, “Big Data: Harnessing a game-changing asset”, 2011, p. 11
22. If the No. 1 CMO
challenge is the data
deluge, the
No. 1 CIO planned
investment for 2012 is
BI and analytics
source: IBM CMO C-suite studies, “From Stretched to Strengthened”, 2011, p. 16
23. “Big Data” & “Analytics” Together?
big data analytics is the application of advanced
analytic techniques to very big data sets
advanced analytics as a discovery mission
… and a data products builder
source: TDWI Research, “Big Data Analytics”, Fourth Quarter, 2011, p. 5
24. Big Data Analytics
source: TDWI Research, “Big Data Analytics”, Fourth Quarter, 2011, p. 23
25. Growth/Commitment
Data visualization/discovery
BI/Predictive analytics
Data/Text/Content Mining
Pattern Recognition
In-memory/real-time analytics
Machine Learning
source: TDWI Research, “Big Data Analytics”, Fourth Quarter, 2011, p. 25
26. Datavis vs Infographics
infographics is useful for referring to any visual representation
of data that is:
• manually drawn (and therefore a custom treatment of the
information);
• specific to the data at hand (and therefore nontrivial to
recreate with different data);
• aesthetically rich (strong visual content meant to draw the
eye and hold interest);
data visualization and information visualization (casually, data viz and
info viz) are useful for referring to any visual representation of data
that is:
• algorithmically drawn (may have custom touches but is largely
rendered with the help of computerized methods);
• easy to regenerate with different data (the same form may be
repurposed to represent different datasets with similar dimensions or characteristics);
• often aesthetically barren (data is not decorated); and
• relatively data-rich (large volumes of data are welcome and viable,
in contrast to infographics).
source: Iliinsky and Steele, “Designing Data Visualizations”, 2011, pp. 5-7
27. Being (Big) Data-Driven
A data-driven organization acquires,
processes, and leverages data in a timely
fashion to create efficiencies, iterate on and
develop new products and navigate
the competitive landscape.
source: Patil D.J., “Building Data Science Teams”, O’Reilly, 2011, p. 2
28. Being (Big) Data-Driven
Zynga constantly monitors who their users are and what they are
doing, generating an incredible amount of data in the process.
By analyzing how people interact with a game over time, they
have identified tipping points that lead to a successful game.
They know how the probability that users will become long-term users
changes based on the number of interactions they have with
others, the number of buildings they build in the first n days, the
number of mobsters they kill in the first m hours, etc. They have
figured out the keys to the engagement challenge and have
built their product to encourage users to reach those goals.
source: Patil D.J., “Building Data Science Teams”, O’Reilly, 2011, p. 2
29. Data Scientists and “Data Products”
• Products that provide highly personalized content
(e.g., the ordering/ ranking of information in a news feed).
• Products that help drive the company’s value proposition
(e.g., “People You May Know” and other applications that suggest friends or other
types of connections).
• Products that facilitate the introduction into other products
(e.g., “Groups You May Like,” which funnels you into LinkedIn’s Groups product area).
• Products that prevent dead ends
(e.g., collaborative filters that suggest further purchases, such as Amazon’s “People
who viewed this item also viewed ...”).
• Products that are stand alone
(e.g., news relevancy products like Google News, LinkedIn Today, etc.).
source: Patil D.J., “Building Data Science Teams”, O’Reilly, 2011, p. 2
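The "also viewed" style of data product in the list above can be illustrated with a minimal co-occurrence sketch. This is not how Amazon or LinkedIn implement their recommenders; it is a toy item-to-item counter, and the function names and sample sessions are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_counts(sessions):
    """For every item, count how often each other item appears in the same session."""
    counts = defaultdict(Counter)
    for items in sessions:
        # Deduplicate within a session so repeated views count once.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_viewed(counts, item, k=3):
    """Return up to k items most often co-viewed with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


# Hypothetical browsing sessions, for illustration only.
sessions = [
    ["camera", "tripod", "sd_card"],
    ["camera", "sd_card"],
    ["camera", "tripod"],
    ["laptop", "sd_card"],
    ["camera", "sd_card", "bag"],
]
counts = co_view_counts(sessions)
print(also_viewed(counts, "camera", k=2))
```

Raw counts favor popular items; a production system would usually normalize them, but the counting step conveys the basic idea.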
30. The roles of a data scientist
Decision sciences and business intelligence
Product and marketing analytics
Fraud, abuse, risk and security
Data services and operations
Data engineering and infrastructure
Organizational and reporting alignment
source: Patil D.J., “Building Data Science Teams”, O’Reilly, 2011
31. Being (Big) Data-Driven
…Hey look!!! I’m not Zynga, Google or FB…
I’m a retailer, I’m a bank, I’m an insurer,
I’m a publisher, I’m a fashion brand…
…What does it mean to me?
32. Big Data in Consumer Electronics
HP and the “Project Fusion” - To
correlate social media
conversations about specific
product features to actual
customer transactions in real-time
1. “unstructured
data” (Amazon.com reviews,
customer surveys, customer
support logs, and other natural-
language text);
2. “structured data” (customer
support tickets, sales transactions,
customer demographics)
source: Prasanna Dhore, “Customer Intelligence at HP”, 2010
33. Big Data in Consumer Electronics
customer sentiment analysis (unstructured) +
product profile (structured)
source: Prasanna Dhore, “Customer Intelligence at HP”, 2010
34. Big Data in Consumer Electronics
source: Prasanna Dhore, “Customer Intelligence at HP”, 2010
35. Machine Learning in Travel Services
Improve the customer experience (reduce latency, increase
coverage) when searching for hotel rates while controlling impact
on suppliers (maintain “look-to-book”).
Hotel sort optimization: How can we improve the ranking of hotel search results in order to
show consumers hotels that more closely match their preferences?
Cache optimization: can we intelligently cache hotel rates in order to optimize the
performance of hotel searches?
Personalization/segmentation: can we show targeted search results to specific consumer
segments?
source: Orbitz Worldwide, 2011
36. Machine Learning in Travel Services
Data Driven Approaches:
Traffic Partitioning: Identify
the subset of traffic that is
most efficient and
optimize that subset
through prefetching and
increased bursting.
TTL Optimization: Use
historic logs of availability
and rate change
information to predict
volatility of hotel rates and
optimize cache TTL.
source: Orbitz Worldwide, 2011
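One way to read the TTL optimization idea is sketched below. It is not the method described in the source: it assumes rate changes for a hotel arrive as a Poisson process, estimates the change rate from a log, and picks the longest TTL that keeps the chance of serving a stale rate under a chosen threshold. All names, thresholds, and bounds are hypothetical.

```python
import math


def estimate_change_rate(change_times_h, window_h):
    """Observed rate changes per hour over a log window of `window_h` hours."""
    return len(change_times_h) / window_h


def ttl_hours(rate_per_h, max_stale_prob=0.1, min_ttl=0.05, max_ttl=24.0):
    """Longest TTL (in hours) with P(rate changed before expiry) <= max_stale_prob.

    Under the Poisson assumption, P(change within t) = 1 - exp(-rate * t),
    so t = -ln(1 - max_stale_prob) / rate. The result is clamped to
    [min_ttl, max_ttl].
    """
    if rate_per_h <= 0:
        return max_ttl  # no observed changes: cache as long as allowed
    ttl = -math.log(1.0 - max_stale_prob) / rate_per_h
    return max(min_ttl, min(max_ttl, ttl))


# Hypothetical logs: hours (within a 24-hour window) at which a rate changed.
volatile_hotel = [h * 0.5 for h in range(48)]  # a change every 30 minutes
stable_hotel = [13.0]                          # a single change all day

for name, log in [("volatile", volatile_hotel), ("stable", stable_hotel)]:
    rate = estimate_change_rate(log, window_h=24.0)
    print(name, round(ttl_hours(rate), 3))
```

Volatile hotels end up with short TTLs and stable ones with long TTLs, which is the trade-off between cache freshness and supplier load that the slide describes.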
37. Big Data Value for Car Rental
The goal is to identify car and equipment rental performance
levels to enable pinpointing issues and making the necessary
adjustments to improve customer satisfaction levels.
Using analytics software, Hertz location managers are able to
effectively monitor customer comments to deliver top customer
satisfaction scores for this critical level of service. In Philadelphia,
survey feedback led managers to discover that delays were
occurring at the returns area during certain parts of the day.
They quickly adjusted staffing levels and ensured a manager was
always present in the area during these specific times.
source: IBM Big Data Cases, 2011
38. Big Data in Location-Based Services
Customer supports/suggestions
source: in Amazon Big Data Use Cases, 2011
39. Big Data in Location-Based Services and more…
Not only Data Products…
• Analyze ad stats (reporting,
billing, algorithm inputs)
• Analyze A/B test results
• Detect duplicate business
listings
• Email bounce processing
• Identify bots based on traffic
patterns
source: in Amazon Big Data Use Cases, 2011
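As an illustration of the "Analyze A/B test results" item above, here is a minimal sketch of a two-proportion z-test using only the standard library. The source does not say which statistical method is used; this is one common choice, and the conversion numbers are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or all 100%")
    z = (p_b - p_a) / se
    # Two-sided p-value under the normal approximation: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant B converts 260 of 10,000 vs. 200 of 10,000 for A.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation is reasonable for large samples like these; small samples would call for an exact test instead.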
40. Are you ready to be a Big Data Leader?
;-)
41. Thanks
@CosimoAccoto
This research is part of a broader project on control and management in
digital markets, in which the author collaborates as a field expert with Prof.
Andreina Mandelli, SDA Bocconi Milan and USI Lugano,
within the framework of the project
BIT (Business Information Technology)
http://www.anderson.ucla.edu
For more information on the project:
Andreina.mandelli@sdabocconi.it
Andreina.mandelli@usi.ch