Data Science Tech Institute - Big Data and Data Science Conference around Dr Gregory Piatetsky-Shapiro.
Keynote - An overview on Big Data & Data Science Dr Gregory Piatetsky-Shapiro - KDnuggets.com Founder & Editor.
Paris May 23rd & Nice May 26th 2016 @ Data ScienceTech Institute (https://www.datasciencetech.institute/)
14. Earliest use of “data mining”: 1962
(c) KDnuggets 2016 15
Source: Google Books
After eliminating many “following data. Mining cost is” examples, which refer to the mining of minerals, and books from “1958” that have a CD attached (errors in book year), the earliest “data mining” reference I found is
38. The best data scientists have one thing in common: unbelievable curiosity
DJ Patil, first US Chief Data Scientist
http://www.sciencefriday.com/articles/10-questions-for-the-nations-first-chief-data-scientist
April 2016
51. Lesson 8: Limits to Predicting Human Behavior?
• Inherent randomness and complexity in human behavior
• Individual predictions have limited accuracy (but can still be better than random and very useful for consumer analytics)
• Aggregate predictions (e.g., who will win the election) are more accurate, because individual randomness cancels out
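The contrast between individual and aggregate predictions can be illustrated with a small simulation. This is a minimal sketch, not something from the talk: the function name `simulate_election`, the 55% vote probability, and the electorate size are all hypothetical values chosen for illustration.

```python
import random


def simulate_election(n_voters=2000, p_vote_a=0.55, n_trials=100, seed=0):
    """Compare individual-level and aggregate-level prediction accuracy.

    Assumption (illustrative only): every voter independently votes for
    candidate A with probability p_vote_a > 0.5, and the model knows this.
    """
    rng = random.Random(seed)
    individual_correct = 0
    aggregate_correct = 0
    for _ in range(n_trials):
        votes_for_a = sum(rng.random() < p_vote_a for _ in range(n_voters))
        # Best individual prediction is "votes A" for every voter, so the
        # number of correct individual predictions equals the votes for A.
        individual_correct += votes_for_a
        # Aggregate prediction is "A wins"; it is correct if A gets a majority.
        aggregate_correct += votes_for_a > n_voters / 2
    individual_accuracy = individual_correct / (n_trials * n_voters)
    aggregate_accuracy = aggregate_correct / n_trials
    return individual_accuracy, aggregate_accuracy


individual_accuracy, aggregate_accuracy = simulate_election()
print(f"individual accuracy: {individual_accuracy:.3f}")
print(f"aggregate accuracy:  {aggregate_accuracy:.3f}")
```

Under these assumptions, individual accuracy stays close to p_vote_a, while the aggregate call is right in nearly every trial, because per-voter noise averages out over a large electorate.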
53. Direct Marketing Lift: Random and Model-sorted Lists
[Figure: CPH (Cumulative Pct Hits) versus Pct list, for a Random list and a Model-sorted list]
• 5% of the random list has 5% of the hits
• 5% of the model-score ranked list has 21% of the hits
• Lift(5%) = 21%/5% = 4.2
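The lift arithmetic on this slide can be reproduced with a short calculation. This is a minimal sketch on made-up data: the helper names `cumulative_pct_hits` and `lift` and the toy list of 1000 prospects (100 hits, 21 of them in the top 50 by score) are assumptions built to match the slide's 21% figure, not data from the talk.

```python
def cumulative_pct_hits(scores, labels, pct):
    """Percent of all hits found in the top `pct` percent of the score-ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * pct / 100))
    hits_in_top = sum(label for _, label in ranked[:n_top])
    return 100.0 * hits_in_top / sum(labels)


def lift(scores, labels, pct):
    """Lift at `pct`: CPH of the model-ranked list divided by the CPH of a
    random list, which is `pct` itself on average."""
    return cumulative_pct_hits(scores, labels, pct) / pct


# Hypothetical toy list: 1000 prospects, 100 hits, 21 hits in the top 50.
labels = [1] * 21 + [0] * 29 + [1] * 79 + [0] * 871
scores = [1000 - i for i in range(1000)]  # strictly decreasing model scores

print(cumulative_pct_hits(scores, labels, 5))  # CPH at 5% of the list
print(lift(scores, labels, 5))                 # Lift(5%)
```

A random ordering would put about 5% of the hits in the first 5% of the list, which is why the denominator is simply the list percentage.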
54. Most lift curves are surprisingly similar: limit to human predictability?
Study of lift curves in banking, telecom
Best lift curves are similar
Special point T = Target percentage
Lift(T) ~ sqrt(1/T)
G. Piatetsky-Shapiro, B. Masand, Estimating Campaign Benefits and Modeling Lift, in Proceedings of KDD-99 Conference, ACM Press, 1999.
[Figure: Lift versus 100*T%, showing Actual lift(T) and Est. lift(T)]
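The rule of thumb Lift(T) ~ sqrt(1/T) is easy to tabulate. This is a minimal sketch that reads T as the fraction of targets in the population, which is an interpretation of the slide. The function name `estimated_lift_at_target_rate` is hypothetical, and the values are estimates from the formula only, not the measured lifts from the cited study.

```python
import math


def estimated_lift_at_target_rate(target_rate):
    """Rule-of-thumb lift at the point where the fraction of the list
    contacted equals the target rate T (given as a fraction in (0, 1])."""
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be a fraction in (0, 1]")
    return math.sqrt(1.0 / target_rate)


# Tabulate the estimate for a few illustrative target percentages.
for target_pct in (1, 4, 5, 10, 25):
    estimate = estimated_lift_at_target_rate(target_pct / 100)
    print(f"T = {target_pct:>2}%  estimated lift ~ {estimate:.2f}")
```

The estimate falls as targets become more common: rare targets leave more room for a model to beat a random list.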
88. Shortage of Data Scientists?
• McKinsey (2011): shortage by 2018 in US
– 140,000-190,000 people with deep analytical skills
– 1.5 M managers/analysts with the know-how to use the analysis of big data to make effective decisions
Source: www.mckinsey.com/mgi/publications/big_data/
89. Data Scientist: Sexiest Job of the 21st Century?
• Thomas H. Davenport and D.J. Patil (Harvard Business Review, 2012)
96. Big Data
• Next Industrial Revolution
• Data Science is the Engine of Big Data
97. Doing Old Things Better
• Application areas
– Direct marketing/Customer modeling
– Recommendations
– Fraud detection
– Security/Intelligence
– Healthcare
– …
• Competition will level companies
98. Big Data Enables New Things!
• Google – first big success of big data
• Social networks (Facebook, Twitter, LinkedIn, …): success depends on network size, i.e. big data
• Big Data in healthcare
– Image analysis, diagnosis
– Personalized medicine
• Recommendations - Netflix streaming
Churn: the best algorithms for predicting churn have a lift of 5-7, i.e., they are 5-7 times better than random.
Behavioral advertising: 2-3% CTR, 10 times better than random.
The future is bright for Big Data, but caution is needed when evaluating claims.