3. Dataiku - Data Tuesday
Meet Hal Alowne
Big Guys
• $10B+ Revenue
• 100M+ customers
• 100+ Data Scientists
Hal Alowne
BI Manager
Dim’s Private Showroom
Hey Hal ! We need
a big data platform
like the big guys.
Let’s just do as they do!
Average E-commerce Web site
• $100M Revenue
• 1 Million customers
• 1 Data Analyst (Hal Himself)
Dim Sum
CEO & Founder
Dim’s Private Showroom
Big Data
Copy Cat
Project
6. LOL PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your Brand New Hadoop Cluster
is perceived as slow, underused
and unreliable
7. TECHNO MISMATCH ANTI-PATTERN
Assume Being Polyglot
or
Be a Dictator
The Python Clan VS The R Tribe
The Old Elephant Fraternity VS The New Elephant Club
8. PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
"Website": 2000s winners
Companies that were able to release fast
"Artificial Intelligence with Data for Internet of Things": 2010s winners
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS”
IN PRODUCTION
10. Classic BI Team Org
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
DBA / IT Data Owner
Specs
11. Data Science Team Org
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data System Engineer /
Data Architect
Specs
Data Scientist
12. Built From Scratch
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
DBA / IT Data Owner
Specs
21. What is the main reason for data projects to fail?
DATA
NOT
AVAILABLE
22. BUT FOR ONLY INCREMENTAL GAIN
[Chart: Contribution to the overall project performance, on a 0 % to 100 % axis]
Segment values: 20 %, 30 %, 50 %
Segment labels: Business Goal Definition and Data, Feature Engineering, Algorithm
23. How to Get Data if you don’t have it
THE GRASSHOPPER, THE SPIDER, THE FOX
24. The Cicada : Optimistic and Opportunistic Data
THE CICADA
As a startup
- Build a new product using open data
As a group inside a company
- Benefit from the data sharing initiative within your company
- Wait for data to be available in your data lake
25. The Spider: Power of the Network
THE SPIDER
As a startup
- Create a network of (web trackers | sensors)
- Make it available for free
- Build your service on people’s collected data
As a group inside a company
- Make a web service available to collect data
- Promote it internally so that people use it
26. The Fox: Hunt for the Big Money first
THE FOX
As a startup
- Hunt for a Business Group within a large company with a problem
- Build a SaaS solution using their data
- Replicate to competitors
As a group inside a company
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally
29. The Age Of Distributed Intelligence
Global, Personalised
and Real Time Data
Driven Services
30. Data to Visualize or Data to Automate ?
2013 2014 2015 2016 2017 2018
Automated Decision
Visualize To Decide
Moving to a world of automated decision making
31. Where is your added value ?
Is the problem at the Core of
my Business Process?
Is it a common problem / with
shared data?
Go for Best of
Breed SAAS
Solution
Can I Solve it on my own ?
Really ?
Build by the
data team
Build by the
data team ?
Build by the
data team
Hire
Consultants
and Learn
Yes
Yes No
I can’t Ok, I can try
Yes!
No!
No
32. Be aware of the comfort zone
Mission
Critical
Small
Structured
Large
Diverse
Sheer
Curiosity
Reporting
for Finance
in Any Industry
Analyze
Each Tweet
Web Navigation
For E-Merchant
Ticket Data
For Discounts
in Retail
Phone Call
Logs for Security
RTB Data
For Advertising
Customer
Consumption
For Anti-Churn
in Utilities
Optimization
Filings
For Fraud
in Insurance
Not Enough
Data To Learn
From ?
Not Enough
“Hard” Examples
So that you can learn
33. Infuse the Data and Try Mindset
Brendan Stern is now
a Specialist for Data
Science in Healthcare
at Dataiku
“When I was 20, while working as a
manager at my Starbucks shop, I realised
that I could probably increase
sales of ground coffee. Depending on the
day and time of day, I kept moving
the ground coffee around. I managed to run some
A/B tests that improved the average sale
amount by 12%.”
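To make the arithmetic behind such a claim concrete, here is a minimal sketch of how the relative lift between two shelf placements could be computed. The function name and the sale amounts are hypothetical, made up for illustration, and are not data from the story above.

```python
from statistics import mean


def relative_lift(control, variant):
    """Relative change in the mean sale amount of `variant` over `control`."""
    base = mean(control)
    return (mean(variant) - base) / base


# Hypothetical sale amounts per transaction (illustrative only).
control = [10.0, 12.0, 8.0, 10.0]    # placement A
variant = [11.0, 13.0, 10.0, 10.8]   # placement B

lift = relative_lift(control, variant)
print(f"Relative lift: {lift:.1%}")
```

A real test would also need enough transactions per placement, and a significance check, before trusting the observed lift.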
35. Create an "API" Culture
Do not share
• Random Piece of Code
• Flat File
Do share
• Reproducible, documented workflows
• Clean, documented APIs
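As one possible illustration of the "do share" side, the sketch below wraps a scoring step in a small, typed, documented function instead of handing over a flat file. All names here (ChurnScore, score_customers, toy_model) are hypothetical and only stand in for whatever a team would actually expose.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ChurnScore:
    """One scored customer: the documented unit of exchange."""
    customer_id: str
    probability: float  # churn probability, expected in [0, 1]


def score_customers(
    customer_ids: Iterable[str],
    model: Callable[[str], float],
) -> List[ChurnScore]:
    """Score each customer id with `model` and return typed results.

    Raises ValueError if the model returns a value outside [0, 1],
    so that consumers never receive silently invalid scores.
    """
    scores = []
    for customer_id in customer_ids:
        probability = float(model(customer_id))
        if not 0.0 <= probability <= 1.0:
            raise ValueError(
                f"invalid probability for {customer_id}: {probability}"
            )
        scores.append(ChurnScore(customer_id, probability))
    return scores


# Hypothetical stand-in for a trained model (illustrative only).
def toy_model(customer_id: str) -> float:
    return 0.9 if customer_id.startswith("vip") else 0.1


scores = score_customers(["vip-1", "c-2"], toy_model)
print(scores)
```

The point of the design is that the contract (input ids, output type, failure mode) is written down once, so consumers depend on the interface and not on a file someone exported by hand.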
36. WAITING FOR QUESTIONS SLIDE
More food for thought
on Dataiku’s blog
http://www.dataiku.com/blog/
Find us on Twitter
@fdouetteau @dataiku