Hiring data scientists and deploying Hadoop is not enough. Your company needs a data-driven culture based on values such as honesty, democracy, creativity and strategy. It also needs good data engineering and good experimentation practices.
4. ✦ Big Data is no longer considered an emerging
technology (pervasive in industry)
✦ Entered Trough of Disillusionment in 2013
https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/
6. Data science
✦ Statistics (late 19th century)
✦ Computer Science (1950s)
✦ Machine Learning (1950s)
✦ Data Mining (1990s)
✦ Data Science (2010s)
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
yet another hyped term
7. Beware: controversy
✦ Data science is not all-science
✴ It’s getting more and more engineering-like, a practice
✴ Data storytelling is a creative endeavor
✦ Hyper-inflated expectations, misunderstood
concepts and a rush to get value: a dangerous
recipe
8. A new hope or hype
✦ Google Trends chart: “machine learning” vs “big data” (US, past 12 months)
https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data
9. Hype: not that bad
✦ Haters gonna hate, but don’t fully hate the hype
✴ more practitioners = faster tech and processes evolution
✴ Highly skilled professionals and innovation
✦ Academics sometimes look for difficult unwanted
problems
✴ Industry is more pragmatic, especially in tech
https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science
10. What we need…
✦ Forget about Big Data pokémons
✴ OH so in Big Data we don’t need people to think about schemas?
✦ Forget about misunderstood business expectations
✴ OH in deep learning we don’t need people to train models?
✦ You need PEOPLE
✴ Collaborating with shared values
✴ Awesome in tech but more importantly: CREATIVE
13. Good people
✦ People are more important than ideas
✴ A mediocre team will screw up a good idea
✴ Give a mediocre idea to a great team: they will fix it or rethink it
✦ A good lab: different kinds of autonomous thinkers
✴ Why hire smart people if they can't fix what’s broken?
✦ Prefer a heterogeneous and complementary team
instead of looking for unicorns
14. The mythical 10x professional
https://twitter.com/icaromedeiros/status/838968884023668737
15. Good communication
✦ Honesty, excellence, originality and
self-criticism (values)
✦ Communication structure ≠ organizational structure
✦ Be ready to hear the truth
✴ Sincerity is only valuable if people are open and willing to give
up on ideas that will not work
✦ Braintrust: Leave ego and Jobs outside the door
16. Power to the people!
✦ Product quality is everyone’s responsibility
✴ Don’t ask permission to take responsibility
✦ Passion and excellence versus autonomy
✦ Good things might overshadow the bad
✴ People avoid raising problems so they are not called
“complainers”
18. Destroy data silos!
✦ Without information about data there is no science
✦ Software and data should be a collective property
within the company
✦ Knowledge management matters
✦ Communication between areas must be enforced
19. Data portals
✦ Self-service platforms to publish datasets
✴ Descriptions, schemas, samples, relations between datasets,
etc
✦ Open Data initiatives, mostly governments
✦ OSS platforms: CKAN, Airbnb’s Dataportal
✦ Examples: data.gov.uk, dados.gov.br, etc
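As a rough illustration of what a data portal entry might carry (description, schema, samples, relations between datasets), here is a minimal sketch. The `Column` and `DatasetEntry` classes and the `orders` record are hypothetical names invented for this example; they are not the data model of CKAN or of Airbnb's Dataportal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    """One field of a dataset schema (hypothetical structure)."""
    name: str
    dtype: str
    description: str = ""


@dataclass
class DatasetEntry:
    """One catalog record in a hypothetical self-service data portal."""
    name: str
    description: str
    owner: str
    columns: List[Column] = field(default_factory=list)
    sample_rows: List[Dict[str, str]] = field(default_factory=list)
    related: List[str] = field(default_factory=list)  # names of related datasets

    def undocumented_columns(self) -> List[str]:
        """Return the columns that still lack a description."""
        return [c.name for c in self.columns if not c.description.strip()]


# Illustrative entry; every value here is made up for the example.
orders = DatasetEntry(
    name="orders",
    description="One row per confirmed customer order.",
    owner="sales-analytics",
    columns=[
        Column("order_id", "string", "Unique order identifier"),
        Column("customer_id", "string", "Key into the customers dataset"),
        Column("total", "decimal"),  # description intentionally missing
    ],
    sample_rows=[{"order_id": "o-1", "customer_id": "c-9", "total": "19.90"}],
    related=["customers"],
)

print(orders.undocumented_columns())
```

A check like `undocumented_columns` is one cheap way to keep published datasets understandable before they reach other teams.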
20. “When it comes to creative
inspiration, job titles and
hierarchy are meaningless”
22. Data storytelling
✦ Explain what the numbers say in clear, layman’s terms
✦ Make hidden premises clear
✴ Outside data insights
✦ Convince others about actions
✴ Decreases insights-to-value interval
✦ From data to knowledge
https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs
23. What is creativity
✦ Unexpected connections of concepts and ideas
✦ It's a marathon, it needs rhythm
✦ Creativity must start somewhere and there’s power
in healthy feedback in an iterative process
24. Visual communication
✦ Clean, straightforward graphs > visually appealing ones
✴ Choose dataviz libs wisely
✦ “Don’t make me think”
✦ The right graph for the right audience
✴ Prefer a language everyone understands
29. Avoid egotrip data science
✦ “OH my cluster has 10 Petabytes, I’m awesome”
✦ Fancy ML algorithms are not the goal
✦ The most important V in Big Data is value
https://twitter.com/amyhoy/status/847097034536554497
30. KPI versus HiPPO
✦ Tech adoption per se is meaningless
✴ Slide-driven Big Data
✴ KPIs should grow from Big Data and data insights initiatives
✦ Poorly defined goals -> bad decisions
✦ Define viable but ambitious goals
✦ Data beats opinion
31. Set goal, plan and GO!
✦ Business questions can't be like “OH we want to
detect things related to millennials”
✦ Clear goals must be set, with actionable metrics
✦ Balance perfect models versus time-to-market
✦ Brad Bird: “Sometimes, as a director, you’re
guiding. Sometimes you’re letting the car drive”
https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data
32. The process
✦ The process is not the goal
✴ It has no agenda or taste, it’s just a tool
✦ Quality is the best business plan
✦ Agile is a mindset: not only kanbans or scrum
✦ If the model will become operational, mix scientists
and engineers from start
33. Build vs Buy
✦ If you buy and your core business is not techie, you can be
illiterate in tech
✴ Benchmark before buying
✴ Accelerate results and boost internal knowledge
✦ If you build and have a good-enough techie culture, you’re
more or less good to go
✴ Assess pros and cons consciously
✦ If you surf the tech hype AND build good systems you’re
awesome
37. Big Data vs Great Data
✦ If your logical models do not make sense
✦ If your most frequent queries are slow
✦ If you have string-only databases
✦ If you have unused expensive data
✦ Maybe your data lake is a swamp
38. “The data is a mess”
✦ First step: accelerate human understanding of data
✴ Metadata, context, hidden assumptions
✦ Datasets might serve multiple purposes
✴ Define rationale and context
✴ Data portals and understandable datasets > Dashboards
https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
39. Data lost in translation
✦ Heterogeneous and siloed databases (and people)
✦ Rethink ESB (microservices network)
✦ State-of-the-art: data workflow
✴ Luigi, Airflow (open source), almost every big tech vendor
✴ Transparency, reusability, reproducibility, traceability
✴ Automation and monitoring all the way!
https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
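The core idea behind data workflow tools can be sketched with the standard library alone: declare tasks and their upstream dependencies, run them in dependency order, and keep a log for traceability. This is not Luigi's or Airflow's API; the task names, the `TASKS` table and `run_pipeline` are hypothetical, and the sketch assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def extract():
    # Stand-in for reading from a source system; the rows are made up.
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "5"}]


def clean(rows):
    # Cast string amounts to integers so downstream tasks get typed data.
    return [{"user": r["user"], "amount": int(r["amount"])} for r in rows]


def report(rows):
    # Aggregate the cleaned rows into a single number.
    return sum(r["amount"] for r in rows)


# task name -> (function, names of upstream tasks whose outputs it consumes)
TASKS = {
    "extract": (extract, []),
    "clean": (clean, ["extract"]),
    "report": (report, ["clean"]),
}


def run_pipeline(tasks):
    """Run every task after its dependencies; return results and run order."""
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    results, log = {}, []
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        results[name] = func(*(results[d] for d in deps))
        log.append(name)  # the execution log gives basic traceability
    return results, log


results, log = run_pipeline(TASKS)
print(log)
print(results["report"])
```

Real workflow engines add scheduling, retries and monitoring on top of this dependency graph, which is where most of their value lies.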
40. Beyond relational models
✦ Not all data problems fit well in traditional SQL or
DW models
✴ Key-value, columnar, graph-based, inverted index, etc
✦ Models are a framework for problem-solving
✴ Not the ultimate answer
✴ There’s no one-size-fits-all model
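To make one of the non-relational models above concrete, here is a minimal sketch of an inverted index, the structure behind text search. The sample documents and the `build_inverted_index` and `search` helpers are invented for illustration; a production search engine does far more (tokenization, ranking, compression).

```python
from collections import defaultdict

# Toy corpus; ids and texts are made up for the example.
docs = {
    1: "big data needs good schemas",
    2: "great data beats big data",
    3: "good experiments need clear goals",
}


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, *terms):
    """Return sorted ids of documents containing every term (AND query)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(docs)
print(search(index, "big", "data"))
print(search(index, "good"))
```

The same lookup in a relational table would need a scan or a full-text extension, which is the sense in which some problems fit other models better.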
41. Do not forget fluency
✦ Check the company lingua franca
✦ Make it easy for critical decision-makers
✴ Ad hoc SQL queries?
✴ Dashboards?
✴ Reports?
43. Experiments
✦ Missions to discover facts towards understanding
✴ They don’t fail, any result produces new information
✴ If the initial theory was wrong: good
✴ With new facts you can reformulate the question
✦ Get more modeling questions asked more often
✦ Iterative data science
44. Product experimentation (A/B)
✦ Product experimentation should be hypothesis-
driven (not feature-driven)
✦ Define the proper exposed population
✴ Not only new users, only heavy users, or only early adopters
✦ Understanding effect is essential
https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
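One common way to quantify the effect of an A/B test on a conversion rate is a two-proportion z-test. The sketch below is a generic textbook version, not the method described in the linked article; the function name and the sample counts are assumptions made for illustration, and it assumes both groups are large and that the pooled rate is strictly between 0 and 1.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and variant (b).

    Returns (absolute lift, z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Made-up counts: 200 of 2000 control users convert, 250 of 2000 variant users.
lift, z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"lift={lift:.3f} z={z:.2f} p={p_value:.4f}")
```

A significant p-value only says the difference is unlikely to be noise; whether the lift matters still depends on the hypothesis and the exposed population chosen up front.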
45. 5 stages of A/B tests
https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari
46. Some other quick tips
✦ Focus on outcomes (not algorithms or methods)
✦ Design the right metric and evaluation
✦ Good experiments don't produce obvious insights
✦ Mix of data and intuition
https://twitter.com/mrdatascience/status/869957499662860288
47. Being data driven
✦ Be BAYESIAN - uncertainty is everywhere
✦ Be CURIOUS - keep learning
✦ Be AGILE - Fail fast, not too fast: evidence comes first
https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
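As a small illustration of the "Be BAYESIAN" point, a Beta-Binomial update shows how uncertainty about a rate shrinks as evidence accumulates. This is a standard conjugate-prior sketch and not something taken from the linked post; the function name, the uniform Beta(1, 1) prior and the counts are assumptions for the example.

```python
def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Update a Beta(prior_a, prior_b) prior with binomial observations.

    Returns (a, b, posterior mean, posterior variance).
    """
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, variance


# Same observed rate (30%), very different amounts of evidence.
small = beta_posterior(3, 10)
large = beta_posterior(300, 1000)

print(f"10 trials:   mean={small[2]:.3f} variance={small[3]:.5f}")
print(f"1000 trials: mean={large[2]:.3f} variance={large[3]:.5f}")
```

Both samples point to roughly the same rate, but the posterior variance after 1000 trials is far smaller, which is the kind of uncertainty a point estimate hides.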
48. Being data driven
✦ Be TRUTHFUL - don’t torture data to please opinions
✦ Be HELPFUL - work across silos, support democracy
✦ Be WISE - know when to be analytical or intuitive
https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
49. Take-away message
With the right people,
Democracy,
Creativity,
Strategy,
Big Great Data™
and Experiments
there's a good chance to do great
SCIENCE