Data Science & Culture
(Or how to stop worrying and love data-driven culture)
Ícaro Medeiros
Data Science Forum
São Paulo, Jun 2017
Inspired by (but not limited to) these references
Big Data
http://www.kdnuggets.com/2017/02/origins-big-data.html
✦ Fundamental building blocks: advances in CS, e.g.
distributed systems, databases, large-scale AI, etc.

✦ Fuzzy concept, ill-defined

✦ Popularized by Gartner (a hype-fueled consulting firm)
✦ Big Data no longer considered an emerging
technology (pervasive in industry)

✦ Entered the Trough of Disillusionment in 2013
https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/
http://www.mikelnino.com/2016/03/chronology-big-data.html
Chronology of antecedents
Data science
✦ Statistics (late 19th century)

✦ Computer Science (1950s)

✦ Machine Learning (1950s)

✦ Data Mining (1990s)

✦ Data Science (2010s)
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
yet another hyped term
Beware: controversy
✦ Data science is not all science
✴ It is becoming more and more engineering-like: a practice

✴ Data storytelling is a creative endeavor
✦ Hyper-inflated expectations, misunderstood
concepts, and a rush to get value: a dangerous
recipe
A new hope… or hype?
[Chart: Google Trends, interest in "machine learning" vs "big data", US, last 12 months]
https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data
Hype: not that bad
✦ Haters gonna hate, i.e. don’t dismiss the hype entirely

✴ More practitioners = faster evolution of tech and processes
✴ Attracts highly skilled professionals and innovation

✦ Academics sometimes chase hard problems nobody
asked for

✴ Industry is more pragmatic, especially in tech
https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science
What we need…
✦ Forget about Big Data Pokémon

✴ OH: so in Big Data we don’t need people to think about schemas?

✦ Forget about misunderstood business expectations

✴ OH: so in deep learning we don’t need people to train models?

✦ You need PEOPLE

✴ Collaborating with shared values

✴ Awesome in tech but more importantly: CREATIVE
Culture: shared values and practices
Good people
✦ People are more important than ideas

✴ Give a good idea to a mediocre team and they will screw it up

✴ Give a mediocre idea to a great team and they will fix it or rethink it

✦ A good lab: different kinds of autonomous thinkers

✴ Why hire smart people if they can't fix what’s broken?

✦ Prefer a heterogeneous and complementary team
instead of looking for unicorns
The mythical 10x professional
https://twitter.com/icaromedeiros/status/838968884023668737
Good communication
✦ Honesty, excellence, originality and self-criticism (values)

✦ Communication structure ≠ organizational structure

✦ Be ready to hear the truth

✴ Sincerity is only valuable if people are open and willing to give
up on ideas that will not work

✦ Braintrust: Leave ego and Jobs outside the door
Power to the people!
✦ Product quality is everyone’s responsibility
✴ Don’t ask permission to take responsibility

✦ Passion and excellence versus autonomy

✦ Good things can overshadow the bad

✴ People avoid raising problems so they won’t be called
“complainers”
Rebels
http://qaspire.com/2017/05/19/sketchnote-what-rebels-want-from-their-boss/
Destroy data silos!
✦ Without information about data there is no science

✦ Software and data should be collective property
within the company

✦ Knowledge management matters

✦ Communication between areas must be enforced
Data portals
✦ Self-service platforms to publish datasets

✴ Descriptions, schemas, samples, relations between datasets,
etc

✦ Open Data initiatives, mostly governments

✦ OSS platforms: CKAN, Airbnb’s Dataportal

✦ Examples: data.gov.uk, dados.gov.br, etc
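To make "descriptions, schemas, samples, relations between datasets" concrete, here is a minimal in-memory sketch of what a self-service portal entry might hold. It is not CKAN's or Airbnb's data model; `DatasetEntry`, `DataPortal`, and the example datasets are hypothetical names chosen for illustration. Publishing validates that sample rows match the declared schema and that relations point to datasets already in the catalog.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog entry: what a dataset is, its columns, a sample, its relations."""
    name: str
    description: str
    schema: dict[str, str]  # column name -> declared type (free text in this sketch)
    sample: list[dict] = field(default_factory=list)
    related: list[str] = field(default_factory=list)  # names of related datasets


class DataPortal:
    """Hypothetical minimal self-service catalog, kept in memory."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def publish(self, entry: DatasetEntry) -> None:
        # Every sample row must have exactly the columns declared in the schema.
        for row in entry.sample:
            if set(row) != set(entry.schema):
                raise ValueError(f"{entry.name}: sample row {row} does not match schema")
        # Relations may only point to datasets that are already published.
        missing = [r for r in entry.related if r not in self._entries]
        if missing:
            raise ValueError(f"{entry.name}: unknown related datasets {missing}")
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        # Case-insensitive match on dataset name, description, or column names.
        t = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if t in e.name.lower()
            or t in e.description.lower()
            or any(t in col.lower() for col in e.schema)
        )


portal = DataPortal()
portal.publish(DatasetEntry(
    name="users",
    description="One row per registered user",
    schema={"user_id": "int", "signup_date": "date"},
    sample=[{"user_id": 1, "signup_date": "2017-06-01"}],
))
portal.publish(DatasetEntry(
    name="page_views",
    description="Page views by user and URL",
    schema={"user_id": "int", "url": "str", "viewed_at": "timestamp"},
    sample=[{"user_id": 1, "url": "/home", "viewed_at": "2017-06-01T10:00:00"}],
    related=["users"],
))
print(portal.search("user"))  # ['page_views', 'users']
print(portal.search("url"))   # ['page_views']
```

Rejecting bad samples and dangling relations at publish time is one way to keep the catalog trustworthy enough that people actually consult it before asking around.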
“When it comes to creative inspiration, job titles and hierarchy are meaningless”
Data storytelling
✦ Explain what the numbers say in clear, layman’s terms

✦ Make hidden premises explicit

✴ Bring in insights from outside the data

✦ Convince others to act

✴ Shortens the time from insight to value
✦ From data to knowledge
https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs
What is creativity?
✦ Unexpected connections between concepts and ideas

✦ It's a marathon: it needs rhythm

✦ Creativity must start somewhere, and healthy feedback
in an iterative process is powerful
Visual communication
✦ Clean, straightforward graphs > visually appealing ones

✴ Choose dataviz libs wisely

✦ “Don’t make me think”

✦ The right graph for the right audience

✴ Prefer a language everyone understands
Visual communication 101
Stats are not enough
https://www.autodeskresearch.com/publications/samestats
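The same point can be made with Anscombe's quartet, the classic example that predates the datasets in the linked work: four small datasets whose summary statistics agree to a couple of decimals while their scatter plots look completely different. This Python sketch uses the commonly published values; the rounding precision in `summarize` is our choice, picked to mimic how numbers show up in a report.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet (commonly published values). Sets I-III share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Usual summary statistics (sample variance, Pearson r, least-squares line), rounded."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    vx, vy = variance(xs), variance(ys)
    slope = cov / vx
    return {
        "mean_x": round(mx, 2),
        "mean_y": round(my, 2),
        "var_x": round(vx, 1),
        "var_y": round(vy, 1),
        "corr": round(cov / sqrt(vx * vy), 2),
        "slope": round(slope, 2),
        "intercept": round(my - slope * mx, 1),
    }


for name, (xs, ys) in QUARTET.items():
    # All four lines print the same numbers, yet the datasets look nothing alike.
    print(name, summarize(xs, ys))
```

Only plotting the four sets (a line, a curve, a line with one outlier, a vertical cluster with one outlier) reveals the difference.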
Strategy
Avoid ego-trip data science
✦ “OH my cluster has 10 petabytes, I’m awesome”

✦ Fancy ML algorithms are not the goal

✦ The most important V in Big Data is value
https://twitter.com/amyhoy/status/847097034536554497
KPI versus HiPPO (highest paid person’s opinion)
✦ Tech adoption per se is meaningless

✴ Slide-driven Big Data

✴ KPIs should grow from Big Data and data insights initiatives

✦ Poorly defined goals -> bad decisions

✦ Define viable but ambitious goals

✦ Data beats opinion
Set goal, plan and GO!
✦ Business questions can't be like “OH we want to
detect things related to millennials”

✦ Clear goals must be set, with actionable metrics

✦ Balance model perfection against time-to-market

✦ Brad Bird: “Sometimes, as a director, you’re
guiding. Sometimes you’re letting the car drive”
https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data
The process
✦ The process is not the goal

✴ It has no agenda or taste; it’s just a tool

✦ Quality is the best business plan

✦ Agile is a mindset: not just Kanban boards or Scrum

✦ If the model will become operational, mix scientists
and engineers from the start
Build vs Buy
✦ If you buy and your core business is not tech, you can afford
to be less tech-literate
✴ Benchmark before buying

✴ Accelerate results and boost internal knowledge

✦ If you build and have a good-enough tech culture, you’re
more or less good to go

✴ Assess pros and cons consciously

✦ If you surf the tech hype AND build good systems, you’re
awesome
https://twitter.com/Doug_Laney/status/847452219641356288
When data goes to vendors…
http://www.louisdorard.com/machine-learning-canvas/
DATA ENGINEERING
Big Data vs Great Data
✦ If your logical models do not make sense

✦ If your most frequent queries are slow

✦ If you have string-only databases

✦ If you have unused expensive data

✦ Maybe your data lake is a swamp
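Why "string-only databases" are a warning sign: when numbers are stored as text, the database compares them as text. A small sketch with Python's built-in `sqlite3` (the `orders` table and its columns are hypothetical) stores the same values twice, once as `TEXT` and once as `INTEGER`, and runs the same sort and `MAX` on each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the same amounts stored as text and as integers.
conn.execute("CREATE TABLE orders (amount_text TEXT, amount_int INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(str(n), n) for n in (9, 10, 100)],
)

# Text values compare character by character, so '10' < '100' < '9'.
by_text = [r[0] for r in conn.execute("SELECT amount_text FROM orders ORDER BY amount_text")]
max_text = conn.execute("SELECT MAX(amount_text) FROM orders").fetchone()[0]

# Integer values compare numerically.
by_int = [r[0] for r in conn.execute("SELECT amount_int FROM orders ORDER BY amount_int")]
max_int = conn.execute("SELECT MAX(amount_int) FROM orders").fetchone()[0]

print(by_text, max_text)  # ['10', '100', '9'] 9
print(by_int, max_int)    # [9, 10, 100] 100
conn.close()
```

Nothing errors out in the text version; the query silently returns a wrong "largest order", which is exactly the kind of problem that makes a lake feel like a swamp.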
“The data is a mess”
✦ First step: accelerate human understanding of data

✴ Metadata, context, hidden assumptions

✦ Datasets may serve multiple purposes

✴ Define rationale and context

✴ Data portals and understandable datasets > Dashboards
https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
Data lost in translation
✦ Heterogeneous and siloed databases (and people)

✦ Rethink the ESB (enterprise service bus) as a network of microservices

✦ State of the art: data workflows

✴ Luigi, Airflow (open source), and offerings from almost every big tech vendor

✴ Transparency, reusability, reproducibility, traceability

✴ Automation and monitoring all the way!
https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
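Tools like Luigi and Airflow model a workflow as a directed acyclic graph (DAG) of tasks that run in dependency order. The sketch below is not either tool's API; it only illustrates the core idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names and the `run` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_events": set(),
    "extract_users": set(),
    "clean": {"extract_events", "extract_users"},
    "train_model": {"clean"},
    "report": {"clean", "train_model"},
}


def run(pipeline, tasks):
    """Run tasks in dependency order and return the execution log (for traceability)."""
    log = []
    # static_order() yields every node after all of its dependencies;
    # it raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(pipeline).static_order():
        tasks[name]()  # a real scheduler would also retry, time out, and alert here
        log.append(name)
    return log


# Placeholder task bodies; in practice each would read inputs and write outputs.
tasks = {name: (lambda: None) for name in PIPELINE}
order = run(PIPELINE, tasks)
print(order)
```

Declaring dependencies as data, rather than hard-coding call order, is what lets real workflow engines visualize, re-run, and monitor each step independently.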
Beyond relational models
✦ Not all data problems fit well in traditional SQL or
data warehouse (DW) models

✴ Key-value, columnar, graph-based, inverted index, etc

✦ Models are a framework for problem-solving
✴ Not the ultimate answer

✴ There’s no one-size-fits-all model
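As an example of a non-relational model, an inverted index maps each term to the set of documents containing it, which makes keyword lookups a set intersection instead of a table scan. A minimal Python sketch follows; the tokenizer is deliberately naive (real engines also stem and drop stop words), and the sample documents are taken from phrases in these slides.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercased alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Big Data vs Great Data",
    2: "The most important V in Big Data is value",
    3: "Data beats opinion",
}
index = build_index(docs)
print(sorted(search_all(index, "big data")))    # [1, 2]
print(sorted(search_all(index, "data value")))  # [2]
```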
Do not forget fluency
✦ Check the company lingua franca

✦ Make it easy for critical decision-makers

✴ Ad hoc SQL queries?

✴ Dashboards?

✴ Reports?
EXPERIMENTATION
Experiments
✦ Missions to discover facts towards understanding

✴ They don’t fail: any result produces new information

✴ If the initial theory was wrong: good

✴ With new facts you can reformulate the question

✦ Get more modeling questions asked more often

✦ Iterative data science
Product experimentation (A/B)
✦ Product experimentation should be hypothesis-driven
(not feature-driven)

✦ Define the proper exposed population
✴ Not only new users, heavy users, or early adopters

✦ Understanding the effect is essential
https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
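"Understanding the effect" means reporting how big the difference is and how uncertain it is, not just whether it is "significant". A common textbook approach for conversion rates is a two-proportion z-test plus a confidence interval for the difference; the sketch below uses only the standard library (`statistics.NormalDist`, Python 3.8+). The function name `ab_test` and the counts are hypothetical, and this is not presented as Airbnb's method.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test plus a Wald confidence interval for the lift."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    effect = p_b - p_a  # absolute difference in conversion rate

    # Test statistic uses the pooled rate (null hypothesis: both rates are equal).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = effect / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Interval uses the unpooled standard error.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (effect - z_crit * se, effect + z_crit * se)
    return effect, z, p_value, ci


# Hypothetical counts: control converts 200/2000, variant 250/2000.
effect, z, p_value, ci = ab_test(200, 2000, 250, 2000)
print(f"effect={effect:.3f} z={z:.2f} p={p_value:.4f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

A wide interval that barely excludes zero tells a very different story from a narrow one, even when both produce the same p-value threshold decision.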
5 stages of A/B tests
https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari
Some other quick tips
✦ Focus on outcomes (not algorithms or methods)

✦ Design the right metric and evaluation
✦ Good experiments don't produce obvious insights

✦ Mix of data and intuition
https://twitter.com/mrdatascience/status/869957499662860288
Being data driven
✦ Be BAYESIAN - uncertainty is everywhere

✦ Be CURIOUS - keep learning
✦ Be AGILE - fail fast, but not too fast: evidence comes first
https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
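One concrete way to "be Bayesian" about a conversion rate is the Beta-Binomial model: start from a Beta prior, add observed successes and failures, and read uncertainty directly off the posterior. This is a minimal sketch under assumed inputs: a uniform Beta(1, 1) prior, hypothetical counts of 200/2000 and 250/2000, and a Monte Carlo estimate using `random.Random.betavariate`.

```python
import random


def posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Conjugate update: Beta(a, b) prior -> Beta(a + successes, b + failures) posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def prob_b_beats_a(post_a, post_b, draws=20_000, seed=42):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a) for _ in range(draws)
    )
    return wins / draws


# Hypothetical counts: A converts 200/2000, B converts 250/2000.
post_a = posterior(200, 2000)
post_b = posterior(250, 2000)
mean_b = post_b[0] / sum(post_b)  # posterior mean of B's conversion rate
print(f"posterior mean B={mean_b:.4f}, P(B > A)={prob_b_beats_a(post_a, post_b):.3f}")
```

The output is a statement of belief ("B is very probably better") that can be updated as more data arrives, instead of a single pass/fail verdict.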
Being data driven
✦ Be TRUTHFUL - don’t torture data to please opinions

✦ Be HELPFUL - work across silos, support democracy
✦ Be WISE - know when to be analytical or intuitive
https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
With the right people,
Democracy,
Creativity,
Strategy,
Big Great Data™
and Experiments
there's a good chance to do great
SCIENCE
Take-away message
Ícaro Medeiros
Data Scientist
icaromedeiros

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

Data Science Culture & the Importance of People

  • 1. Data Science & Culture (Or how to stop worrying and love data-driven culture). Ícaro Medeiros, Data Science Forum, São Paulo, Jun 2017
  • 3. Big Data http://www.kdnuggets.com/2017/02/origins-big-data.html ✦ Fundamental building blocks: advances in CS, e.g. distributed systems, databases, large-scale AI, etc. ✦ A fuzzy, ill-defined concept ✦ Popularized by Gartner (a hype-fueled consulting firm)
  • 4. ✦ Big Data is no longer considered an emerging technology (it is pervasive in industry) ✦ It entered the Trough of Disillusionment in 2013 https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/
  • 6. Data science ✦ Statistics (late 19th century) ✦ Computer Science (1950s) ✦ Machine Learning (1950s) ✦ Data Mining (1990s) ✦ Data Science (2010s) https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century yet another hyped term
  • 7. Beware: controversy ✦ Data science is not all science ✴ It’s becoming more and more engineering-like: a practice ✴ Data storytelling is a creative endeavor ✦ Hyper-inflated expectations, misunderstood concepts and a rush to get value: a dangerous recipe
  • 8. A new hope (or hype?): Google Trends interest in "machine learning" vs "big data" (US, past 12 months) https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data
  • 9. Hype: not that bad ✦ Haters gonna hate, i.e. don’t dismiss the hype entirely ✴ More practitioners = faster evolution of tech and processes ✴ Highly skilled professionals and innovation ✦ Academics sometimes chase difficult problems nobody asked for ✴ Industry is more pragmatic, especially in tech https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science
  • 10. What we need… ✦ Forget about Big Data pokémons ✴ OH: so in Big Data we don’t need people to think about schemas? ✦ Forget about misunderstood business expectations ✴ OH: in deep learning we don’t need people to train models? ✦ You need PEOPLE ✴ Collaborating with shared values ✴ Awesome in tech but, more importantly, CREATIVE
  • 13. Good people ✦ People are more important than ideas ✴ A mediocre team will screw up a good idea ✴ Give a mediocre idea to a great team and they will fix it or rethink it ✦ A good lab: different kinds of autonomous thinkers ✴ Why hire smart people if they can't fix what’s broken? ✦ Prefer a heterogeneous and complementary team instead of looking for unicorns
  • 14. The mythical 10x professional https://twitter.com/icaromedeiros/status/838968884023668737
  • 15. Good communication ✦ Honesty, excellence, originality and self-criticism (values) ✦ Communication structure ≠ organizational structure ✦ Be ready to hear the truth ✴ Sincerity is only valuable if people are open and willing to give up on ideas that will not work ✦ Braintrust: leave ego and Jobs outside the door
  • 16. Power to the people! ✦ Product quality is everyone’s responsibility ✴ Don’t ask permission to take responsibility ✦ Passion and excellence versus autonomy ✦ Good things can overshadow the bad ✴ People avoid digging into problems so as not to be called “complainers”
  • 18. Destroy data silos! ✦ Without information about data there is no science ✦ Software and data should be collective property within the company ✦ Knowledge management matters ✦ Communication between areas must be enforced
  • 19. Data portals ✦ Self-service platforms to publish datasets ✴ Descriptions, schemas, samples, relations between datasets, etc. ✦ Open Data initiatives, mostly from governments ✦ OSS platforms: CKAN, Airbnb’s Dataportal ✦ Examples: data.gov.uk, dados.gov.br, etc.
  • 20. “When it comes to creative inspiration, job titles and hierarchy are meaningless”
  • 22. Data storytelling ✦ Explain what the numbers say in clear, layman’s terms ✦ Make hidden premises explicit ✴ Insights from outside the data ✦ Convince others to act ✴ Shortens the interval from insight to value ✦ From data to knowledge https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs
  • 23. What is creativity? ✦ Unexpected connections between concepts and ideas ✦ It's a marathon; it needs rhythm ✦ Creativity must start somewhere, and there’s power in healthy feedback within an iterative process
  • 24. Visual communication ✦ Clean, straightforward graphs > merely visually appealing ones ✴ Choose dataviz libs wisely ✦ “Don’t make me think” ✦ The right graph for the right audience ✴ Prefer a language everyone understands
  • 26. Stats are not enough https://www.autodeskresearch.com/publications/samestats
  • 29. Avoid ego-trip data science ✦ “OH my cluster has 10 Petabytes, I’m awesome” ✦ Fancy ML algorithms are not the goal ✦ The most important V in Big Data is value https://twitter.com/amyhoy/status/847097034536554497
  • 30. KPI versus HiPPO (highest paid person’s opinion) ✦ Tech adoption per se is meaningless ✴ Slide-driven Big Data ✴ KPIs should grow out of Big Data and data insights initiatives ✦ Poorly defined goals -> bad decisions ✦ Define viable but ambitious goals ✦ Data beats opinion
  • 31. Set goal, plan and GO! ✦ Business questions can't be as vague as “OH we want to detect things related to millennials” ✦ Clear goals must be set, with actionable metrics ✦ Balance model perfection against time-to-market ✦ Brad Bird: “Sometimes, as a director, you’re guiding. Sometimes you’re letting the car drive” https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data
  • 32. The process ✦ The process is not the goal ✴ It has no agenda or taste; it’s just a tool ✦ Quality is the best business plan ✦ Agile is a mindset, not just Kanban boards or Scrum ✦ If the model will become operational, mix scientists and engineers from the start
  • 33. Build vs Buy ✦ If you buy and your core business is not tech, you can afford to be less tech-literate ✴ Benchmark before buying ✴ Accelerate results and boost internal knowledge ✦ If you build and have a good-enough tech culture, you’re more or less good to go ✴ Assess pros and cons consciously ✦ If you surf the tech hype AND build good systems, you’re awesome
  • 37. Big Data vs Great Data ✦ If your logical models do not make sense ✦ If your most frequently run queries are slow ✦ If you have string-only databases ✦ If you have expensive, unused data ✦ Maybe your data lake is a swamp
  • 38. “The data is a mess” ✦ First step: accelerate human understanding of data ✴ Metadata, context, hidden assumptions ✦ Datasets might serve multiple purposes ✴ Define rationale and context ✴ Data portals and understandable datasets > dashboards https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
  • 39. Data lost in translation ✦ Heterogeneous and siloed databases (and people) ✦ Rethink the ESB (enterprise service bus) as a network of microservices ✦ State of the art: data workflow tools ✴ Luigi, Airflow (open source), almost every big tech vendor ✴ Transparency, reusability, reproducibility, traceability ✴ Automation and monitoring all the way! https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
  • 40. Beyond relational models ✦ Not all data problems fit well in traditional SQL or DW models ✴ Key-value, columnar, graph-based, inverted index, etc. ✦ Models are a framework for problem-solving ✴ Not the ultimate answer ✴ There’s no one-size-fits-all model
  • 41. Do not forget fluency ✦ Check the company’s lingua franca ✦ Make it easy for critical decision-makers ✴ Ad hoc SQL queries? ✴ Dashboards? ✴ Reports?
  • 43. Experiments ✦ Missions to discover facts toward understanding ✴ They don’t fail: any result produces new information ✴ If the initial theory was wrong: good ✴ With new facts you can reformulate the question ✦ Get more modeling questions asked more often ✦ Iterative data science
  • 44. Product experimentation (A/B) ✦ Product experimentation should be hypothesis-driven (not feature-driven) ✦ Define the proper exposed population ✴ Not only new users, heavy users or early adopters ✦ Understanding the effect is essential https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
  • 45. 5 stages of A/B tests https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari
  • 46. Some other quick tips ✦ Focus on outcomes (not algorithms or methods) ✦ Design the right metric and evaluation ✦ Good experiments don't produce obvious insights ✦ Mix of data and intuition https://twitter.com/mrdatascience/status/869957499662860288
  • 47. Being data driven ✦ Be BAYESIAN - uncertainty is everywhere ✦ Be CURIOUS - keep learning ✦ Be AGILE - Fail fast, not too fast: evidence comes first https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
  • 48. Being data driven ✦ Be TRUTHFUL - don’t torture data to please opinions ✦ Be HELPFUL - work across silos, support democracy ✦ Be WISE - know when to be analytical or intuitive https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
  • 49. Take-away message: With the right people, Democracy, Creativity, Strategy, Big Great Data™ and Experiments, there's a good chance to do great SCIENCE
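Slide 19 lists what a self-service data portal publishes for each dataset: descriptions, schemas, samples and relations between datasets. A minimal sketch of such a catalog entry follows; the `DatasetEntry` class, its fields and the `orders_daily` example are hypothetical illustrations, not the metadata schema used by CKAN or Airbnb's Dataportal. It also checks that the published sample rows agree with the declared schema, one small way to keep a portal entry trustworthy.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record for a self-service data portal."""

    name: str
    description: str
    owner: str
    schema: dict[str, type]  # column name -> expected Python type
    sample: list[dict] = field(default_factory=list)  # a few example rows
    related: list[str] = field(default_factory=list)  # names of related datasets

    def sample_problems(self) -> list[str]:
        """List the ways the published sample rows disagree with the schema."""
        problems = []
        for i, row in enumerate(self.sample):
            missing = set(self.schema) - set(row)
            extra = set(row) - set(self.schema)
            if missing:
                problems.append(f"row {i}: missing {sorted(missing)}")
            if extra:
                problems.append(f"row {i}: unexpected {sorted(extra)}")
            for col, expected in self.schema.items():
                if col in row and not isinstance(row[col], expected):
                    problems.append(
                        f"row {i}: {col} is {type(row[col]).__name__}, "
                        f"expected {expected.__name__}"
                    )
        return problems


# Hypothetical entry whose second sample row stores order_id as text
orders = DatasetEntry(
    name="orders_daily",
    description="One row per order, refreshed nightly.",
    owner="data-platform",
    schema={"order_id": int, "amount": float, "country": str},
    sample=[
        {"order_id": 1, "amount": 19.9, "country": "BR"},
        {"order_id": "2", "amount": 5.0, "country": "US"},
    ],
    related=["customers"],
)
print(orders.sample_problems())  # ['row 1: order_id is str, expected int']
```

Keeping the schema and samples side by side in one record means a reader of the portal can check the "hidden assumptions" about a dataset before querying it.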
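Slide 39 points to data workflow tools such as Luigi and Airflow for transparency, reproducibility and traceability. Their real APIs are not reproduced here; instead, a minimal sketch of the underlying idea uses only the standard library's `graphlib.TopologicalSorter` (Python 3.9+): each task declares its upstream dependencies, tasks run in dependency order, and every step is written to an audit log. The `run_pipeline` helper and the task names (`extract`, `clean`, `report`) are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run each task after its upstream tasks; return (results, audit log).

    tasks: name -> callable that takes upstream results as keyword arguments
    deps:  name -> set of upstream task names
    """
    results, log = {}, []
    for name in TopologicalSorter(deps).static_order():
        upstream = {dep: results[dep] for dep in deps.get(name, ())}
        try:
            results[name] = tasks[name](**upstream)
            log.append((name, "ok"))
        except Exception as exc:  # record the failure, then stop all remaining tasks
            log.append((name, f"failed: {exc}"))
            break
    return results, log


# Hypothetical three-step pipeline: extract -> clean -> report
tasks = {
    "extract": lambda: [" 3", "5 ", "x", "7"],
    "clean": lambda extract: [int(s.strip()) for s in extract if s.strip().isdigit()],
    "report": lambda clean: {"n": len(clean), "total": sum(clean)},
}
deps = {"clean": {"extract"}, "report": {"clean"}}

results, log = run_pipeline(tasks, deps)
print(results["report"])  # {'n': 3, 'total': 15}
print(log)  # [('extract', 'ok'), ('clean', 'ok'), ('report', 'ok')]
```

Stopping at the first failure keeps the sketch short; real workflow schedulers add retries, scheduling, monitoring dashboards and parallel execution of independent branches.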
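Slide 40 lists the inverted index among the models that go beyond relational tables. As a minimal sketch of that data structure (the documents, the naive tokenizer and the helper names are hypothetical), each token maps to the set of document ids that contain it, so a multi-term query becomes a set intersection instead of a scan over every document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Data portals describe datasets",
    2: "Dashboards summarize data",
    3: "Portals list schemas and samples",
}
index = build_index(docs)
print(search(index, "data portals"))  # {1}
```

Using `index.get` rather than `index[term]` avoids silently inserting empty entries into the `defaultdict` for unknown query terms.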
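Slide 44 asks for hypothesis-driven experiments and an understanding of the effect, not just a declared winner. One common way to do this for conversion rates is a two-proportion z-test reported together with a confidence interval for the difference; the sketch below is that textbook test, not a method taken from the slides, and the conversion counts are hypothetical. The hypothesis (e.g. "variant B raises conversion") and the exposed population should be fixed before looking at the numbers.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for p_b - p_a, plus a (1 - alpha) confidence interval."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    lift = p_b - p_a  # absolute effect size, in conversion-rate units

    # Pooled standard error under H0: both variants share one conversion rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se_pool == 0:
        raise ValueError("all-or-nothing conversions: the test is undefined")
    z = lift / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the Wald interval around the observed lift
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (lift - z_crit * se_diff, lift + z_crit * se_diff)
    return {"lift": lift, "z": z, "p_value": p_value, "ci": ci}


# Hypothetical counts: 200/5000 conversions on A, 250/5000 on B
result = two_proportion_ztest(200, 5000, 250, 5000)
print(result)
```

Reporting the interval alongside the p-value keeps attention on how big the effect plausibly is, which is the "understanding the effect" the slide asks for.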
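Slide 47's "Be BAYESIAN: uncertainty is everywhere" can be made concrete with the same kind of A/B data. A minimal sketch, assuming uniform Beta(1, 1) priors and binary conversions (both are assumptions, not from the slides): each variant's conversion rate gets a Beta posterior, and Monte Carlo draws from Python's `random.betavariate` estimate the probability that B's rate exceeds A's. The function name and the counts are hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) with Beta(1, 1) priors."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior of a binomial rate under a Beta(1, 1) prior:
        # Beta(1 + successes, 1 + failures)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 200/5000 conversions on A, 250/5000 on B
print(prob_b_beats_a(200, 5000, 250, 5000))
```

A probability such as "B is better with roughly 99% probability" states the remaining uncertainty directly, which is often easier to discuss with decision-makers than a p-value.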