The Evolving Data Science Landscape
Kyle Polich
Data Science, Inc.
LIGO
One of the most advanced metrology projects; one of the more precise instruments ever created
Measures changes of 1/10,000th the width of a proton
4 km interferometer to measure gravitational fluctuations from cosmic explosions
LIGO
According to Scientific American, cost $1.1 billion over the last 40 years
Construction took 8 years
Turned on in 2002
Managed by ~1k scientists
Gravitational waves detected (announced 2016)
Value Delivery
“Despite the hype of big data, a majority of the business value produced by data still happens in this more traditional setting, and we would like to support these communities.”
- Szilard Pafka (Dec 2014), announcing a new DW/BI/Analytics Meetup
Bias, Variance, Heterogeneity
“Up until late last year, tracking would be done unpredictably after almost every release.”
“We changed the way we capture that last April and again this January.”
“We have four divisions that all do their analytics differently.”
Excitement Scale
Excitement hierarchy
Report generation
ML on 10k observations, 20 features
ML on 1 billion observations, 1500 features
ML on 1 million observations, 100 features
1000 node clustered computing
A/B testing
High performance computing
Econometric modeling for adtech
Deep learning
SQL queries
Logistic regression
Off the shelf OpenCV implementation
Online multi-armed bandit
Online streaming algorithms
Commercial opportunities for quantum computing
Measures of effectiveness?
F1-score
Accuracy and precision
Area under an ROC curve (AUC)
Bias-variance tradeoff
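As a point of reference for the model-centric measures listed on this slide, here is a minimal Python sketch of how accuracy, precision, F1-score, and AUC can be computed. It is not from the talk: the function names and the example counts are hypothetical, and the AUC uses the pairwise-ranking definition (the fraction of positive/negative pairs ranked correctly, with ties counted as half).

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count a win when a positive outranks a negative, half a win on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical numbers, made up purely for illustration.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

All of these describe model quality on a labeled sample; none of them says anything about business value on its own.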
Measure of effectiveness
• Return on Investment (ROI)
• Revenue savings from automation
• Lift
• Impact Factor*
• Causal Impact
• Value of information
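Two of the business-facing measures above reduce to simple arithmetic. The sketch below is an illustration only, not the speaker's definition: it assumes ROI means net gain divided by cost, and that lift means the ratio of the conversion rate in a treated group to the rate in a control group (other definitions of lift exist). All names and numbers are hypothetical.

```python
def roi(gain, cost):
    """Return on investment as net gain divided by cost (assumed definition)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


def lift(treated_conversions, treated_total,
         control_conversions, control_total):
    """Ratio of treated to control conversion rate (assumed definition)."""
    treated_rate = treated_conversions / treated_total
    control_rate = control_conversions / control_total
    if control_rate == 0:
        raise ValueError("control conversion rate is zero; lift is undefined")
    return treated_rate / control_rate


# Hypothetical figures for illustration only.
project_roi = roi(gain=150_000, cost=100_000)   # net gain is half the cost
campaign_lift = lift(60, 1000, 40, 1000)        # 6% vs 4% conversion
```

A lift above 1 under this definition means the treated group converted at a higher rate than the control group.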
Goodhart’s Law
When a measure becomes a target, it ceases to be a good measure
Value of Information
Value(information) = Expected revenue if information known - Expected revenue if information NOT known - Cost of information
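The value-of-information relation on this slide can be worked through with a small decision problem. The following Python sketch is an assumption-laden illustration, not the speaker's method: it treats "information known" as perfect knowledge of which state occurs before acting, and the actions, states, payoffs, probabilities, and cost are all hypothetical.

```python
def expected_revenue_without_info(payoffs, probs):
    """Best expected payoff when one action must be chosen for all states."""
    return max(
        sum(probs[state] * payoff for state, payoff in by_state.items())
        for by_state in payoffs.values()
    )


def expected_revenue_with_info(payoffs, probs):
    """Expected payoff when the state is known before choosing an action."""
    return sum(
        prob * max(by_state[state] for by_state in payoffs.values())
        for state, prob in probs.items()
    )


def value_of_information(payoffs, probs, cost_of_information):
    """Revenue if known, minus revenue if not known, minus cost."""
    return (expected_revenue_with_info(payoffs, probs)
            - expected_revenue_without_info(payoffs, probs)
            - cost_of_information)


# Hypothetical example: launch a product or hold, under uncertain demand.
payoffs = {
    "launch": {"high": 100.0, "low": -40.0},
    "hold": {"high": 0.0, "low": 0.0},
}
probs = {"high": 0.5, "low": 0.5}

voi = value_of_information(payoffs, probs, cost_of_information=5.0)
```

In this made-up case, acting blind is worth 30 in expectation (always launch), knowing demand first is worth 50, so the information is worth 20 before its cost of 5. A negative result would mean the information costs more than it can improve the decision.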
Iteration and precision
Early objectives
• Maximize conversion rate
• Send / don’t send offer
• Raise / lower budget
• Predict number of machine failures
• Find available service provider
Late objectives
• Maximize lifetime value
• Personalized offer
• Real time bid optimization
• Optimize factory environmental controls
• Global service pairing optimization
Business Conversations
Optimize within the constraints of your product
Discuss opportunities with product owner
The Evolution of Software Engineering
Pre-1990s: “Computer Programmer”
1990s: DBA, VLSI engineer, Embedded systems programmer, Front end, back end, QA
2000s: Angular developer, UI/UX architect, AWS Infrastructure engineer, Spring integration manager, DevOps engineer, Change management / continuous integration specialist, Security engineer, Unity developer, Serverless evangelist, mobile developer, WordPress developer, Accessibility specialist
Scope of Data Science
Pre-2008: Statistician, ML researcher, etc.
2008-2016: Data scientist, data engineer
2016 - Future: ???
An arbitrary timeline
1950s: Perceptron algorithm
1993: R first appearance
1993: C4.5 described
1995: False discovery rates
2001: Weka
2004: MapReduce paper
2007: scikit-learn initial release
2010: Theano
2011: H2O launched
2014: Spark initial release, XGBoost on GitHub
2015: TensorFlow
Data science community
• Meetups
• Events
• MOOCs
• Bootcamps
• Podcasts
• Blogs
• Books
DataForward Event Series
DataForward is a gathering of professionals across industries who are passionate about data
science, big data technologies, and data-driven businesses. The group meets once a month at
keynote events featuring talks and presentations by industry leaders. The DataForward events
are hosted and organized by DataScience Inc and livestreamed to audiences all over the
world.
The monthly events are dedicated to key topics facing data-driven organizations: disruptive
technologies, data-driven culture, investment trends, and insights into how existing
organizations can unlock the value from their data. To sign up for our first keynote event in
August, please visit meetup.com/DataForward.
DataScience
facebook.com/datascience
twitter.com/datascienceinc
linkedin.com/company/datascience-inc
(310) 579 - 6200

Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscape, Kyle Polich - Principal Consulting Engineer, Datascience Inc

  • 1. The Evolving Data Science Landscape Kyle Polich Data Science, Inc.
  • 2. LIGO: One of the most advanced metrology projects; one of the more precise instruments ever created. Measures changes 1/10,000th the width of a proton. 4 km interferometer to measure gravitational fluctuations from cosmic explosions.
  • 3. LIGO: According to Scientific American, cost $1.1 billion over the last 40 years. Construction took 8 years. Turned on in 2002. Managed by ~1k scientists. Gravitational waves detected (announced 2016).
  • 4. Value Delivery: “Despite the hype of big data, a majority of the business value produced by data still happens in this more traditional setting, and we would like to support these communities.” - Szilard Pafka (Dec 2014), announcing new DW/BI/Analytics Meetup
  • 5. Bias, Variance, Heterogeneity: “Up until late last year, tracking would be done unpredictably after almost every release.” “We changed the way we capture that last April and again this January.” “We have four divisions that all do their analytics differently.”
  • 7. Excitement hierarchy: Report generation; ML on 10k observations, 20 features; ML on 1 billion observations, 1500 features; ML on 1 million observations, 100 features; 1000 node clustered computing; A/B testing; High performance computing; Econometric modeling for adtech; Deep learning; SQL queries; Logistic regression; Off the shelf OpenCV implementation; Online multi-armed bandit; Online streaming algorithms; Commercial opportunities for quantum computing
  • 8. Measures of effectiveness? F1-score; accuracy and precision; area under an ROC curve (AUC); bias-variance tradeoff
  • 9. Measure of effectiveness: Return on Investment (ROI); revenue savings from automation; lift; Impact Factor*; causal impact; value of information
  • 10. Goodhart’s Law: When a measure becomes a target, it ceases to be a good measure.
  • 11. Value of Information: Value(information) = Expected revenue if information known - Expected revenue if information NOT known - Cost of information
  • 12. Iteration and precision. Early objectives: maximize conversion rate; send / don’t send offer; raise / lower budget; predict number of machine failures; find available service provider. Late objectives: maximize lifetime value; personalized offer; real time bid optimization; optimize factory environmental controls; global service pairing optimization.
  • 13. Business Conversations: Optimize within the constraints of your product. Discuss opportunities with product owner.
  • 14. The Evolution of Software Engineering. Pre 1990s: “Computer Programmer”. 1990s: DBA, VLSI engineer, Embedded systems programmer, Front end, back end, QA. 2000s: Angular developer, UI/UX architect, AWS Infrastructure engineer, Spring integration manager, DevOps engineer, Change management / continuous integration specialist, Security engineer, Unity developer, Serverless evangelist, mobile developer, WordPress developer, Accessibility specialist.
  • 15. Scope of Data Science. Pre 2008: Statistician, ML researcher, etc. 2008-2016: Data scientist, data engineer. 2016 - Future: ???
  • 16. An arbitrary timeline. 1950s: Perceptron algorithm. 1993: R first appearance. 1993: C4.5 described. 1995: False discovery rates. 2001: Weka. 2004: MapReduce paper. 2007: scikit-learn initial release. 2010: Theano. 2011: H2O launched. 2014: Spark initial release, XGBoost on GitHub. 2015: TensorFlow.
  • 17. [Image-only slide]
  • 18. Data science community: Meetups; Events; MOOCs; Bootcamps; Podcasts; Blogs; Books
  • 19. DataForward Event Series: DataForward is a gathering of professionals across industries who are passionate about data science, big data technologies, and data-driven businesses. The group meets once a month at keynote events featuring talks and presentations by industry leaders. The DataForward events are hosted and organized by DataScience Inc, and livestreamed to audiences all over the world. The monthly events are dedicated to key topics facing data-driven organizations: disruptive technologies, data-driven culture, investment trends, and insights into how existing organizations can unlock the value from their data. To sign up for our first keynote event in August, please visit meetup.com/DataForward.
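
As a rough illustration of the Value of Information formula on slide 11, the sketch below works through a lemonade-stand scenario. The speaker notes mention a lemonade stand, but the probabilities, payoffs, forecast cost, and all function and variable names here are invented for illustration and do not come from the talk. It also assumes the "information" is a perfect weather forecast.

```python
# Hypothetical lemonade-stand example; every number below is made up.
P_WEATHER = {"sunny": 0.6, "rainy": 0.4}

# Revenue for each (batch size, weather) combination.
REVENUE = {
    "large": {"sunny": 100.0, "rainy": -20.0},
    "small": {"sunny": 40.0, "rainy": 20.0},
}


def expected_revenue_without_info():
    """Commit in advance to the one batch size with the best expected revenue."""
    return max(
        sum(P_WEATHER[w] * payoff[w] for w in P_WEATHER)
        for payoff in REVENUE.values()
    )


def expected_revenue_with_info():
    """With a perfect forecast, choose the best batch size for each weather outcome."""
    return sum(
        p * max(payoff[w] for payoff in REVENUE.values())
        for w, p in P_WEATHER.items()
    )


def value_of_information(cost_of_information):
    """Slide 11: revenue with info, minus revenue without info, minus cost of info."""
    return (
        expected_revenue_with_info()
        - expected_revenue_without_info()
        - cost_of_information
    )


if __name__ == "__main__":
    print(value_of_information(cost_of_information=5.0))
```

With these made-up numbers the forecast is worth buying only if it costs less than the gap between the two expected revenues (about 16 units here); above that price the value of information turns negative.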

Editor's Notes

  1. The impact of data science on business is undeniable, and the value it provides is growing without signs of slowing. To keep up with this rapidly evolving methodology and technology landscape, data scientists must adapt and specialize through continuous learning. This talk focuses on how they can do that in a way that maximizes the positive impact data science will have on their organization.
  2. This is equivalent to measuring the distance to the nearest star to an accuracy smaller than the width of a human hair! The truest big data problem.
  3. DataScience takes on ambitious problems, but not this ambitious or costly.
  4. LIGO: scientific value. Business: impact and value.
  5. In the real world, it’s a metrologist’s worst nightmare. 80% of time cleaning -> business understanding.
  6. How many of the top problems does a data scientist get to solve in their career?
  7. YM mistake
  8. Impact factor – used for citations; choice differs from previous solution
  9. Lemonade stand
  10. Reference LIGO again
  11. Recommender engine vs. social network integration
  12. I’m looking forward to the day when my title is no longer data scientist
  13. How many new techniques since you got your degree? I’ll accept PR on this list, but it’s going to end up looking like next slide
  14. A need for specialization; students Astar. A focus on business impact; problem and person match (soft pitch here). A need for community and continuous learning.
  15. “Like TED, but for data professionals”