Easier, Faster, Smarter
Data Science without the Scientist
Matt Schumpert
10.30.13

© 2013 Datameer, Inc. All rights reserved.
Agenda
Background
First principles
Mind-blowing fun fact
Current state & challenges
Suggestions for making life easier
Demo!

Me
Enterprise infrastructure software guy
Focused on abstraction and customers
Likes simplicity

A favorite example...
Buffered Web Services:
“When a buffered operation is invoked by a client, the method operation goes on a JMS queue and WebLogic
Server deals with it asynchronously by transparently creating a Message Driven Bean to consume the message.
As with Web Service reliable messaging, if WebLogic Server goes down while the method invocation is still in the
queue, it will be dealt with as soon as WebLogic Server is restarted. When a client invokes the buffered Web
Service, the client does not wait for a response from the invoke, and the execution of the client can continue”
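The buffering pattern in the quote can be sketched in a few lines. This is only an analogy, assuming an in-memory queue and a single background thread in place of the JMS queue and Message Driven Bean; `make_buffered`, `invoke`, and `drain` are hypothetical names, not WebLogic APIs, and unlike JMS this queue does not survive a restart.

```python
import queue
import threading


def make_buffered(operation):
    """Wrap `operation` so callers enqueue work and return immediately."""
    pending = queue.Queue()   # stand-in for the JMS queue
    results = []

    def consumer():
        # Stand-in for the message-driven consumer: handles calls in FIFO order.
        while True:
            args = pending.get()
            results.append(operation(*args))
            pending.task_done()

    threading.Thread(target=consumer, daemon=True).start()

    def invoke(*args):
        # The client does not wait for a response; the call is only queued.
        pending.put(args)

    def drain():
        # Block until every queued call has been handled, then return the results.
        pending.join()
        return list(results)

    return invoke, drain


invoke, drain = make_buffered(lambda a, b: a + b)
invoke(1, 2)
invoke(3, 4)
print(drain())  # prints [3, 7]
```

The caller only ever sees `invoke`; the queue and the consumer stay hidden behind it.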

1. First Principles
First Principles from an Expert
Instrument everything
Invest in infrastructure
Put all your data in one place
Data first, questions later
Keep raw data forever
Let everyone party on the data
Produce tools to support the whole lifecycle
- Jeff Hammerbacher
2. Mind-boggling fun fact
190,000 unfilled data scientist jobs by 2018

-McKinsey
Signal-to-Noise Ratio is Dropping!
3. Current state + challenges
Hallmarks of Traditional Analytics
Esoteric skills
Long cycle times
Low transparency
Data & application silos
Mired in data prep
Sampling (guesstimation)
Expensive!
Extremely valuable work products
Current Recipe:
Pull historical data
Sample
Cleanse / Pre-process
Design / implement model
Train
Hand-code / Integrate
Deploy
Fine-Tune, rinse and repeat
Science != Everyday Decisions
There must be a better way!
Apply traditional tools to big data?
SAS: Expensive, Not Scalable, Siloed
R: Requires Coding, Retraining, Clunky Architecture
Mahout: Coding Required, Immature, Limited Support
And what about the rest of the (big data) story?
Big Data Analytics is NOT (just):
A sexy new visualization tool
Machine learning / Predictive analytics
Data science
Hadoop
The data warehousing movie replayed

Big Data Analytics IS:
A granular, complete and current understanding of your operations and customers
Answering questions at the speed of business
Relevancy in all customer interactions
Closed-loop decisioning that’s data-driven
Managing data through a lifecycle
The Big Data Analytics Lifecycle
Figure: lifecycle diagram. Stages: Integrate; Prepare and Analyze; Visualize; Deploy. Callouts: Create your hypothesis; Act on insight and measure ROI.

A lesson from data warehousing / BI
traditional / schema-on-write: slow, static, complex
agile / schema-on-read: fast, dynamic, simple
Source: TDWI
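A minimal sketch of the schema-on-read idea, assuming raw events are kept as JSON lines and a schema is just a mapping from field name to converter. The sample data and the `read_with_schema` helper are hypothetical and only illustrate the contrast above.

```python
import json

# Raw events are stored exactly as they arrived; nothing is rejected at write time.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50", "country": "US"}',
    '{"user": "b", "amount": "7"}',
    '{"user": "c", "amount": "oops", "channel": "web"}',
]


def read_with_schema(lines, schema):
    """Apply `schema` (field name -> converter) at read time; raw lines stay untouched."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            try:
                row[field] = convert(record[field])
            except (KeyError, ValueError):
                row[field] = None  # missing or malformed values become None
        yield row


# Two different questions, two different schemas, same raw data.
spend = list(read_with_schema(RAW_EVENTS, {"user": str, "amount": float}))
channels = list(read_with_schema(RAW_EVENTS, {"user": str, "channel": str}))
print(spend)
print(channels)
```

With schema-on-write, the second question would have required changing the table definition and reloading the data first.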
Don’t rebuild Rome... again!!

There must be a better way!
4. Making life easier
How (without an army):
Speak the language of the business
Generate (don’t write) code
Simplify data integration and preparation
Move the computation (analytics) to the data

Esoteric Language == Obscurity
K-Means

CART

Mutual Information

Matrix Factorization
Random Forest?

Logistic Regression

Support Vector Machine??

Algorithms can be straightforward!

Clustering
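To show how little is behind the name, here is a minimal k-means sketch in plain Python. It assumes 2-D points and a naive start (the first k points as initial centers); the data is made up, and this is an illustration, not the implementation used in the demo.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest center, then re-average."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the center with the smallest squared distance to p.
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters[0])  # prints [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # prints [(9, 9), (8, 9), (9, 8)]
```

In business terms: "group similar customers together", with k as the number of groups.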

Column Dependencies
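One common way to score how strongly two columns depend on each other is mutual information, which appears on the "Esoteric Language" slide. Pairing it with this slide is an assumption, and the sketch below is a hypothetical illustration for categorical columns, not the product's implementation.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long categorical columns."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


plan = ["free", "free", "paid", "paid"]
support_tier = ["basic", "basic", "gold", "gold"]   # fully determined by plan
browser = ["chrome", "firefox", "chrome", "firefox"]  # unrelated to plan

print(mutual_information(plan, support_tier))  # prints 1.0
print(mutual_information(plan, browser))       # prints 0.0
```

A score of zero means that knowing one column tells you nothing about the other; higher scores mean a stronger dependency.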

Decision Trees

Recommendations

Example:
Fraud Investigation
Sales Conversion
DEMO
Data Wrangling
DEMO
@Datameer

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

How to do Data Science Without the Scientist

  • 2. Data Science without the Scientist Matt Schumpert 10.30.13 © 2013 Datameer, Inc. All rights reserved.
  • 3. Agenda: Background; First principles; Mind-blowing fun fact; Current state & challenges; Suggestions for making life easier; Demo!
  • 4. Me: Enterprise infrastructure software guy; Focused on abstraction and customers; Likes simplicity
  • 5. A favorite example... Buffered Web Services: “When a buffered operation is invoked by a client, the method operation goes on a JMS queue and WebLogic Server deals with it asynchronously by transparently creating a Message Driven Bean to consume the message. As with Web Service reliable messaging, if WebLogic Server goes down while the method invocation is still in the queue, it will be dealt with as soon as WebLogic Server is restarted. When a client invokes the buffered Web Service, the client does not wait for a response from the invoke, and the execution of the client can continue”
  • 7. First Principles from an Expert: Instrument everything; Invest in infrastructure; Put all your data in one place; Data first, questions later; Keep raw data forever; Let everyone party on the data; Produce tools to support the whole lifecycle. - Jeff Hammerbacher
  • 9. 190,000 unfilled data scientist jobs by 2018 -McKinsey
  • 11. 3. Current state + challenges
  • 12. Hallmarks of Traditional Analytics: Esoteric skills; Long cycle times; Low transparency; Data & application silos; Mired in data prep; Sampling (guesstimation); Expensive!; Extremely valuable work products
  • 13. Current Recipe: Pull historical data; Sample; Cleanse / Pre-process; Design / implement model; Train; Hand-code / Integrate; Deploy; Fine-Tune, rinse and repeat
  • 14. Science != Everyday Decisions
  • 15. There must be a better way!
  • 16. Apply traditional tools to big data? SAS, R, Mahout: Expensive; Not Scalable; Silo’ed; Requires Coding; Retraining; Clunky Architecture; Coding Required; Immature; Limited Support
  • 17. And what about the rest of the (big data) story?
  • 18. Big Data Analytics is NOT (just): A sexy new visualization tool; Machine learning / Predictive analytics; Data science; Hadoop; The data warehousing movie replayed
  • 19. Big Data Analytics IS: A granular, complete and current understanding of your operations and customers; Answering questions at the speed of business; Relevancy in all customer interactions; Closed-loop decisioning that’s data-driven; Managing data through a lifecycle
  • 20. The Big Data Analytics Lifecycle (diagram): Integrate; Prepare and Analyze; Visualize; Deploy. Create your hypothesis; Act on insight and measure ROI
  • 21. A lesson from data warehousing / BI. Traditional / schema-on-write: slow, static, complex. Agile / schema-on-read: fast, dynamic, simple. Source: TDWI
  • 22. Don’t rebuild Rome... again!!
  • 23. There must be a better way!
  • 24. 4. Making life easier
  • 25. How (without an army): Speak the language of the business; Generate (don’t write) code; Simplify data integration and preparation; Move the computation (analytics) to the data
  • 26. Esoteric Language == Obscurity: K-Means; CART; Mutual Information; Matrix Factorization; Random Forest?; Logistic Regression; Support Vector Machine??
  • 27. Algorithms can be straightforward!
  • 28. Clustering
  • 29. Column Dependencies
  • 30. Decision Trees
  • 31. Recommendations
  • 33. DEMO
  • 35. DEMO