Copyright © Think Big Analytics and Neustar Inc.
Asking the Right Questions of Your Data
Mike Peterson
VP of Platforms and Data Architecture, Neustar
Jun 26, 2013
We have come a long way!!!
But where/when is the GOLD?
Unintended Consequence of Big Data
We need to ask the right Questions
Oh, and let's remember religion and not forget GOVERNANCE
Big Data Evolution Status
» New data platform is built – 3-tier
» Collected many PB of data
» Hadoop infrastructure in place for 2 years
» Established Data Science teams
» Machine Learning is in place
» Increased technology skills
» Focused data teams
» Active in the community
Our Partners are still a part of our process
» Expertise in Technologies
» Trusted partner
» Collaborative Teams
» Open source leader
» Invested in client success
» Price/performance
Some Unintended Consequences
» More Customer Reporting Requests
» Because we suddenly have lots of customer data available
» Meaning more work for the DW team!!!
» A DR site is more necessary than ever
» More data means more critical data to protect
» Network stress to support DR and other additional access
» Data Governance is overwhelmed with requests
» Retention policies need to be rethought
Questions
» Customer Driven Questions
» Easy to understand
» Subject Questions
» Discover the pivot and you have a good start
» Exploratory Questions
» Thinking of the unformed questions
» Working from the top down
» Narrowing the answer before you test all the data
Questions - Approaches
• Understand what manual process you want to automate:
identify what is currently predicted manually that could be
automated, and determine whether there is any way to get
training data consisting of <input, output> pairs.
• Consider methods to augment existing data with a “pivot”
column that can be used to join. For example, geo-location
of an IP address could lead to joining with Census Data
based on zip+4.
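The pivot-column idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables, IP addresses, field names, and income figures below are made up, and a real pipeline would use an actual geolocation service and Census extract.

```python
# Hypothetical sample data; names, IPs, and values are illustrative only.
ip_geo = {  # IP address -> ZIP+4 (stand-in for a geolocation lookup)
    "203.0.113.7": "20166-6512",
    "198.51.100.23": "94105-1804",
}

census_by_zip4 = {  # ZIP+4 -> census attributes (made-up values)
    "20166-6512": {"median_income": 98000},
    "94105-1804": {"median_income": 121000},
}

events = [
    {"ip": "203.0.113.7", "page": "/pricing"},
    {"ip": "198.51.100.23", "page": "/docs"},
    {"ip": "192.0.2.99", "page": "/home"},  # no geolocation available
]


def add_pivot_and_join(events, ip_geo, census):
    """Add a zip4 pivot column to each event, then left-join census attributes."""
    enriched = []
    for event in events:
        row = dict(event)
        row["zip4"] = ip_geo.get(row["ip"])  # the pivot column
        # Rows with no match keep only their original fields plus zip4=None.
        row.update(census.get(row["zip4"], {}))
        enriched.append(row)
    return enriched


enriched = add_pivot_and_join(events, ip_geo, census_by_zip4)
```

The join is a left join on purpose: events that cannot be geolocated stay in the result, so the gap in coverage remains visible instead of silently shrinking the data.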
Questions - Approaches
• Determine if your problem is one of prediction or one of
grouping (clustering). The latter is more of a task that can
lead to better understanding rather than solving a direct
business problem.
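A toy contrast between the two problem types, as a rough sketch: grouping works on unlabeled values, while prediction needs labeled <input, output> pairs. The spend figures, the churn labels, and both function names are hypothetical, and the techniques used here (1-D k-means with k=2, and 1-nearest-neighbor) are stand-ins chosen for brevity, not methods named in the talk.

```python
from statistics import mean

# Hypothetical monthly spend per customer (illustrative values only).
spend = [12.0, 15.0, 14.0, 80.0, 85.0, 90.0]


def two_means(values, iterations=10):
    """Grouping: split unlabeled values into two clusters (1-D k-means, k=2).

    Assumes the input holds at least two distinct values.
    """
    low, high = min(values), max(values)  # initial centroids
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            if abs(v - high) < abs(v - low):
                groups[1].append(v)
            else:
                groups[0].append(v)
        low, high = mean(groups[0]), mean(groups[1])
    return groups


# Prediction needs labeled <input, output> pairs: spend -> churned?
labeled = [(12.0, True), (15.0, True), (80.0, False), (90.0, False)]


def predict_churn(x, labeled):
    """Prediction: return the label of the nearest labeled example (1-NN)."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


low_spenders, high_spenders = two_means(spend)
```

The clustering result only says that two groups exist; someone still has to decide what, if anything, the business should do about them. The predictor answers a direct question but cannot be built until labeled outcomes are available.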
Questions - Approaches
• Determine if you are more interested in finding “interesting”
relationships among data columns than in the columns
themselves. This is a task I’d call more “discovery” than
prediction, but the idea is to express one column as the
output in terms of the other columns as input.
• Doing this for all output columns can lead to “discovery”
of those correlations that are the strongest (e.g., every
time a customer buys beer at 5PM, he is likely to buy
diapers). This is more of a fishing expedition, but can
lead to unusual insights.
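One way to sketch this "fishing expedition" is to treat each column in turn as the output, score every other column as an input, and rank the pairs. The baskets below are invented, and confidence and lift are used here as an assumed scoring choice; the slides do not name a specific measure.

```python
from itertools import permutations

# Hypothetical baskets: each row is one transaction with boolean columns.
rows = [
    {"beer": True,  "diapers": True,  "milk": False},
    {"beer": True,  "diapers": True,  "milk": True},
    {"beer": False, "diapers": False, "milk": True},
    {"beer": True,  "diapers": False, "milk": True},
    {"beer": False, "diapers": False, "milk": True},
    {"beer": False, "diapers": True,  "milk": False},
]


def rank_associations(rows):
    """Treat each column in turn as the output and score every other column as input."""
    columns = list(rows[0])
    n = len(rows)
    scored = []
    for inp, out in permutations(columns, 2):
        p_in = sum(r[inp] for r in rows) / n
        p_out = sum(r[out] for r in rows) / n
        p_both = sum(r[inp] and r[out] for r in rows) / n
        if p_in == 0 or p_out == 0:
            continue  # lift is undefined when a column is never true
        confidence = p_both / p_in      # P(out | inp)
        lift = p_both / (p_in * p_out)  # > 1: co-occurs more often than chance
        scored.append((inp, out, confidence, lift))
    # Strongest associations first.
    return sorted(scored, key=lambda s: s[3], reverse=True)


ranked = rank_associations(rows)
top_input, top_output, top_confidence, top_lift = ranked[0]
```

With many columns the number of pairs grows quickly, and some high-lift pairs will appear by chance alone, so anything this surfaces is a hypothesis to test rather than a finding.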
Impetus Approach to Questioning Data
[Diagram: Existing Data Property, Business Strategy, Customer Problem Statements; Analysis of Data Property, Discussion with Stakeholders, Analysis of Problem Statement; Data Needs Statement, Refined Problem Statement; Data Analytics Plan]
Who knew there was religion in Analytics
» Statistical Analysis vs. Machine Learning
» Stats people think “truth”
» Machine Learning people think “near truth”
» Truth is easy to bound
» Cost models make sense to org
» Near Truth is hard to explain and bound
» It is where the real exploration happens
» But – it can consume the Data Scientist
» Both can net real returns – and they need to coexist
GOVERNANCE
» Don’t forget about Governance
» Contracts
» PII
» Brand
» CPO & CISO are your friends - honestly
» Protect your CUSTOMER DATA
» It will slow you down in the beginning
» But you want your results to be reputable
» We need to get to a policy framework at some
point that is automated
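As a very rough sketch of what an automated policy check might look like, the snippet below tags columns and filters them by the purpose of a job. Every name here (the tags, the purposes, the `allowed_columns` helper) is a hypothetical illustration, not a description of any framework Neustar built.

```python
# Hypothetical column tags and rules; a real framework would load these
# from contracts and privacy policy rather than hard-code them.
COLUMN_TAGS = {
    "email": {"pii"},
    "ip_address": {"pii"},
    "zip4": set(),
    "page": set(),
}

BLOCKED_TAGS_BY_PURPOSE = {
    "customer_report": {"pii"},
    "internal_security": set(),
}


def allowed_columns(columns, purpose):
    """Return the columns a job may read for the given purpose."""
    blocked = BLOCKED_TAGS_BY_PURPOSE[purpose]
    # Unknown columns are treated as PII so that new data is denied by default.
    return [c for c in columns if not (COLUMN_TAGS.get(c, {"pii"}) & blocked)]


report_columns = allowed_columns(["email", "zip4", "page", "new_col"], "customer_report")
```

Denying untagged columns by default is the main design choice: it slows things down at first, as the slide warns, but it keeps newly collected data out of customer-facing results until someone has classified it.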
About Impetus
» Accelerated consulting and services leader for Big Data;
Headquartered in San Jose since 1996; 1400+; Presences
in Silicon Valley, Atlanta, NYC; offices in India; Expertise
through Architects
» Pioneers in distributed software engineering with vertical
and functional expertise; Dedicated innovation labs; 200+
Big Data practitioners; 80+ dedicated to R&D
Data Science Approach
Drill
* Incoming Question
* Problem Landscape
* Underlying Constraints
* Specific Goals
Assess
* Goal Driven Hypotheses
* Data Requirement
* Resource Requirements
* Analysis Plan
Target
* Data Collection
* Quality Assessment
* Cross Validation
* Restructuring
Analyze
* Test Previous Hypotheses
* Explore New Hypotheses
* Test
* Quantify Results
Recommend
* Summary of Results
* Key Novel Insights
* Impact Analysis
* Action Items
Data Science Focus Areas
» Recommender Systems
» Sentiment Analysis
» Topic Identification
» Predictive Analytics
» Data Stream Analytics
Contact us at bigdata@impetus.com
Thank you
Questions?

Editor's Notes

  1. Sometimes clustering could be enough to solve a business problem.
  2. We must understand the columns well before understanding the relationships.
  3. Data Science results lead to better database marketing: churn analytics, upselling, cross-selling, RFM/LTV. These are some of the areas where we’ve used data science and machine learning to come up with some interesting models.