Data Science
Building a Business or a Business
Practice in Data Science
Data Science is…
• An art of mining large quantities of data
• An art of combining disparate data sources and blending
public data with corporate data
• Forming hypotheses to solve hard problems
• Building models to solve current problems and provide
forecasts
• Anticipating future events (based on historical data) and
providing corrective actions (yield curve in finance, fraud
detection in banking, storm effects on travel, operational
downtime)
• Automating the analytics processes to reduce time to
solve future problems
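As a minimal sketch of what "blending public data with corporate data" can look like in practice, the Python below left-joins a small corporate table with a public lookup on a shared key. The region names, figures, and the `blend` helper are hypothetical and exist only for illustration; they are not from the deck.

```python
# Hypothetical corporate data: sales by region (illustrative values only).
corporate_sales = [
    {"region": "North", "sales": 120_000},
    {"region": "South", "sales": 80_000},
    {"region": "West", "sales": 50_000},
]

# Hypothetical public data: population by region (illustrative values only).
public_population = {"North": 2_000_000, "South": 1_000_000}


def blend(corporate_rows, public_lookup):
    """Left-join corporate rows with a public lookup keyed by region."""
    blended = []
    for row in corporate_rows:
        # Regions missing from the public source keep a None placeholder.
        population = public_lookup.get(row["region"])
        per_capita = row["sales"] / population if population else None
        blended.append(
            {**row, "population": population, "sales_per_capita": per_capita}
        )
    return blended


blended_rows = blend(corporate_sales, public_population)
for row in blended_rows:
    print(row)
```

The left join keeps every corporate row even when the public source has no match, which makes gaps in the external data visible instead of silently dropping records.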
A data scientist has the following minimum
set of core skills…
• Problem solver
• Creative and can form a hypothesis
• Is able to program with large quantities of data
• Can identify appropriate data sources and bring in
and blend data from them
• Stats/math/analytics background to build models and
write algorithms
• Can quickly develop domain knowledge to understand
key factors which influence the performance of a
business problem
Roles data scientists play…
• Problem description
• Hypothesis formation
• Data assembly, ETL and data integration role
• Model development (pattern recognition or any other
model to provide answers) and training
• Data visualization
• A/B testing
• Propose solutions and/or new business ideas
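The A/B testing role listed above can be illustrated with a two-proportion z-test, one common way to compare conversion rates between two variants. This is a sketch under assumptions: the function name and the sample counts are hypothetical, and the deck does not prescribe any particular test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100/1000 conversions for A, 130/1000 for B.
z_score, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z_score:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone; the threshold for acting on it remains a business decision.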
The balance between humans and machines…
• Current: humans play a significant role in the
process (ETL, joins, models, visualization, machine
learning), then repeat and recycle this process
as the problem changes
• Tomorrow: a big portion of the food chain can be
automated via machine learning so machines can take
over and data scientists can be freed up to build more
algorithms/models
• The process can be automated so repeating/recycling
can be cheaper and less time-consuming
The Data Science pipeline currently looks
like…
• From Data to Insights – this entire process requires
mundane skills (IT), specialized skills (data-scientist)
and elements of human psychology to present the
right information at the right time
• The data needs to be discovered, assembled,
semantically enriched and anchored to business
logic – this task can be automated through
machine learning (a set of harmonized tools with AI)
to free up scarce resources
The Data Science pipeline currently looks like
(cont’d)…
• Specialized skills today get addressed by open source
technologies such as R and expensive solutions like
Matlab and SPSS.
• Very few software solutions carefully design the human
interface so that the application is consumable
without customer training (i.e., they are not "Google
easy")
The pipeline needs complete rethinking…
• Automate mundane tasks that IT gets tagged with
• Discover data automatically
• Detach business logic from data models
• Make blending public data with corporate data
second nature
• Free up data-scientists so that they can build
analytics micro-apps for a domain or a sub-domain
• Data science need not be a niche (or a specialized
category); it should appeal to the masses
(democratization of data and bringing insights to
everyone without needing specialized skills)
Opportunity in Data Science…
• Understand the value chain (IT + Business Analyst +
Data Scientists + Business Users)
• Provide something for everyone: a single integrated
platform (ETL + data integration + predictive modeling
+ in-memory computing + storage) for data scientists
so that they can build standard analytical apps,
move away from proprietary models and standardize
(which also helps IT)
• Analytical apps on this platform (think of them as
rapid deployment solutions) for business users
Opportunity in Data Science (cont’d)…
• Help business analysts write basic models (churn,
segmentation, correlation, etc.) without requiring
advanced skills
• Work with consulting companies (like Mu-Sigma and
Opera Solutions) so that they can consult and build
apps on your platform for companies that do not have
data scientists on their payroll
• Partner with public data providers (to help clients),
consulting companies (for rapid solutions), and
R/Python/ML communities (to grab mind-share and
show thought leadership)
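To give a sense of how small the "basic models" mentioned above can be, here is a minimal Python sketch that computes churn rate per segment and a Pearson correlation between usage and churn. The customer records, field names, and helper functions are hypothetical illustrations, not part of any platform described in the deck.

```python
from collections import defaultdict

# Hypothetical customer records (illustrative values only).
customers = [
    {"segment": "basic", "monthly_logins": 2, "churned": True},
    {"segment": "basic", "monthly_logins": 5, "churned": False},
    {"segment": "basic", "monthly_logins": 1, "churned": True},
    {"segment": "premium", "monthly_logins": 20, "churned": False},
    {"segment": "premium", "monthly_logins": 15, "churned": False},
    {"segment": "premium", "monthly_logins": 3, "churned": True},
]


def churn_rate_by_segment(rows):
    """Fraction of churned customers within each segment."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row["segment"]] += 1
        churned[row["segment"]] += int(row["churned"])
    return {segment: churned[segment] / totals[segment] for segment in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rates = churn_rate_by_segment(customers)
logins_vs_churn = pearson(
    [c["monthly_logins"] for c in customers],
    [float(c["churned"]) for c in customers],
)
print(rates)
print(logins_vs_churn)
```

On this toy data the correlation is negative, meaning that customers who log in more often churn less; real data would need far more records and validation before drawing conclusions.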
