A Day of
Empowerment

Building Predictive Analytics on
Big Data Platforms
1. Opportunity: Big Data
2. Demystifying Predictive Analytics
3. Taking advantage of combined power
Striving for an “unfair” competitive advantage
Old Days
New Days
Big Data can look like rubbish until you find out how to use it
“Data are becoming the new raw
material of business”
- Craig Mundie, head of research and strategy, Microsoft
Modeling true risk

Network data analysis to
predict failure

Customer churn analysis

Threat analysis

Recommendations

Feature Usage analysis

Ad targeting

…
Collect and Store

• Complex data (text files, audio, video, images, …)
• Multiple sources
• Lots of data

Process

• Batch processing
• Parallel execution
• Cluster solution

Analyze

• Simple visualization (reports, dashboard)
• Text mining
• Sentiment analysis
• Prediction models
• Collaborative filtering
[Diagram: event processing architecture]

• Event sources (log files, Windows Event Log, WMI, SNMP, database, etc.)
• Event Ingestion
• Event Transport
• Event Aggregation and Transformation
• Event Serialization and Archiving
• Event Storage
• Event DB
• Full-text Index
• Event Processing and Analytics
• Query Engine
• Full-text Search engine
• Rules Engine
• Predictive Analytics
• Presentation
• Interactive Search
• Reports and Dashboards
• Alerts Visualization
• User
• E-mail, SMS, SNMP, etc.
• Operational Management Tools
[Diagram: event processing architecture with example technologies]

• Event sources (log files, Windows Event Log, WMI, SNMP, database, etc.)
• Event Ingestion
• Event Transport; Event Aggregation and Transformation: Apache Flume
• Event Serialization and Archiving: Protobuf, Avro, Thrift, MessagePack
• Event Storage
• Event DB: HDFS, HBase, Cassandra
• Full-text Index
• Event Processing and Analytics
• Query Engine: Impala
• Full-text Search engine: Solr, Elasticsearch
• Rules Engine: Drools
• Predictive Analytics: R
• Presentation
• Interactive Search: Custom
• Reports and Dashboards: JasperSoft, Tableau
• Alerts Visualization: Custom
• User
• E-mail, SMS, SNMP, etc.
• Operational Management Tools: Cloudera Manager, Apache Ambari
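
The flow in the diagram (ingest events, aggregate them, apply rules, raise alerts) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log line format, the sample lines, the helper names `parse_event`, `aggregate_errors` and `run_rules`, and the threshold of three errors are all hypothetical. In the stack listed above these roles would be played by tools such as Apache Flume and Drools rather than hand-written code.

```python
from collections import Counter

# Hypothetical log format (an assumption): "<timestamp> <host> <level> <message>"
RAW_LOGS = [
    "2013-11-01T10:00:00 web01 ERROR disk full",
    "2013-11-01T10:00:05 web01 ERROR disk full",
    "2013-11-01T10:00:09 web02 INFO request served",
    "2013-11-01T10:00:12 web01 ERROR write failed",
    "malformed line",
]

def parse_event(line):
    """Ingestion step: turn a raw line into a structured event, or None."""
    parts = line.split(" ", 3)
    if len(parts) != 4:
        return None  # drop lines that do not match the assumed format
    timestamp, host, level, message = parts
    return {"timestamp": timestamp, "host": host, "level": level, "message": message}

def aggregate_errors(events):
    """Aggregation step: count ERROR events per host."""
    return Counter(e["host"] for e in events if e["level"] == "ERROR")

def run_rules(error_counts, threshold=3):
    """Rules step: flag hosts whose error count reaches the threshold."""
    return sorted(host for host, n in error_counts.items() if n >= threshold)

events = [e for e in map(parse_event, RAW_LOGS) if e is not None]
error_counts = aggregate_errors(events)
alerts = run_rules(error_counts)
print(alerts)
```

Keeping each stage as a separate function mirrors the separate boxes in the diagram, so any one stage could be swapped for a real component without touching the others.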
“The idea that the future is
unpredictable is undermined every
day by the ease with which the
past is explained”
― Daniel Kahneman, Thinking, Fast and Slow
More data is available to companies.

Storage technologies make it possible to store and process that data.

Advanced analytics can be applied to this new data to achieve a competitive advantage.
Descriptive: What happened?

Diagnostic: Why did it happen?

Predictive: What is going to happen?

Prescriptive: What should we do about that?

Hindsight → Insight → Foresight
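
The difference between the descriptive and the predictive question can be shown with a small sketch. The monthly sales figures below are invented for illustration, and the function names `describe`, `fit_line` and `predict` are hypothetical. The forecast uses an ordinary least-squares line, which is one simple choice among many and not a method prescribed by the slides.

```python
from statistics import mean

# Hypothetical monthly sales figures, used only for illustration.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

def describe(values):
    """Descriptive: summarize what already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (closed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict(xs, ys, next_x):
    """Predictive: extrapolate the fitted line to a future point."""
    slope, intercept = fit_line(xs, ys)
    return slope * next_x + intercept

summary = describe(sales)             # what happened?
forecast = predict(months, sales, 6)  # what is going to happen?
print(summary, forecast)
```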
Ambiguity
• The goals to be achieved or the problem to be solved are unclear.
• Alternatives are difficult to define.
• Information about outcomes is unavailable.

Uncertainty
• Managers know which goals they wish to achieve.
• Information about alternatives and future events is incomplete.

Risk
• A decision has clear goals and good information is available, but the future outcomes associated with each alternative are subject to chance.

Certainty
• All of the information the decision maker needs is fully available.

Management levels shown alongside: Senior (Executive) Management, Middle Management, Junior (Line) Management.
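
Under the "Risk" condition, where goals are clear and outcomes are subject to chance, one common way to compare alternatives is by expected value. The sketch below assumes that approach. The alternatives, probabilities and payoffs are invented, and the names `expected_value` and `best_alternative` are hypothetical.

```python
# Hypothetical alternatives: each maps to a list of (probability, payoff) outcomes.
alternatives = {
    "launch_campaign": [(0.6, 100.0), (0.4, -50.0)],
    "do_nothing": [(1.0, 0.0)],
    "small_pilot": [(0.8, 30.0), (0.2, -10.0)],
}

def expected_value(outcomes):
    """Probability-weighted average payoff of one alternative."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

def best_alternative(options):
    """Pick the alternative with the highest expected value."""
    return max(options, key=lambda name: expected_value(options[name]))

choice = best_alternative(alternatives)
print(choice, expected_value(alternatives[choice]))
```

Expected value ignores risk appetite, so a cautious decision maker might still prefer the pilot with its smaller downside.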
Define objective

• Increase customer satisfaction level
• Identify prospective customers
• Identify cross-selling opportunities
• Decrease time to market
• Decrease costs of marketing campaigns

Identify data sets

• Historical data on customers from CRM system
• Geographical location data
• Smartphone data
• Social network data
• Text data from the Internet pages
• Image data from the medical sources

Design the model

• Classification model for Internet users defining what one is interested in
• Adaptive control models for managing IT and network infrastructure
• Probabilistic model for defining credit worthiness

Design the solution

• Data storage type
• Logical database design
• Availability and scalability of the solution
• Integration into corporate information environment
• Solution deployment model

Implement the solution

• Add new functionality to the existing corporate BI platform
• Implement new BI solution
• Enrich existing business system (CRM, ERP) with the predictive analytics functionality
Business Tasks, Model Family, Algorithms

Classification

Business tasks:
• Define prospective customers
• Define traffic jams in the city
• Recommend restaurants and menus
• Adjust UI to the particular user
• Classify body part on X-Ray image

Algorithms:
• Naïve Bayes
• Logistic regression
• Support Vector Machines
• Neural Networks

Clustering

Business tasks:
• Define market niche
• Define influencers in the social networks
• Define similar customers or projects in portfolio
• Define informal groups in the organization

Algorithms:
• K-Means
• K nearest neighbor
• Self-organizing maps
• Mixture of Gaussians

Anomaly Detection

Business tasks:
• Detect fraudulent bank transactions
• Detect network intrusion attempts
• Provide automatic aircraft engine testing
• Provide automatic IT infrastructure monitoring
• Provide clinical test analysis

Algorithms:
• Mixture of Gaussians
• Self-learning anomaly detection

Optimization

Business tasks:
• Define the best price for the goods or services to maximize profits
• Define best working schedule for the store
• Define best amount of production
• Define best business rules

Algorithms:
• Gradient descent
• Simplex method
• Newton’s method
• Normal equations
• Genetic algorithms
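
Two of the entries above, logistic regression (a classification algorithm) and gradient descent (an optimization algorithm), combine naturally. The following is a minimal sketch: it fits a one-feature logistic regression by batch gradient descent. The data set (site visits versus whether a prospect became a customer), the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def classify(x, w, b):
    """Label 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical data: number of site visits -> became a customer (1) or not (0).
visits = [0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(visits, labels)
predictions = [classify(x, w, b) for x in visits]
print(predictions)
```

On real data the features would normally be scaled first and the model evaluated on held-out samples. Both steps are omitted here to keep the sketch short.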
• Google to Buy Waze for $1.3 Billion
• Xerox plans to clear traffic on I-10
• The promise of better data has MetLife investing $300M in new tech
• Gracenote did a whole business on recommending music
• Obama’s data scientists built a volunteer army on Facebook
Description:
Cloud-based service that provides more accurate estimates of creditworthiness (loan scoring) using publicly available data from social networks. The service is intended for use by banks.

Technologies:

• Amazon EC2
• MySQL
• SAP HANA
• R
• Java

[Diagram: credit scoring data flow]

• Sources: Facebook, Twitter, LinkedIn API
• Preprocessing (data filtering, data cleansing): MySQL
• Processing (scoring model): SAP HANA
• Credit scoring API
• Output: Credit Score
Description:
Computer-aided diagnostic system that can recognize a human body part on an X-ray image and detect broken or fractured bones.

Technologies:

• Matlab/Octave
• Python
• PyBrain
• NumPy
• SciPy

[Diagram: X-Ray Image → Analytical Engine → “This is a hand. Broken bone detected”]
Technology Expertise

• Big Data and NoSQL
• Data Warehouse
• Data Integration
• BI Platforms

Services

• Big Data Analytics
• Predictive Analytics
• Data Science Service
• Data Integration
• Data Warehousing
• Data Visualization and Analysis