Decision Making And Lambda Architecture
Girish S Kathalagiri
Samsung SDS Research America
AGENDA
• Introduction
• Decision Making System: Intro and Algorithms
• Decision Making System: Architecture and components
INTRODUCTION
SAMSUNG SDS
SAMSUNG SDS IS THE ENTERPRISE SOLUTIONS ARM OF THE SAMSUNG GROUP, WITH A
MAJOR FOOTPRINT IN ASIA AND EMERGING PRESENCE IN THE US
Revenue by year (US$ billions): 2010: 3.9, 2011: 4.1, 2012: 5.7, 2013: 6.7, 2014: 7.2
REVENUE (2014)
$7.2B
GLOBAL PRESENCE
47+ offices [1] in 30 countries
EMPLOYEES
21,796
MARKET POSITION [2]
No. 1 Korean IT services provider
No. 2 largest IT service provider in the Asia-Pacific region (excluding Japan)
Source:
[1] Includes IT outsourcing and logistics offices, as of December 31, 2014
[2] Market Share, Gartner, 2014
[3] Expressed in U.S. dollars at exchange rate in effect on December 31 of respective year
SAMSUNG SDS RESEARCH AMERICA
SDS Research America Focus
Decision Making
Recommendation
Decision
Insights
Model
Feature
Data
TEAM
Decision Making System: Intro and Algorithm
EXAMPLES OF DECISION MAKING IN THE ONLINE WORLD
• Ad Selection
• News Article Recommendations
• Website Optimization
• Auction and real-time bidding.
• Recommendation Systems.
TERMINOLOGY
• Action/Arm: one of the set of options that are available for a problem
• Reward: clicks, profit, revenue
• Agent: software system that makes the decisions
• Environment: factors external to the system with which the agent is interacting
• Context: side information that is available
Learning from interaction
EXPLORATION vs EXPLOITATION TRADE-OFF
Decision-making involves a fundamental choice:
Exploitation: Make the best decision with the information collected so far.
Exploration: Gather more information to see if there are better decisions that can be made.
EXPLORATION vs EXPLOITATION EXAMPLES
• Online Advertising:
– Exploitation: Show the most successful ad
– Exploration: Show a different ad
• Restaurant Selection:
– Exploitation: Favorite restaurant
– Exploration: Try a new one
• Cuisine Selection:
– Exploitation: Favorite dish
– Exploration: Try a new one
• Game:
– Exploitation: Play the best move (your belief)
– Exploration: Try a new move
EXPLORATION vs EXPLOITATION TRADE-OFF
Area         Exploration              Exploitation
Economics    Risk-Taking              Risk-Avoiding
Finance      Investing                Saving
Marketing    Diversification          Concentration
Medicine     Experimental treatment   Safety and efficacy
CUMULATIVE REWARD
Objective: Maximize the expected cumulative reward
REGRET
Objective: Minimize the regret over time horizon T
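The formulas on these two slides did not survive extraction. A common way to write both objectives is sketched below; the notation is an assumption rather than the slides' own: a_t is the arm chosen at step t, r_t the reward received, \mu_a the expected reward of arm a, and \mu^* the expected reward of the best arm.

```latex
\text{Expected cumulative reward:}\quad
  \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\right]
\qquad
\text{Regret:}\quad
  R_T \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right],
  \quad \mu^{*} = \max_{a} \mu_a
```

Under this definition the two objectives coincide: maximizing the expected cumulative reward is the same as minimizing R_T.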
CHARACTERISTICS OF LEARNING WITH INTERACTION
• The Agent interacts with the environment to gather more data
• The Agent's performance depends on the Agent's decisions
• The data available for the Agent to learn from depends on its decisions
Multi ARMED BANDIT
[Robbins ‘52]
Multi-armed bandit
Set of K arms (actions, choices, options)
At each time step t = 1, ..., N:
– Agent selects an arm
– Agent receives a reward from the environment
– Agent updates its belief about the arms (estimates the value)
How does the Agent select an arm at any point in time?
Multi-armed bandit: EPSILON-GREEDY
Greedy (Exploit): choose the arm with the highest estimated reward
Epsilon (Explore): with probability epsilon, make a random choice
Dealing with Epsilon:
• Constant epsilon value (Epsilon-Greedy Strategy)
• Epsilon-Decreasing Strategy
• Epsilon-First Strategy
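The epsilon-greedy rule can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the class name, the incremental-mean update, and the `epsilon0 / t` decay schedule are assumptions chosen for brevity.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy agent with an incremental mean estimate per arm."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # estimated mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore: with probability epsilon pick a uniformly random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Exploit: pick the arm with the highest estimated reward.
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: new = old + (reward - old) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def decayed_epsilon(t, epsilon0=1.0):
    """One hypothetical epsilon-decreasing schedule: epsilon0 / t."""
    return epsilon0 / max(t, 1)
```

An epsilon-first strategy would instead use `epsilon = 1.0` for an initial exploration phase and `epsilon = 0.0` afterwards.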
Multi-armed bandit : SOFTMAX
• Epsilon-Greedy explores uniformly at random, so it is relatively insensitive to how far apart the arms' performance levels are
– Arms at 0.99 vs. 0.01 are explored the same way as arms at 0.52 vs. 0.48
• Softmax Strategy (Structured Exploration)
– Chooses each arm with a probability that grows with the estimated value of the arm
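A minimal sketch of softmax (Boltzmann) arm selection follows. The `temperature` parameter and its default are assumptions; the slides do not state one. Note that the probabilities are proportional to exp(value / temperature), not to the raw values.

```python
import math
import random


def softmax_probabilities(values, temperature=0.1):
    """Boltzmann probabilities over arms from their estimated values."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(values)
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def select_arm(values, temperature=0.1, rng=random):
    """Sample an arm index according to the softmax probabilities."""
    probs = softmax_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]
```

With these defaults, arms valued 0.99 and 0.01 yield an almost deterministic choice, while arms valued 0.52 and 0.48 are still explored often, which is the sensitivity that epsilon-greedy lacks.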
What if the first few explorations were not so rewarding?
Multi-armed bandit: Upper Confidence Bound (UCB)
1. Take the action that has the best estimated mean reward plus confidence bound
2. Environment generates reward
3. Agent updates its expected mean reward and confidence interval
Optimism in the face of uncertainty
[Auer ’02]
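The three steps can be sketched with the UCB1 rule, where the confidence bonus is sqrt(2 ln n / n_a). Choosing UCB1 specifically is an assumption; the slide names only the general UCB idea.

```python
import math


class UCB1:
    """UCB1 agent: estimated mean plus a confidence bonus per arm."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select_arm(self):
        # Play every arm once first so each bonus is well defined.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.values[a] + math.sqrt(2.0 * math.log(total) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=lambda a: scores[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Because the bonus shrinks as an arm is pulled more often, a rarely tried arm keeps getting another chance even if its first few rewards were poor.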
Multi-armed bandit : Thompson sampling
1. For each arm, sample a parameter from its Beta distribution.
2. Choose the arm whose sampled parameter gives the maximum expected reward.
3. Environment generates reward
4. Agent updates the distribution for the arm.
[Thompson 1933]
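For 0/1 rewards such as clicks, the four steps can be sketched with one Beta distribution per arm. The uniform Beta(1, 1) prior and the class name are assumptions made for this illustration.

```python
import random


class BetaThompson:
    """Thompson sampling for 0/1 rewards with a Beta(1, 1) prior per arm."""

    def __init__(self, n_arms, seed=None):
        self.alpha = [1.0] * n_arms  # 1 + observed successes
        self.beta = [1.0] * n_arms   # 1 + observed failures
        self.rng = random.Random(seed)

    def select_arm(self):
        # Sample a plausible success rate per arm, then act greedily on the samples.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=lambda i: samples[i])

    def update(self, arm, reward):
        # A reward of 1 counts as a success, 0 as a failure.
        if reward:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0
```

Exploration comes from the sampling itself: arms with few observations have wide distributions, so they occasionally produce the largest sample and get tried.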
Stream Processing of Multi-armed bandit
[Diagram: along a time axis, each arriving batch, Data (t-1), Data (t), Data (t+1), triggers an "update stats for arms" step that produces the next version of the arm stats.]
Epsilon-Greedy: estimate the mean reward for each arm
Softmax: estimate the mean reward for each arm, then calculate the softmax
Upper Confidence Bound: estimate the mean and confidence interval
Thompson Sampling: update the parameters of the Beta distribution
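All four strategies only need small per-arm statistics that can be updated as each batch arrives, without revisiting old data. A minimal sketch for the running mean follows; the arm names and the event format are hypothetical.

```python
from collections import defaultdict


class ArmStats:
    """Running count and mean reward for one arm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add(self, reward):
        self.count += 1
        self.mean += (reward - self.mean) / self.count


def update_stats(stats, micro_batch):
    """Fold one micro-batch of (arm, reward) events into the running stats."""
    for arm, reward in micro_batch:
        stats[arm].add(reward)
    return stats


stats = defaultdict(ArmStats)
update_stats(stats, [("ad_a", 1.0), ("ad_b", 0.0)])   # Data (t-1)
update_stats(stats, [("ad_a", 0.0), ("ad_a", 1.0)])   # Data (t)
```

The same pattern extends to UCB (the count feeds the confidence term) and to Thompson sampling (increment the Beta parameters instead of the mean).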
Contextual Multi-armed bandit
• For t = 1, . . . , T:
1. The Environment sends a request with some context x_t ∈ X
2. The Agent chooses an action a_t ∈ {1, . . . , K} for the context
3. The Environment reacts with reward r_t(a_t)
4. The Agent updates the model
Goal : Best action for the context.
[Auer-CesaBianchi-Freund-Schapire ’02]
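The interaction protocol can be sketched with the simplest possible contextual agent: a separate epsilon-greedy estimate for every (context, arm) pair. The discrete contexts, the simulated environment, and its reward probabilities are all assumptions made for this illustration.

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedy:
    """Keeps a separate mean-reward estimate for every (context, arm) pair."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = defaultdict(int)
        self.values = defaultdict(float)
        self.rng = random.Random(seed)

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.values[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulated_reward(context, arm, rng):
    # Hypothetical environment: arm 0 suits "mobile", arm 1 suits "desktop".
    best = 0 if context == "mobile" else 1
    p = 0.8 if arm == best else 0.2
    return 1.0 if rng.random() < p else 0.0


agent = ContextualEpsilonGreedy(n_arms=2, epsilon=0.1, seed=0)
env = random.Random(1)
for t in range(2000):
    x_t = env.choice(["mobile", "desktop"])   # 1. environment sends context
    a_t = agent.choose(x_t)                   # 2. agent chooses an action
    r_t = simulated_reward(x_t, a_t, env)     # 3. environment returns reward
    agent.update(x_t, a_t, r_t)               # 4. agent updates the model
```

A table per context does not scale to rich feature vectors, which is why model-based approaches are used for realistic contexts.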
Optimization
Initialize model parameters
Repeat {
    Using data, update the model parameters
} until convergence
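One concrete instance of this loop is gradient descent on a single parameter. The learning rate, the tolerance, and the toy data below are assumptions; the slide does not name a specific optimizer at this point.

```python
def gradient_descent(grad, theta0, learning_rate=0.1, tol=1e-8, max_iters=10_000):
    """Repeat parameter updates until the step size falls below tol."""
    theta = theta0                       # Initialize model parameter
    for _ in range(max_iters):           # Repeat {
        step = learning_rate * grad(theta)
        theta -= step                    #   using data, update the parameter
        if abs(step) < tol:              # } until convergence
            break
    return theta


data = [1.0, 2.0, 3.0, 6.0]


def mse_gradient(theta):
    # Gradient of the mean squared error between theta and the data points.
    return sum(2.0 * (theta - x) for x in data) / len(data)


theta_hat = gradient_descent(mse_gradient, theta0=0.0)
```

Here the minimizer is the mean of the data, so `theta_hat` converges to 3.0.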
ONLINE and batch learning
Online Learning (Stream Processing)
– Initialize parameters
– Quick update on parameters: as Data (t-1), Data (t), Data (t+1) arrive, update the parameters from the previous mini-batch
– Faster learning, approximation
vs.
Batch Learning
– Initialize parameters
– Learn model parameters from all the training data
– Long-term trends, accurate learning
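The contrast can be sketched on a one-parameter model y = w * x: the batch fit uses all the data at once, while the online version takes one quick gradient step per mini-batch. The model, the learning rate, and the toy data are assumptions for illustration.

```python
def batch_fit(xs, ys):
    """Batch learning: closed-form least squares slope for y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def online_update(w, mini_batch, learning_rate=0.05):
    """Online learning: one quick gradient step on the latest mini-batch."""
    grad = sum(2.0 * (w * x - y) * x for x, y in mini_batch) / len(mini_batch)
    return w - learning_rate * grad


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]          # generated by y = 2 * x

w_batch = batch_fit(xs, ys)         # uses all the training data at once

w_online = 0.0                      # initialize parameters
for _ in range(200):                # stream of mini-batches arriving over time
    for i in range(0, len(xs), 2):
        mini_batch = list(zip(xs[i:i + 2], ys[i:i + 2]))
        w_online = online_update(w_online, mini_batch)
```

On this noiseless data both approaches reach w = 2; with real data the online estimate is an approximation that reacts quickly, while the batch fit is more accurate over long windows.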
TIMESCALEs FOR LEARNING
Algorithms for Contextual Multi-armed Bandit
LinUCB [Li et al. 2010]
Thompson Sampling with Logistic Regression [Chapelle and Li 2011]
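A compact sketch of the disjoint LinUCB idea follows: each arm keeps a ridge-regression estimate and adds a confidence width to its predicted reward. It is written in plain Python with a Sherman-Morrison update so that it has no dependencies. The class and function names and the default `alpha` are assumptions, and this is not the production implementation described in the talk.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB, keeping A^{-1} and b for a d-dim context."""

    def __init__(self, d):
        # A starts as the identity, so its inverse is also the identity.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def score(self, x, alpha):
        a_inv_x = self._a_inv_times(x)
        theta = self._a_inv_times(self.b)                    # ridge estimate
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison update of (A + x x^T)^{-1}.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, x, alpha=1.0):
    """Pick the arm with the highest upper confidence bound for context x."""
    scores = [arm.score(x, alpha) for arm in arms]
    return max(range(len(arms)), key=lambda a: scores[a])
```

Larger `alpha` values favor exploration; smaller values make the agent act closer to the greedy ridge-regression prediction.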
DECISION MAKING SYSTEM: ARCHITECTURE AND COMPONENTS
SOFTWARE STACK
• Real time decision making
• Scalable System
• Batch and Online Learning
Analytics Framework
KAFKA : Distributed Messaging system
• Distributed by design (fault tolerant).
• Fast and scalable.
• High throughput for both publishing and subscribing.
• Multi-subscriber.
• Persists messages on disk: supports batched consumption as well as real-time applications.
http://kafka.apache.org/
SPARK and SPARK STREAMING
• High-volume data processing for feature extraction as a means of modeling business environment state
• Model training on historical events
• Stream processing for online updates
• Machine Learning Library
http://spark.apache.org/
MLLIB : Machine Learning Library
• Spark integration
• Distributed machine learning algorithms
• Algorithmic optimization
• High-level and Developer APIs
• Community
Basic Statistics: summary statistics, correlations, stratified sampling, hypothesis testing, random data generator
Classification and Regression: linear models (SVM, logistic regression), Naïve Bayes, tree-based models (GBT, RF, DT)
Collaborative Filtering: Alternating Least Squares (ALS)
Optimization: stochastic gradient descent (SGD), limited-memory BFGS (L-BFGS)
Dimensionality Reduction: singular value decomposition (SVD), principal component analysis (PCA)
Clustering: k-means, Gaussian mixture, power iteration clustering, latent Dirichlet allocation, streaming k-means
http://www.jmlr.org/papers/volume17/15-237/15-237.pdf
Model Storage
• HBase
• Models are stored in PMML format.
– Import from and export to external systems
• Model metrics and statistics are stored.
• Configuration information of the system is stored.
http://dmg.org/pmml/pmml_examples/index.html
LAMBDA Architecture
SERVING LAYER
• Play Framework
• Interfaces with external systems
• Low latency
• Mechanism for multiple models.
• Processes Request and Reward messages.
• Retrieves the model from the Model Store and caches it.
• Logs the messages to a Kafka topic.
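The responsibilities listed above can be sketched as a small in-memory class. This is an illustration only: the real serving layer is a Play Framework application, and here a plain list stands in for the Kafka topic and a dict for the Model Store. The message fields and method names are hypothetical.

```python
import json
import time
import uuid


class ServingLayer:
    """In-memory stand-in for the serving layer; not the Play implementation."""

    def __init__(self, model_store, log):
        self.model_store = model_store   # hypothetical dict: model name -> arm values
        self.log = log                   # list standing in for a Kafka topic
        self.cache = {}

    def _model(self, name):
        # Retrieve the model from the store once, then serve it from the cache.
        if name not in self.cache:
            self.cache[name] = self.model_store[name]
        return self.cache[name]

    def handle_request(self, model_name, context):
        values = self._model(model_name)
        arm = max(values, key=values.get)          # greedy choice, for brevity
        request_id = str(uuid.uuid4())
        self._log({"type": "request", "id": request_id, "model": model_name,
                   "context": context, "arm": arm})
        return request_id, arm

    def handle_reward(self, request_id, reward):
        self._log({"type": "reward", "id": request_id, "reward": reward})

    def on_model_update(self, name):
        # Drop the cached copy when a new model version is announced.
        self.cache.pop(name, None)

    def _log(self, message):
        message["ts"] = time.time()
        self.log.append(json.dumps(message))
```

Keeping the request id in both message types lets downstream consumers join each reward back to the decision that produced it.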
SPEED LAYER
• Spark Streaming application
• Receives messages from Kafka in micro-batches for processing.
• Retrieves the latest model from the Model Store, updates it, and stores the updated model.
• Notifies the serving layer of the model update.
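The speed-layer cycle (read a micro-batch, update the latest model, store it, notify) can be sketched without Spark or Kafka. Everything here is a stand-in: the message fields, the model layout of arm to (count, mean), and the `notify` callback are hypothetical.

```python
import json


def run_micro_batch(messages, model_store, model_name, notify):
    """Apply one micro-batch of logged messages to the stored model."""
    model = dict(model_store.get(model_name, {}))        # latest model from the store
    arms_by_request = {}
    for raw in messages:
        msg = json.loads(raw)
        if msg["type"] == "request":
            arms_by_request[msg["id"]] = msg["arm"]
        elif msg["type"] == "reward" and msg["id"] in arms_by_request:
            # Simplification: rewards whose request arrived in an earlier
            # micro-batch are ignored in this sketch.
            arm = arms_by_request[msg["id"]]
            count, mean = model.get(arm, (0, 0.0))
            count += 1
            mean += (msg["reward"] - mean) / count       # incremental mean update
            model[arm] = (count, mean)
    model_store[model_name] = model                      # store the updated model
    notify(model_name)                                   # tell the serving layer
    return model
```

A real streaming job would keep unmatched requests in state across micro-batches so that late rewards can still be joined.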
HISTORY LOGGER
• Spark Streaming application
• Kafka consumer.
– Archives messages logged by the serving layer
• HDFS long-term storage.
• Archived data is used by the batch layer.
BATCH LAYER
• Spark application
• Reads the historical archived data.
• Uses a configured sliding window.
• Generates training data.
• Builds a new model from scratch.
• Stores it in Model Storage.
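The batch-layer steps can be sketched as a function that filters archived events to a sliding window and recomputes the model without reusing any previous state. The event fields and the per-arm mean model are assumptions; the actual layer is a Spark application reading from HDFS.

```python
def rebuild_model(archived_events, now, window_seconds):
    """Recompute per-arm mean reward from events inside the sliding window."""
    cutoff = now - window_seconds
    totals = {}
    for event in archived_events:
        if event["ts"] < cutoff:
            continue                      # outside the configured window
        count, reward_sum = totals.get(event["arm"], (0, 0.0))
        totals[event["arm"]] = (count + 1, reward_sum + event["reward"])
    # New model from scratch: nothing is carried over from the previous model.
    return {arm: reward_sum / count for arm, (count, reward_sum) in totals.items()}
```

Rebuilding from the archive lets the batch layer correct any drift accumulated by the incremental updates of the speed layer.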
MANAGEMENT SERVICES
• Suite of applications
• Configuration of the system
• Monitoring of the processes
• Administrative UI
• Authorization and role-based access control.
• Scheduling of workflows
LAMBDA Architecture
RECAP
• Decision-making algorithms that involve exploration vs. exploitation trade-offs
• Multi-armed bandit and contextual multi-armed bandit algorithms
• Lambda architecture
QUESTIONS ?
REFERENCES
1. A contextual-bandit approach to personalized news article recommendation; Lihong Li, Wei Chu, John Langford, Robert E. Schapire
2. Generalized Thompson Sampling for Contextual Bandits; Lihong Li
3. Big Data: Principles and best practices of scalable realtime data systems; Nathan Marz & Warren J.
4. Data Mining Group. Predictive Model Markup Language.
5. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits; Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
6. Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms; Lihong Li, Wei Chu, John Langford, Xuanhui Wang
7. Reinforcement Learning: An Introduction; Richard S. Sutton, Andrew G. Barto

Big Data Day LA 2016, Data Science Track: Decision Making and Lambda Architecture, Girish Kathalagiri, Staff Engineer, Samsung SDS Research America

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Architecture, Girish Kathalagiri - Staff Engineer, Samsung SDS Research America

  • 1. Decision Making And Lambda Architecture Girish S Kathalagiri Samsung SDS Research America
  • 2. AGENDA • Introduction • Decision Making System: Intro and Algorithms • Decision Making System: Architecture and components
  • 4. SAMSUNG SDS: Samsung SDS is the enterprise solutions arm of the Samsung Group, with a major footprint in Asia and an emerging presence in the US • Revenue by year (USD billions) [3]: 2010: 3.9, 2011: 4.1, 2012: 5.7, 2013: 6.7, 2014: 7.2 • Revenue (2014): $7.2B • Global presence: 47+ offices [1] in 30 countries • Employees: 21,796 • Market position [2]: No. 1 Korean IT services provider; No. 2 largest IT service provider in the Asia-Pacific region (excluding Japan). Source: [1] includes IT outsourcing and logistics offices, as of December 31, 2014; [2] Market Share, Gartner, 2014; [3] expressed in U.S. dollars at the exchange rate in effect on December 31 of the respective year
  • 5. SAMSUNG SDS RESEARCH AMERICA SDS Research America Focus Decision Making Recommendation Decision Insights Model Feature Data
  • 7. Decision Making System: Intro and Algorithm
  • 8. EXAMPLES OF DECISION MAKING IN THE ONLINE WORLD • Ad Selection • News Article Recommendations • Website Optimization • Auction and real-time bidding • Recommendation Systems
  • 9. TERMINOLOGY • Action/Arm: set of options that are available for a problem • Reward: clicks, profit, revenue • Agent: software system that takes the decisions • Environment: factors external to the system with which the agent is interacting • Context: side information that is available. Learning from interaction.
  • 10. EXPLORATION vs EXPLOITATION TRADE-OFF Decision-making involves a fundamental choice. Exploitation: make the best decision with the information collected so far. Exploration: gather more information to see if there are better decisions that can be made.
  • 11. EXPLORATION vs EXPLOITATION EXAMPLES • Online Advertising : – Exploitation : Show most successful ad – Exploration: Show a different ad • Restaurant Selection: – Exploitation : favorite restaurant – Exploration : Trying a new one • Cuisine selection: – Exploitation : favorite dish – Exploration : Try a new one • Game : – Exploitation : Play the best move (your belief) – Exploration : Try a new move
  • 12. EXPLORATION vs EXPLOITATION TRADE-OFF (Area: Exploration / Exploitation) • Economics: Risk-Taking / Risk-Avoiding • Finance: Investing / Saving • Marketing: Diversification / Concentration • Medicine: Experimental treatment / Safety and efficacy
  • 13. CUMULATIVE REWARD Objective: Maximize the Expected Cumulative Reward
  • 14. REGRET Objective: Minimize the Regret over time horizon T
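The formulas shown on slides 13 and 14 did not survive in this transcript. A common textbook formulation of the two objectives is sketched below; the notation is an assumption, not taken from the slides: r_t is the reward received at step t, a_t is the arm chosen at step t, \mu_k is the mean reward of arm k, and \mu^{*} is the mean reward of the best arm.

```latex
\text{Expected cumulative reward:}\quad
\mathbb{E}\!\left[\sum_{t=1}^{T} r_t\right]
\qquad
\text{Regret:}\quad
R_T = T\,\mu^{*} - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right],
\quad \mu^{*} = \max_{k}\,\mu_k
```

Under these definitions the two objectives are equivalent: T\mu^{*} is a constant, so maximizing the expected cumulative reward minimizes the regret.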
  • 15. CHARACTERISTICS OF LEARNING WITH INTERACTION • Agent Interacts with the environment to gather more data • Agent performance is based on Agent’s decision • Data available to Agent to learn is based on its decision
  • 17. Multi-armed bandit: Set of K arms (actions, choices, options). At each time step t = 1 .. N: the Agent selects an arm, receives a reward from the environment, and updates its belief about the arms (estimates the value). How does the Agent select the arm at any point in time?
  • 18. Multi-armed bandit: EPSILON-GREEDY • Greedy (Exploit): highest estimated reward • Epsilon (Explore): random choice. Dealing with epsilon: • Constant epsilon value (Epsilon-Greedy Strategy) • Epsilon-Decreasing Strategy • Epsilon-First Strategy
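The epsilon-greedy rule on slide 18 can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names, the running-mean update, and the `epsilon0 / t` decay schedule are all assumptions made for the example.

```python
import random

def select_arm(estimates, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        # Explore: pick any arm uniformly at random.
        return rng.randrange(len(estimates))
    # Exploit: pick the arm with the highest estimated reward.
    return max(range(len(estimates)), key=lambda k: estimates[k])

def update(counts, estimates, arm, reward):
    """Incrementally update the running mean reward of the arm that was played."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

def decayed_epsilon(t, epsilon0=1.0):
    """One possible epsilon-decreasing schedule (hypothetical): epsilon0 / t."""
    return epsilon0 / max(t, 1)
```

A constant `epsilon` gives the plain epsilon-greedy strategy; passing `decayed_epsilon(t)` instead gives an epsilon-decreasing variant that explores less as evidence accumulates.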
  • 19. Multi-armed bandit: SOFTMAX • Epsilon-Greedy is relatively insensitive to relative performance levels: it treats arms with estimates 0.99 vs. 0.01 the same way as arms with 0.52 vs. 0.48 when exploring • Softmax Strategy (Structured Exploration): chooses each arm with a probability that grows with its estimated value. What if the first few explorations were not so rewarding?
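A minimal sketch of softmax (Boltzmann) arm selection follows. The slide does not give the exact formula, so the exponential weighting and the `temperature` parameter are assumptions based on the usual definition of the strategy.

```python
import math
import random

def softmax_probs(estimates, temperature=0.1):
    """Selection probabilities proportional to exp(estimate / temperature)."""
    top = max(estimates)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in estimates]
    total = sum(weights)
    return [w / total for w in weights]

def select_arm(estimates, temperature=0.1, rng=random):
    """Draw one arm index according to the softmax probabilities."""
    probs = softmax_probs(estimates, temperature)
    return rng.choices(range(len(estimates)), weights=probs, k=1)[0]
```

With estimates of 0.99 vs. 0.01 almost all of the probability mass goes to the first arm, while 0.52 vs. 0.48 stays close to an even split, which is the sensitivity to relative performance that epsilon-greedy lacks. A higher temperature flattens the distribution and increases exploration.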
  • 20. Multi-armed bandit: Upper Confidence Bound (UCB) 1. Take the action that has the best estimated mean reward plus its confidence bound 2. Environment generates reward 3. Agent updates its expected mean reward and confidence interval. Optimism in the face of uncertainty [Auer ’02]
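The slide does not say which UCB variant was used. The sketch below assumes UCB1, where the confidence term for arm k is sqrt(2 ln t / n_k), with t the total number of plays and n_k the number of plays of arm k. Function names are illustrative.

```python
import math

def select_arm(counts, means):
    """UCB1: play every arm once, then maximize mean + sqrt(2 ln t / n_k)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # an unplayed arm has an unbounded confidence term
    total = sum(counts)
    scores = [m + math.sqrt(2.0 * math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=lambda k: scores[k])

def update(counts, means, arm, reward):
    """Update the play count and running mean reward of the chosen arm."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
```

The confidence term shrinks as an arm is played more often, so rarely played arms get the benefit of the doubt. That is the "optimism in the face of uncertainty" principle named on the slide.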
  • 21. Multi-armed bandit: Thompson sampling 1. For each arm, sample a parameter from its Beta distribution. 2. Choose the arm that has the maximum reward for the sampled parameter. 3. Environment generates reward 4. Agent updates the distribution for the arm. [Thompson 1933]
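The four steps on slide 21 map directly onto a Beta-Bernoulli sketch. The uniform Beta(1, 1) prior and the binary (click / no click) reward are assumptions made for this example; the slide only states that a Beta distribution is used.

```python
import random

def select_arm(successes, failures, rng=random):
    """Sample a success rate per arm from Beta(1 + s, 1 + f); play the largest."""
    samples = [rng.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda k: samples[k])

def update(successes, failures, arm, reward):
    """Bernoulli reward: a truthy reward is a success, otherwise a failure."""
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Arms with little data have wide Beta distributions and are sometimes sampled high, which produces exploration without a separate epsilon parameter.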
  • 22. Stream Processing of Multi-armed bandit: over time, each incoming batch Data (t-1), Data (t), Data (t+1) updates the stats for the arms, producing Arm stats (t-1), Arm stats (t), Arm stats (t+1) • Epsilon Greedy: estimate mean rewards for each arm • Softmax: estimate mean rewards for each arm, calculate softmax • Upper Confidence Bound: estimate mean and confidence interval • Thompson Sampling: update the parameters of the Beta distribution
  • 23. Contextual Multi-armed bandit • For t = 1, . . . , T: 1. The Environment sends a request with some context x_t ∈ X 2. The Agent chooses an action a_t ∈ {1, . . . , K} for the context 3. The Environment reacts with reward r_t(a_t) 4. The Agent updates the model. Goal: Best action for the context. [Auer-CesaBianchi-Freund-Schapire ’02]
  • 24. Optimization: Initialize model parameters. Repeat { using data, update the model parameters } until convergence
  • 25. ONLINE and BATCH LEARNING • Online Learning (Stream Processing): initialize parameters, then make a quick update on the parameters from the previous mini-batch as Data (t-1), Data (t), Data (t+1) arrive; faster learning, approximation • Batch Learning: initialize parameters, then learn the model parameters from all the training data; long-term trends, accurate learning
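The contrast between online and batch learning on slides 24 and 25 can be illustrated with a small gradient-descent sketch. Logistic regression is used here only as an example model, and the function names, learning rates, and epoch count are assumptions. `sgd_step` is the online building block (one quick update per mini-batch, starting from the previous parameters); `train_batch` starts from scratch and sweeps all the data until the parameters settle.

```python
import math

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, batch, learning_rate=0.1):
    """Online update: one gradient step of the log-loss on a single mini-batch."""
    grad = [0.0] * len(weights)
    for features, label in batch:
        error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return [w - learning_rate * g / len(batch) for w, g in zip(weights, grad)]

def train_batch(data, dims, epochs=200, learning_rate=0.5):
    """Batch learning: initialize parameters, then iterate over all the data."""
    weights = [0.0] * dims
    for _ in range(epochs):
        weights = sgd_step(weights, data, learning_rate)
    return weights
```

In a streaming setting, the weights returned by `sgd_step` for the mini-batch at time t would be the starting point for the mini-batch at time t+1.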
  • 26. TIMESCALES FOR LEARNING Algorithms for Contextual Multi-armed Bandit: • LinUCB [Li et al. 2010] • Thompson Sampling with Logistic Regression [Chapelle and Li 2011]
  • 27. DECISION MAKING SYSTEM: ARCHITECTURE AND COMPONENTS
  • 28. SOFTWARE STACK • Real time decision making • Scalable System • Batch and Online Learning Analytics Framework
  • 29. KAFKA : Distributed Messaging system • Distributed by design (Fault tolerant). • Fast and Scalable. • High throughput for both publishing and subscribing. • Multi-subscribers. • Persist messages on disk : batched consumption as well as real time applications. http://kafka.apache.org/
  • 30. SPARK and SPARK STREAMING • High volume data processing for feature extraction as a means of modeling business environment state; • Model training on historical events • Stream processing for Online updates • Machine Learning Library http://spark.apache.org/
  • 31. MLLIB: Machine Learning Library • Spark Integration • Distributed Machine Learning Algorithms • Algorithmic Optimization • High and Developer APIs • Community. Algorithms by category: • Basic Statistics: Summary Statistics, Correlations, Stratified Sampling, Hypothesis Testing, Random Data Generator • Classification and Regression: Linear Models (SVM, logistic regression), Naive Bayes, Tree-based models (GBT, RF, DT) • Collaborative Filtering: Alternating Least Squares (ALS) • Optimization: Stochastic Gradient Descent (SGD), Limited-memory BFGS (L-BFGS) • Dimensionality Reduction: Singular Value Decomposition (SVD), Principal Component Analysis (PCA) • Clustering: K-means, Gaussian Mixture, Power Iteration Clustering, Latent Dirichlet Allocation, Streaming k-means. http://www.jmlr.org/papers/volume17/15-237/15-237.pdf
  • 32. Model Storage • HBase • Models stored in PMML format – Import and export from external systems • Model metrics and statistics are stored • Configuration information of the system. http://dmg.org/pmml/pmml_examples/index.html
  • 34. SERVING LAYER • PLAY Framework • Interfacing with external system • Low Latency • Mechanism for Multiple Models. • Processes Request and Reward messages. • Retrieves Model from Model store and caches. • Logs the messages to Kafka topic.
  • 35. SPEED LAYER • Spark Streaming application • Receives messages from Kafka in micro-batches for processing • Retrieves the latest model from the Model Store, updates it, and stores the updated model • Notifies the serving layer of the model update
  • 36. HISTORY LOGGER • Spark Streaming application • Kafka consumer. – Archives messages logged by serving layer • HDFS long term storage. • Archived data used by batch layer.
  • 37. BATCH LAYER • Spark application • Reads the historical archived data. • Configured sliding window. • Generates training data • New Model from scratch. • Stores it into Model Storage
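The sliding-window step of the batch layer can be sketched as follows. This is only an illustration in plain Python rather than the Spark job described on the slide, and the message fields (`request_id`, `timestamp`, `context`, `arm`, `reward`) are hypothetical, since the transcript does not describe the message schema.

```python
def build_training_data(requests, rewards, now, window_seconds):
    """Join archived request and reward messages that fall inside the window.

    Returns (context, arm, reward) rows for requests whose timestamp lies in
    (now - window_seconds, now]. Requests with no reward message get 0.0.
    """
    cutoff = now - window_seconds
    reward_by_id = {r["request_id"]: r["reward"] for r in rewards}
    rows = []
    for req in requests:
        if cutoff < req["timestamp"] <= now:
            reward = reward_by_id.get(req["request_id"], 0.0)
            rows.append((req["context"], req["arm"], reward))
    return rows
```

Shrinking `window_seconds` reduces the influence of older history on the model that is rebuilt from scratch, at the cost of less training data.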
  • 38. MANAGEMENT SERVICES • Suite of applications • Configuration of the system • Monitoring the processes • Administrative UI • Authorization and role-based access control • Scheduling of workflows
  • 40. RECAP • Decision making algorithms that have Exploration vs Exploitation trade-offs • Multi-armed bandit and Contextual Multi-armed bandit algorithms • Lambda architecture
  • 42. REFERENCES 1. A contextual-bandit approach to personalized news article recommendation; Lihong Li, Wei Chu, John Langford, Robert E. Schapire 2. Generalized Thompson Sampling for Contextual Bandits; Lihong Li 3. Big Data: Principles and best practices of scalable realtime data systems; Nathan Marz & Warren J. 4. Predictive Model Markup Language; Data Mining Group 5. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits; Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire 6. Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms; Lihong Li, Wei Chu, John Langford, Xuanhui Wang 7. Reinforcement Learning: An Introduction; Richard S. Sutton, Andrew G. Barto

Editor's Notes

  1. Focus: decision making algorithms and solutions using these algorithms. We will talk about some of them over the course of the presentation.
  2. Let's first look at decision making in general and at the algorithms in this section.
  3. Learning from interaction
  4. Fields
  5. Imagine a casino setting. This is the K-armed bandit problem, where a gambler is faced with a set of slot machines with different payout distributions. At each time step the gambler has to choose an arm, which pays out some reward. Objective: to maximize the sum of rewards earned in a sequence of lever pulls.
  6. A slightly more formal definition.
  7. Under-explores the options that initially gave less reward.
  8. The Agent's aim is to collect enough information about how the context vectors and rewards relate to each other, so that it can predict the next best arm to play by looking at the feature vectors.
  9. More explanation. Meeting notes (5/22/16 20:01): iterative jobs and in-memory computing. Moves to optimal value.
  10. Challenges that are presented by these algorithms; Lambda Architecture.
  11. Sliding window on the data, so that we can decrease the influence of historical data. New article example.