SlideShare uma empresa Scribd logo
1 de 23
Baixar para ler offline
The Impact of Big Data on
Classic Machine Learning
Algorithms
Thomas Jensen, Senior Business Analyst @ Expedia
Who am I?
• Senior Business Analyst @ Expedia
• Working within the competitive
intelligence unit
• Responsible for :
• Algorithm that score new hotels
• Algorithm that predicts room nights
sold on existing Expedia hotels
• Scraping competitor sites
• Other stuff….
The Promise of Big Data
Real time data
Data driven decision
More accurate and
robust models
Granularity
Big Data Challenges
Data Processing – not
going to talk about
this.
Speed at which to use
data – how fast should
we update
algorithms?
How do we train
algorithms on data
sets that do not fit
into memory?
Big Data Challenges
Taken from: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Classification - Logistic Regression
• One classic task in machine learning / statistics is to classify some
objects/events/decisions correctly
• Examples are:
• Customer churn
• Click behavior
• Purchase behavior
• ….
• One of the most popular algorithms to carry out these tasks is logistic
regression
What is logistic regression?
• Logistic regression attaches probabilities to individual outcomes,
showing how likely they are to belong to one class or the other
• Pr 𝑦 𝑥 =
1
1+𝑒−𝑥𝛽
• The challenge is to choose the
optimal beta(s)
• To do that we minimize a cost
function
Why Use Logistic Regression?
• It is simple and well understood algorithm
• Outputs probabilities
• There are tried and tested models to estimate the parameters
• It is flexible – can handle a number of different inputs, and feature
transformations
Usual Approaches
• Batch training (offline approach)
• Get all the data and train the algorithm in one go
• Disadvantages when data is big
• Requires all data to be loaded into memory
• Periodic retraining is necessary
• Very time consuming with big data!
Batch Training
Examples of Logistic Regression in Industry
Settings – Real Time Bidding
• RTB
• RTB algorithms are usually
based on logistic regression
• Whether or not to bid on a
user is determined by the
probability that the user will
click on an add
• Each day billions of bids are
processed
• Each bid has to be processed
within 80 milliseconds
Examples of Logistic Regression in Industry
Settings – Fraud Detection
Detecting Fraudulent Credit Card
Transactions
• The probability that a transaction
is using a stolen credit card is
typically estimated with logistic
regression
• Billions of transactions are
analyzed each day
How Slow is the Batch Version of Logistic
Regression?
One target variable and two feature vectors.
All randomly generated.
A Real World Problem
A Real World Problem
• Some stats on the training job in the pipeline:
• Runs training jobs on a per country basis
• Longest running job lasts ~9 hours
• Shortest running job lasts ~3 hours
• There are often convergence failures
• What we need an algorithm that:
• Can reduce training time
• Is robust towards convergence failures
A Big Data Friendly Approach
Online Training
• Pass each data point sequentially through the algorithm
• Only requires one data point at a time in memory
• Allows for on-the-fly training of the algorithm
Online Learning
• We want to learn a vector of
weights
• Initialize all weights. Begin loop:
1. Get training example
2. Make a prediction for the target
variable
3. Learn the true value of the
target
4. Update the weights and go to 1
Online Learning
• Initialise all weights. Begin loop:
Repeat {
For i = 1 to m {
𝜃𝑗 = 𝜃𝑗 − 𝛼
𝜕
𝜕𝜃 𝑗
𝑐𝑜𝑠𝑡(𝜃, (𝑥𝑖, 𝑦𝑖))
}
}
the partial derivative
of the cost functions
the cost function – given
theta and row i, i.e. how wrong
Are we?
the step size – how fast
we should climb the
gradient
Online Learning
• Approaches the maximum of the function in a jumpy manner and
never actually settles on the maximum.
Batch vs. Online Learning
Data
Size: 4.8GB
Rows: 500,000
Columns: 5000
0
20
40
60
80
100
120
Batch SGDClassifier Sofia-ml
Training
*Times include reading data and training algorithm
Online Learning Vs. Batch
Online Learning
• When we have a continuous
stream of data
• When It is important to update
the algorithm in real time – can
hit a moving target
• When training speed is
important
• Parameters are “jumpy” around
the optimal values
Batch
• When it is very important to get
the exact optimal values
• When data can fit in memory
• When training time is not of the
essence
Popular Online Learning Libraries
• Sofia-ml (c/c++)
• Requires data in svmLight format
• Have implementations of SVM, Neural networks and logistic regression
• Supports classification and ranking
• Wovbal wabbit (c/c++)
• Requires data in own wv format
• Have implementations of the most popular loss functions
• Supports classification, ranking and regression
• Pandas + scikit-learn (python)
• Pandas has a nice function for reading files in batches
• Can handle sparse and non-sparse matrices
• Scikit–learn has an SGD classifier that can fit the model in batches
• Supports classification, ranking and regression
Thomas Jensen. Machine Learning

Mais conteúdo relacionado

Mais procurados

Apache Spark and Machine Learning Boosts Revenue Growth for Online Retailers ...
Apache Spark and Machine Learning Boosts Revenue Growth for Online Retailers ...Apache Spark and Machine Learning Boosts Revenue Growth for Online Retailers ...
Apache Spark and Machine Learning Boosts Revenue Growth for Online Retailers ...
Databricks
 
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
Databricks
 

Mais procurados (20)

Gender Prediction with Databricks AutoML Pipeline
Gender Prediction with Databricks AutoML PipelineGender Prediction with Databricks AutoML Pipeline
Gender Prediction with Databricks AutoML Pipeline
 
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
 
Parallel machines flinkforward2017
Parallel machines flinkforward2017Parallel machines flinkforward2017
Parallel machines flinkforward2017
 
Conference 2014: Rajat Arya - Deployment with GraphLab Create
Conference 2014: Rajat Arya - Deployment with GraphLab Create Conference 2014: Rajat Arya - Deployment with GraphLab Create
Conference 2014: Rajat Arya - Deployment with GraphLab Create
 
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
Zipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering FrameworkZipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering Framework
 
Apache Spark and Machine Learning Boosts Revenue Growth for Online Retailers ...
Apache Spark and Machine Learning Boosts Revenue Growth for Online Retailers ...Apache Spark and Machine Learning Boosts Revenue Growth for Online Retailers ...
Apache Spark and Machine Learning Boosts Revenue Growth for Online Retailers ...
 
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
 
GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos Guestrin
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
 
On Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQLOn Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQL
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
 
Is This Thing On? A Well State Model for the People
Is This Thing On? A Well State Model for the PeopleIs This Thing On? A Well State Model for the People
Is This Thing On? A Well State Model for the People
 
AutoML Toolkit – Deep Dive
AutoML Toolkit – Deep DiveAutoML Toolkit – Deep Dive
AutoML Toolkit – Deep Dive
 

Destaque (8)

Ramunas Urbonas. The Journey
Ramunas Urbonas. The JourneyRamunas Urbonas. The Journey
Ramunas Urbonas. The Journey
 
Dionizas Antipenkovas. Big Data Intro
Dionizas Antipenkovas. Big Data IntroDionizas Antipenkovas. Big Data Intro
Dionizas Antipenkovas. Big Data Intro
 
Tadas Pivorius. Married to Cassandra
Tadas Pivorius. Married to CassandraTadas Pivorius. Married to Cassandra
Tadas Pivorius. Married to Cassandra
 
Ramunas Balukonis. Research DWH
Ramunas Balukonis. Research DWHRamunas Balukonis. Research DWH
Ramunas Balukonis. Research DWH
 
Brian Bulkowski. Aerospike
Brian Bulkowski. AerospikeBrian Bulkowski. Aerospike
Brian Bulkowski. Aerospike
 
Andrei Kirilenkov. Vertica
Andrei Kirilenkov. VerticaAndrei Kirilenkov. Vertica
Andrei Kirilenkov. Vertica
 
Ernestas Sysojevas. Hadoop Essentials and Ecosystem
Ernestas Sysojevas. Hadoop Essentials and EcosystemErnestas Sysojevas. Hadoop Essentials and Ecosystem
Ernestas Sysojevas. Hadoop Essentials and Ecosystem
 
Сергей Сверчков и Виталий Руденя. Choosing a NoSQL database
Сергей Сверчков и Виталий Руденя. Choosing a NoSQL databaseСергей Сверчков и Виталий Руденя. Choosing a NoSQL database
Сергей Сверчков и Виталий Руденя. Choosing a NoSQL database
 

Semelhante a Thomas Jensen. Machine Learning

An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
Randy Shoup
 
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
Srinath Perera
 

Semelhante a Thomas Jensen. Machine Learning (20)

Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Learn Like a Human: Taking Machine Learning from Batch to Real-Time
Learn Like a Human: Taking Machine Learning from Batch to Real-TimeLearn Like a Human: Taking Machine Learning from Batch to Real-Time
Learn Like a Human: Taking Machine Learning from Batch to Real-Time
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate
 
Horizon: Deep Reinforcement Learning at Scale
Horizon: Deep Reinforcement Learning at ScaleHorizon: Deep Reinforcement Learning at Scale
Horizon: Deep Reinforcement Learning at Scale
 
An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
 
Design Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning BasicsDesign Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning Basics
 
Design Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning BasicsDesign Like a Pro: Machine Learning Basics
Design Like a Pro: Machine Learning Basics
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detection
 
Tech essentials for Product managers
Tech essentials for Product managersTech essentials for Product managers
Tech essentials for Product managers
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Pragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainPragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML Spain
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
EIA2017Italy - Danny Lange - Artificial Intelligence - A Game Changer in App ...
EIA2017Italy - Danny Lange - Artificial Intelligence - A Game Changer in App ...EIA2017Italy - Danny Lange - Artificial Intelligence - A Game Changer in App ...
EIA2017Italy - Danny Lange - Artificial Intelligence - A Game Changer in App ...
 
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptx
 
The Fine Art of Combining Capacity Management with Machine Learning
The Fine Art of Combining Capacity Management with Machine LearningThe Fine Art of Combining Capacity Management with Machine Learning
The Fine Art of Combining Capacity Management with Machine Learning
 
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
 

Mais de Volha Banadyseva

Андрей Светлов. Aiohttp
Андрей Светлов. AiohttpАндрей Светлов. Aiohttp
Андрей Светлов. Aiohttp
Volha Banadyseva
 
Александр Чекан. 28 правДИвых слайдов о белорусах в интернете
Александр Чекан. 28 правДИвых слайдов о белорусах в интернетеАлександр Чекан. 28 правДИвых слайдов о белорусах в интернете
Александр Чекан. 28 правДИвых слайдов о белорусах в интернете
Volha Banadyseva
 
Мастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в store
Мастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в storeМастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в store
Мастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в store
Volha Banadyseva
 
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App StoreБахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
Volha Banadyseva
 
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
Volha Banadyseva
 
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google PlayЕвгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
Volha Banadyseva
 
Егор Белый. Модели успешной монетизации мобильных приложений
Егор Белый. Модели успешной монетизации мобильных приложенийЕгор Белый. Модели успешной монетизации мобильных приложений
Егор Белый. Модели успешной монетизации мобильных приложений
Volha Banadyseva
 
Станислав Пацкевич. Инструменты аналитики для мобильных платформ
Станислав Пацкевич. Инструменты аналитики для мобильных платформСтанислав Пацкевич. Инструменты аналитики для мобильных платформ
Станислав Пацкевич. Инструменты аналитики для мобильных платформ
Volha Banadyseva
 
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
Volha Banadyseva
 
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
Volha Banadyseva
 
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позжеАлександр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
Volha Banadyseva
 

Mais de Volha Banadyseva (20)

Андрей Светлов. Aiohttp
Андрей Светлов. AiohttpАндрей Светлов. Aiohttp
Андрей Светлов. Aiohttp
 
Сергей Зефиров
Сергей ЗефировСергей Зефиров
Сергей Зефиров
 
Eugene Burmako
Eugene BurmakoEugene Burmako
Eugene Burmako
 
Heather Miller
Heather MillerHeather Miller
Heather Miller
 
Валерий Прытков, декан факультета КСиС, БГУИР
Валерий Прытков, декан факультета КСиС, БГУИРВалерий Прытков, декан факультета КСиС, БГУИР
Валерий Прытков, декан факультета КСиС, БГУИР
 
Елена Локтева, «Инфопарк»
Елена Локтева, «Инфопарк»Елена Локтева, «Инфопарк»
Елена Локтева, «Инфопарк»
 
Татьяна Милова, директор института непрерывного образования БГУ
Татьяна Милова, директор института непрерывного образования БГУТатьяна Милова, директор института непрерывного образования БГУ
Татьяна Милова, директор института непрерывного образования БГУ
 
Trillhaas Goetz. Innovations in Google and Global Digital Trends
Trillhaas Goetz. Innovations in Google and Global Digital TrendsTrillhaas Goetz. Innovations in Google and Global Digital Trends
Trillhaas Goetz. Innovations in Google and Global Digital Trends
 
Александр Чекан. 28 правДИвых слайдов о белорусах в интернете
Александр Чекан. 28 правДИвых слайдов о белорусах в интернетеАлександр Чекан. 28 правДИвых слайдов о белорусах в интернете
Александр Чекан. 28 правДИвых слайдов о белорусах в интернете
 
Мастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в store
Мастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в storeМастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в store
Мастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в store
 
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App StoreБахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
 
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
 
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google PlayЕвгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
 
Евгений Козяк. Tips & Tricks мобильного прототипирования
Евгений Козяк. Tips & Tricks мобильного прототипированияЕвгений Козяк. Tips & Tricks мобильного прототипирования
Евгений Козяк. Tips & Tricks мобильного прототипирования
 
Егор Белый. Модели успешной монетизации мобильных приложений
Егор Белый. Модели успешной монетизации мобильных приложенийЕгор Белый. Модели успешной монетизации мобильных приложений
Егор Белый. Модели успешной монетизации мобильных приложений
 
Станислав Пацкевич. Инструменты аналитики для мобильных платформ
Станислав Пацкевич. Инструменты аналитики для мобильных платформСтанислав Пацкевич. Инструменты аналитики для мобильных платформ
Станислав Пацкевич. Инструменты аналитики для мобильных платформ
 
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
 
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
 
Юлия Ерина. Augmented Reality Games: становление и развитие
Юлия Ерина. Augmented Reality Games: становление и развитиеЮлия Ерина. Augmented Reality Games: становление и развитие
Юлия Ерина. Augmented Reality Games: становление и развитие
 
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позжеАлександр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
 

Último

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 

Último (20)

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 

Thomas Jensen. Machine Learning

  • 1. The Impact of Big Data on Classic Machine Learning Algorithms Thomas Jensen, Senior Business Analyst @ Expedia
  • 2. Who am I? • Senior Business Analyst @ Expedia • Working within the competitive intelligence unit • Responsible for : • Algorithm that score new hotels • Algorithm that predicts room nights sold on existing Expedia hotels • Scraping competitor sites • Other stuff….
  • 3. The Promise of Big Data Real time data Data driven decision More accurate and robust models Granularity
  • 4. Big Data Challenges Data Processing – not going to talk about this. Speed at which to use data – how fast should we update algorithms? How do we train algorithms on data sets that do not fit into memory?
  • 5. Big Data Challenges Taken from: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 6. Classification - Logistic Regression • One classic task in machine learning / statistics is to classify some objects/events/decisions correctly • Examples are: • Customer churn • Click behavior • Purchase behavior • …. • One of the most popular algorithms to carry out these tasks is logistic regression
  • 7. What is logistic regression? • Logistic regression attaches probabilities to individual outcomes, showing how likely they are to belong to one class or the other • Pr 𝑦 𝑥 = 1 1+𝑒−𝑥𝛽 • The challenge is to choose the optimal beta(s) • To do that we minimize a cost function
  • 8. Why Use Logistic Regression? • It is simple and well understood algorithm • Outputs probabilities • There are tried and tested models to estimate the parameters • It is flexible – can handle a number of different inputs, and feature transformations
  • 9. Usual Approaches • Batch training (offline approach) • Get all the data and train the algorithm in one go • Disadvantages when data is big • Requires all data to be loaded into memory • Periodic retraining is necessary • Very time consuming with big data!
  • 11. Examples of Logistic Regression in Industry Settings – Real Time Bidding • RTB • RTB algorithms are usually based on logistic regression • Whether or not to bid on a user is determined by the probability that the user will click on an add • Each day billions of bids are processed • Each bid has to be processed within 80 milliseconds
  • 12. Examples of Logistic Regression in Industry Settings – Fraud Detection Detecting Fraudulent Credit Card Transactions • The probability that a transaction is using a stolen credit card is typically estimated with logistic regression • Billions of transactions are analyzed each day
  • 13. How Slow is the Batch Version of Logistic Regression? One target variable and two feature vectors. All randomly generated.
  • 14. A Real World Problem
  • 15. A Real World Problem • Some stats on the training job in the pipeline: • Runs training jobs on a per country basis • Longest running job lasts ~9 hours • Shortest running job lasts ~3 hours • There are often convergence failures • What we need an algorithm that: • Can reduce training time • Is robust towards convergence failures
  • 16. A Big Data Friendly Approach Online Training • Pass each data point sequentially through the algorithm • Only requires one data point at a time in memory • Allows for on-the-fly training of the algorithm
  • 17. Online Learning • We want to learn a vector of weights • Initialize all weights. Begin loop: 1. Get training example 2. Make a prediction for the target variable 3. Learn the true value of the target 4. Update the weights and go to 1
  • 18. Online Learning • Initialise all weights. Begin loop: Repeat { For i = 1 to m { 𝜃𝑗 = 𝜃𝑗 − 𝛼 𝜕 𝜕𝜃 𝑗 𝑐𝑜𝑠𝑡(𝜃, (𝑥𝑖, 𝑦𝑖)) } } the partial derivative of the cost functions the cost function – given theta and row i, i.e. how wrong Are we? the step size – how fast we should climb the gradient
  • 19. Online Learning • Approaches the maximum of the function in a jumpy manner and never actually settles on the maximum.
  • 20. Batch vs. Online Learning Data Size: 4.8GB Rows: 500,000 Columns: 5000 0 20 40 60 80 100 120 Batch SGDClassifier Sofia-ml Training *Times include reading data and training algorithm
  • 21. Online Learning Vs. Batch Online Learning • When we have a continuous stream of data • When It is important to update the algorithm in real time – can hit a moving target • When training speed is important • Parameters are “jumpy” around the optimal values Batch • When it is very important to get the exact optimal values • When data can fit in memory • When training time is not of the essence
  • 22. Popular Online Learning Libraries • Sofia-ml (c/c++) • Requires data in svmLight format • Have implementations of SVM, Neural networks and logistic regression • Supports classification and ranking • Wovbal wabbit (c/c++) • Requires data in own wv format • Have implementations of the most popular loss functions • Supports classification, ranking and regression • Pandas + scikit-learn (python) • Pandas has a nice function for reading files in batches • Can handle sparse and non-sparse matrices • Scikit–learn has an SGD classifier that can fit the model in batches • Supports classification, ranking and regression