Enabling Cross-Screen Advertising with Machine Learning and Spark
Deb Ray

Chief Data Officer

Big Data Day LA 2016
How Media is Consumed
Consumers don’t differentiate between screens



Same Game of Thrones on 

Tablet, TV, Desktop, XBox, Roku, Apple TV
How Media is Sold
But Advertising is sold in Silos
Creates a Gap
2 Sides of the Advertising Market
Selling and Buying of Ad Inventory.
Publishers
Exchanges
TV Providers
DSP
Advertisers
Brands
Agencies
Trading Desks
Websites
Mobile apps
TV Programs
OTT
DMP
Bridging the Gap
VideoAmp’s goal is to Enable Advertisers and Content Creators to
Transact Seamlessly Across All Media Types
• Frequency capping for target consumers.

• TV media extension to desktop / mobile campaigns.

• Competitive conquesting.
Consumer Graph
How Big is the Graph?
[Figure: example consumer graph. Edge labels: Location, Login.]
idfa: In-App, Phone
Uid 1: Safari, Phone
Uid 2: Firefox, Home
Uid 3: Chrome, Home
Uid 4: Firefox, Work
• 1.5B+ unique cookie IDs, Device IDs.

• 150M+ nodes.

• Behavioral data from each ID (several TBs / day).
Video Ads: from Request to Delivery
Figure 1
Step 2: The publisher Yahoo! passes the information to the ad exchange, say, Google DoubleClick AdX, including the URL where the ad slot is located, vertical of the web page content such as sports, and user cookie id.
Step 3: The ad exchange AdX composes a bid request and sends the bid requests to several DSPs. Let’s assume the DSP iPinYou is one of them.
Step 4: When the iPinYou DSP server receives the bid request from the ad exchange AdX, it passes the information
1. User Visits
2. Calls Ad Exchange
3. Bid Request
4. User ID, IP
5. User ID
6. User Data
7. Bid Price
8. Bid CPM, Ad Tag
9. Auction winner’s Ad Tag, 2nd price CPM
10. Calls Winner’s Ad Tag.
11. Serves Ad
12. Displays Ad
Components: Ad Server, Ad Exchange, Bid Listener, Decision Engine, User Data Storage
20 ms to calculate
Whole process takes ~100 ms
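Steps 7 to 9 above end in an auction whose winner is charged the 2nd price CPM. A minimal Python sketch of that rule follows. The bidder names, ad tags, and floor price are hypothetical, and a real exchange may apply extra rules (for example price increments) that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str   # hypothetical DSP name
    cpm: float    # bid price per thousand impressions
    ad_tag: str   # markup the exchange calls if this bid wins


def run_second_price_auction(bids, floor_cpm=0.0):
    """Return (winning bid, clearing CPM), or None when no bid meets the floor."""
    eligible = sorted(
        (b for b in bids if b.cpm >= floor_cpm),
        key=lambda b: b.cpm,
        reverse=True,
    )
    if not eligible:
        return None
    winner = eligible[0]
    # The winner pays the runner-up's price, or the floor if it bid alone.
    clearing_cpm = eligible[1].cpm if len(eligible) > 1 else floor_cpm
    return winner, clearing_cpm


bids = [
    Bid("dsp_a", 4.50, "<tag-a>"),
    Bid("dsp_b", 3.20, "<tag-b>"),
    Bid("dsp_c", 1.10, "<tag-c>"),
]
winner, clearing_cpm = run_second_price_auction(bids, floor_cpm=1.00)
```

Here `winner` is the highest bidder and `clearing_cpm` is the second-highest bid, which is what step 9 passes back along with the winner's Ad Tag.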
The Right Tool: Apache Spark
Apache Spark is a distributed computing framework that came out of AMPLab at UC Berkeley.
Key innovation is a Resilient Distributed Dataset (RDD):
Logical collection of data partitioned across machines.
[Figure 2: Spark runtime. The user’s driver program launches multiple workers, which read data blocks from a distributed file system and can persist computed RDD partitions in memory. Diagram labels: Driver, Worker (RAM, Input Data), tasks, results.]
Excerpt shown on the slide:
RDDs degrade gracefully when there is not enough memory to store them, as long as they are only being used in scan-based operations. Partitions that do not fit in RAM can be stored on disk and will provide similar performance to current data-parallel systems.
RDDs are best suited for batch applications that apply the same operation to all elements of a dataset.
API in Scala and Python.
In our stack, Spark runs on Hadoop.
Data stored in HDFS / Parquet.
In some apps involving iterative calls, Spark is up to 100X faster than MapReduce.
Distributed File System (e.g. HDFS)
Spark: Graph Frames
GraphFrames is a graph processing library (similar to GraphX)



- Scala, Python, Java APIs.



- Query on graphs (like SparkSQL):
> g.vertices.filter("age > 25")
> g.inDegrees.filter("inDegree > 2")



- Supports all algorithms in GraphX, and also:

Breadth first search (BFS) - shortest path between 2 vertices.
(Strongly) connected components
Label Propagation algorithm
VideoAmp Flint
We open-sourced Flint: creating push-button Spark clusters

for Machine Learning and Data Science in the cloud.



Designed for rapid deployment while providing native access to

data in a pre-existing HDFS / Hive cluster.



- Flint: a Spark Cluster Launcher (on AWS)



- Self-contained Spark Docker images.



- Jupyter Docker image preloaded with Python, R, Scala kernels.



Users can expand or contract the cluster on the fly.
DEMO
Data from Devices
Data from TVs (ACR)
TV ID generates: TV program viewership
10M Smart TVs / STBs
Data in 15 min chunks

Mobile Devices
Device ID generates: Sites, Video content, Segments
50K QPS over 300M Device IDs

Desktop
Cookie ID generates: Sites, Video content, Segments
100K QPS over 1B cookie IDs
Sparse Representation
For each class of consumption data, create a Dictionary enumerating all content (e.g. TMS ID) or types.
e.g. demographic segments:
Income = [ <30K, 30K to 60K, 60K to 90K, 90K to 120K, 120K+ ]
e.g. TV programs watched
TV_Programs = ["Walking Dead", "Game of Thrones", …, "Silicon Valley"]
Then the user data is sparse:
Income (User ABC123) = [0,0,0,1,0]
TV_Programs (User ABC123) = [0,1,…,1]
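The encoding above can be sketched in a few lines of Python. The dictionaries reuse the slide's examples, with the program list cut to three titles for illustration. The helper names are hypothetical, and storing only the active positions is one possible sparse layout, not necessarily the one used in production.

```python
INCOME = ["<30K", "30K to 60K", "60K to 90K", "90K to 120K", "120K+"]
TV_PROGRAMS = ["Walking Dead", "Game of Thrones", "Silicon Valley"]


def build_index(dictionary):
    """Map each dictionary entry to its position in the enumeration."""
    return {item: position for position, item in enumerate(dictionary)}


def to_dense(active_items, dictionary):
    """0/1 vector with a 1 at the position of every active item."""
    index = build_index(dictionary)
    vector = [0] * len(dictionary)
    for item in active_items:
        vector[index[item]] = 1
    return vector


def to_sparse(active_items, dictionary):
    """Sorted positions of the active items; the zeros are never stored."""
    index = build_index(dictionary)
    return sorted(index[item] for item in set(active_items))


income_vec = to_dense(["90K to 120K"], INCOME)
programs_sparse = to_sparse(["Silicon Valley", "Game of Thrones"], TV_PROGRAMS)
```

With dictionaries of millions of entries, the `to_sparse` form keeps memory proportional to what a user actually consumed rather than to the size of the dictionary.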
Graph Construction
Connected Components
Maximal subgraphs such that there is a path between any two vertices.
Start with a node s, and do BFS. This gives a component of the graph.

At each stage, pick an unexplored node n, and do BFS. This finds another component.
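The repeated-BFS procedure can be sketched on a single machine as follows. This is an illustration of the algorithm only, not the distributed Spark version. The toy graph and its node names are hypothetical, and the sketch assumes an undirected graph in which every node appears as a key of the adjacency dict.

```python
from collections import deque


def connected_components(adjacency):
    """adjacency: dict mapping each node to an iterable of its neighbours."""
    unexplored = set(adjacency)
    components = []
    while unexplored:
        # Pick any unexplored node and run BFS from it.
        start = unexplored.pop()
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency.get(node, ()):
                if neighbour in unexplored:
                    unexplored.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        # Everything reachable from start forms one component.
        components.append(component)
    return components


graph = {
    "idfa": ["uid1"],
    "uid1": ["idfa"],
    "uid2": ["uid3"],
    "uid3": ["uid2"],
    "uid4": [],
}
components = connected_components(graph)
```

Each node is visited once and each edge is examined at most twice, so the cost is linear in the size of the graph.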
Clustering
Example with only Location (Lat / Long attributes)



We utilize Location, IP address, Types (segments),
Behaviors (websites visited, TV program viewed)



Clustering in a very high dimensional space with

Sparse vectors.
Graph Inference
Find all Users similar to User A.



Fill in Missing Attributes. What is User B’s income level?
Which users will like Brain Dead (new show)?
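One simple way to illustrate these two questions is nearest-neighbour voting over sparse behaviour sets. The sketch below assumes Jaccard similarity and a majority vote among the k most similar users. Both are stand-ins chosen for illustration, since the slides do not name the method actually used, and the users and behaviours are made up.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap of two behaviour sets, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target, behaviours, k=2):
    """Up to k users with non-zero similarity to target, most similar first."""
    scored = [
        (jaccard(behaviours[target], items), user)
        for user, items in behaviours.items()
        if user != target
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for score, user in scored[:k] if score > 0]


def infer_attribute(target, behaviours, known, k=2):
    """Fill a missing attribute with the majority value among similar users."""
    votes = Counter(
        known[user] for user in most_similar(target, behaviours, k) if user in known
    )
    return votes.most_common(1)[0][0] if votes else None


behaviours = {
    "user_a": {"Game of Thrones", "Silicon Valley", "espn.com"},
    "user_b": {"Game of Thrones", "Silicon Valley"},
    "user_c": {"Walking Dead"},
    "user_d": {"Game of Thrones", "Silicon Valley", "espn.com", "cnn.com"},
}
known_income = {"user_a": "90K to 120K", "user_c": "<30K", "user_d": "90K to 120K"}

similar_to_b = most_similar("user_b", behaviours)
income_of_b = infer_attribute("user_b", behaviours, known_income)
```

The same neighbour list could be used to guess interest in a new show: if the users most similar to a given user watch it, that user is a candidate viewer.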
Validation
Ground Truth from Login Data



e.g. Login to LinkedIn from Mobile, Tablet, Desktop
at Work, Laptop at Home.
Validation data is used for hold-out cross-validation, to learn
the Machine Learning parameters, e.g. the edge distance threshold.
Precision / Recall
High Precision -> Devices assigned to a consumer belong to that consumer.

High Recall -> All devices belonging to the consumer are correctly assigned.
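For a single consumer, both metrics can be computed by comparing the devices the graph assigned against the devices confirmed by login data. The device names below are hypothetical, and scoring per consumer is an assumption about how the evaluation is set up.

```python
def precision_recall(assigned, truth):
    """assigned: devices the graph linked to a consumer.
    truth: devices known from login data to belong to that consumer."""
    assigned, truth = set(assigned), set(truth)
    true_positives = len(assigned & truth)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


truth = {"mobile", "tablet", "work_desktop", "home_laptop"}
assigned = {"mobile", "tablet", "neighbour_tv"}
precision, recall = precision_recall(assigned, truth)
```

In this example two of the three assigned devices are correct (precision 2/3), and two of the four true devices were found (recall 1/2). Tightening a parameter such as the edge distance threshold typically raises precision at the cost of recall.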
TV Viewership Classification
Data from TVs (ACR)
TV ID generates: TV program viewership
Dictionary is enumeration of ~10M Users
Sparse vector of Video Content (0 / 1 if they saw it)
Learning embedding: (TV programs, Users) —> Lookalike Programs.
How do we learn embeddings? 



Learn an underlying manifold -> like word2vec, where a document is the set of users viewing the content.
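Training a word2vec-style model is beyond a short example, so the sketch below swaps in a much simpler technique: cosine similarity between the raw 0/1 viewer vectors of two programs. It is not an embedding, but it shows the same idea that programs watched by the same users end up close together. The program audiences and user IDs are hypothetical.

```python
import math


def cosine(viewers_a, viewers_b):
    """Cosine similarity of two 0/1 viewer vectors, stored as sets of user IDs."""
    if not viewers_a or not viewers_b:
        return 0.0
    shared = len(viewers_a & viewers_b)
    return shared / math.sqrt(len(viewers_a) * len(viewers_b))


def lookalike_programs(program, viewers_by_program):
    """Other programs ranked by audience similarity to the given program."""
    target = viewers_by_program[program]
    scored = [
        (cosine(target, viewers), other)
        for other, viewers in viewers_by_program.items()
        if other != program
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored]


viewers_by_program = {
    "Game of Thrones": {"u1", "u2", "u3"},
    "Silicon Valley": {"u2", "u3", "u4"},
    "Walking Dead": {"u5"},
}
ranked = lookalike_programs("Game of Thrones", viewers_by_program)
```

A learned embedding would replace the raw viewer sets with dense vectors, which also lets programs with no shared viewers be compared through intermediate audiences.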
Visualizing Embeddings
https://www.youtube.com/watch?v=RJVL80Gg3lA
Visualizing Data Using t-SNE by van der Maaten
t-distributed Stochastic Neighbor Embedding (t-SNE)
a) IsoMap



b) Locally Linear Embedding
Implementations in R, Python:
R package “tsne”
Visualizing Title Embeddings
Questions?
Bandit Optimization
Metrics to Optimize: Viewability, Conversions.
Continue with the same campaign parameters that have worked well,
OR explore new parameter combinations?
How to solve the Exploration-Exploitation Problem? Multi-Armed Bandits.
Parameters coded in our Bidders (Actor-model in Scala).
Run simultaneously and determine probability of reward.
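The exploration-exploitation trade-off can be illustrated with an epsilon-greedy bandit, one of the simplest multi-armed bandit strategies. The slides do not say which strategy the bidders use, so this Python sketch is an assumption. Each arm is a hypothetical set of campaign parameters, and a reward of 1 stands for a viewable impression or a conversion.

```python
import random


class EpsilonGreedyBandit:
    def __init__(self, arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon              # fraction of pulls spent exploring
        self.rng = rng or random.Random()
        self.pulls = {arm: 0 for arm in arms}
        self.rewards = {arm: 0 for arm in arms}

    def estimated_reward(self, arm):
        """Empirical probability of reward; 0.0 before the arm has been tried."""
        pulls = self.pulls[arm]
        return self.rewards[arm] / pulls if pulls else 0.0

    def select_arm(self):
        # Try every arm once before trusting any estimate.
        untried = [arm for arm, count in self.pulls.items() if count == 0]
        if untried:
            return untried[0]
        # Explore a random arm with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.pulls, key=self.estimated_reward)

    def update(self, arm, reward):
        """Record the outcome (1 or 0) of one pull of the given arm."""
        self.pulls[arm] += 1
        self.rewards[arm] += reward


bandit = EpsilonGreedyBandit(["params_a", "params_b"], epsilon=0.0)
bandit.update("params_a", 1)
bandit.update("params_a", 0)
bandit.update("params_b", 1)
best = bandit.select_arm()
```

With `epsilon=0.0` the bandit only exploits, so `best` is the arm with the highest observed reward rate. A small positive epsilon keeps testing the other parameter combinations in case their true reward probability is higher.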
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Recently uploaded

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Recently uploaded (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising with Machine Learning and Spark - Debajyoti (Deb) Ray, CDO - VideoAmp

  • 1. Enabling Cross-Screen Advertising with
 Machine Learning and Spark Deb Ray
 Chief Data Officer
 Big Data Day LA 2016
  • 2. How Media is Consumed Consumers don’t differentiate between screens
 
 Same Game of Thrones on 
 Tablet, TV, Desktop, XBox, Roku, Apple TV
  • 3. How Media is Sold But Advertising is sold in Silos
  • 5. Bridging the Gap VideoAmp’s goal is to Enable Advertisers and Content Creators to Transact Seamlessly Across All Media Types
 • Frequency capping for target consumers.
 • TV media extension to desktop / mobile campaigns.
 • Competitive conquesting.
  • 6. Consumer Graph How Big is the Graph?
 Figure labels: idfa (In-App, Phone); Uid 1 (Safari, Phone); Uid 2 (Firefox, Home); Uid 3 (Chrome, Home); Uid 4 (Firefox, Work); links: Location, Login.
 • 1.5B+ unique cookie IDs, Device IDs.
 • 150M+ nodes.
 • Behavioral data from each ID (several TBs / day).
  • 7. Video Ads: from Request to Delivery
 Figure 1 (caption excerpt): Step 2: The publisher Yahoo! passes the information to the ad exchange, say, Google DoubleClick AdX, including the URL where the ad slot is located, vertical of the web page content such as sports, and user cookie id. Step 3: The ad exchange AdX composes a bid request and sends the bid requests to several DSPs. Let’s assume the DSP iPinYou is one of them. Step 4: When the iPinYou DSP server receives the bid request from the ad exchange AdX, it passes the information …
 Steps in the figure: 1. User Visits 2. Calls Ad Exchange 3. Bid Request 4. User ID, IP 5. User ID 6. User Data 7. Bid Price 8. Bid CPM, Ad Tag 9. Auction winner’s Ad Tag, 2nd price CPM 10. Calls Winner’s Ad Tag 11. Serves Ad 12. Displays Ad
 Components: Ad Server, Ad Exchange, Bid Listener, Decision Engine, User Data Storage.
 Timing: 20 ms to calculate; the whole process takes ~100 ms.
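Step 9 of the figure returns the winner's ad tag together with a "2nd price CPM". A minimal sketch of that clearing rule follows, assuming a plain second-price auction with an optional floor. The function name, the floor parameter and the sample bids are hypothetical; real exchanges may add bid increments or other rules that the slide does not describe.

```python
from typing import Dict, Optional, Tuple


def run_second_price_auction(
    bids: Dict[str, float], floor_cpm: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Return (winning bidder, clearing CPM), or None if no bid meets the floor.

    Assumption: the highest bidder wins and pays the second-highest
    eligible bid; a lone bidder pays the floor.
    """
    # Keep only bids at or above the floor price.
    eligible = {dsp: cpm for dsp, cpm in bids.items() if cpm >= floor_cpm}
    if not eligible:
        return None
    # Rank bidders from highest to lowest CPM.
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    clearing_cpm = ranked[1][1] if len(ranked) > 1 else floor_cpm
    return winner, clearing_cpm


# Hypothetical bids (CPM in dollars) from three DSPs.
print(run_second_price_auction({"dsp_a": 4.0, "dsp_b": 2.5, "dsp_c": 3.0}))
# ('dsp_a', 3.0)
```

The winner pays the runner-up's bid rather than its own, which is why the exchange reports a second-price CPM back with the ad tag.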
  • 8. The Right Tool: Apache Spark
 Apache Spark is a distributed computing framework that came out of AMPLab at UC Berkeley.
 Key innovation is a Resilient Distributed Dataset (RDD): Logical collection of data partitioned across machines.
 API in Scala and Python.
 In our stack, Spark runs on Hadoop. Data stored in HDFS / Parquet.
 In some apps, involving iterative calls, Spark is up to 100X faster than MapReduce.
 Figure 2: Spark runtime. The user’s driver program launches multiple workers, which read data blocks from a distributed file system and can persist computed RDD partitions in memory.
 Figure labels: Driver, Workers (RAM, Input Data), tasks, results, Distributed File System (e.g. HDFS).
  • 9. Spark: Graph Frames GraphFrames is a graph processing library (similar to GraphX)
 
 - Scala, Python, Java APIs.
 
 - Query on graphs (like SparkSQL):
 > g.vertices.filter("age > 25")
 > g.inDegrees.filter("inDegree > 2")
 
 - Supports all algorithms in GraphX, and also:
 Breadth first search (BFS): shortest path between 2 vertices.
 (Strongly) connected components.
 Label Propagation algorithm.
  • 10. VideoAmp Flint We open-sourced Flint: creating push-button Spark clusters
 for Machine Learning and Data Science in the cloud.
 
 Designed for rapid deployment while providing native access to
 data in a pre-existing HDFS / Hive cluster.
 
 - Flint: a Spark Cluster Launcher (on AWS)
 
 - Self-contained Spark Docker images.
 
 - Jupyter Docker image preloaded with Python, R, Scala kernels.
 
 Users can expand or contract the cluster on the fly.
  • 11. DEMO
  • 12. Data from Devices
 Data from TVs (ACR): TV ID generates TV program viewership. 10M Smart TVs / STBs. Data in 15 min chunks.
 Mobile Devices: Device ID generates Sites, Video content, Segments. 50K QPS over 300M Device IDs.
 Desktop: Cookie ID generates Sites, Video content, Segments. 100K QPS over 1B cookie IDs.
  • 13. Sparse Representation For each class of consumption data, create Dictionary with enumeration
 of all content (e.g. TMS ID), or types.
 e.g. demographic segments: Income = [ <30K, 30K to 60K, 60K to 90K, 90K to 120K, 120K+ ]
 e.g. TV programs watched: TV_Programs = ["Walking Dead", "Game of Thrones", …, "Silicon Valley"]
 Then the user data is sparse:
 Income (User ABC123) = [0,0,0,1,0]
 TV_Programs (User ABC123) = [0,1,…,1]
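The dictionary-plus-indicator encoding above can be sketched as follows. The helper names are hypothetical, the three-program dictionary is a shortened stand-in for the full enumeration, and storing only the non-zero positions is one common way to hold such vectors, not necessarily the format used in the talk.

```python
from typing import Dict, Iterable, List, Sequence

# Dictionaries: a fixed enumeration of every possible value per data class.
INCOME_BUCKETS = ["<30K", "30K to 60K", "60K to 90K", "90K to 120K", "120K+"]
TV_PROGRAMS = ["Walking Dead", "Game of Thrones", "Silicon Valley"]  # shortened


def build_index(dictionary: Sequence[str]) -> Dict[str, int]:
    """Map each dictionary entry to its position in the enumeration."""
    return {item: position for position, item in enumerate(dictionary)}


def encode_sparse(observed: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Return the sorted positions of the non-zero entries.

    Items missing from the dictionary are skipped.
    """
    return sorted({index[item] for item in observed if item in index})


def to_dense(positions: Iterable[int], size: int) -> List[int]:
    """Expand sparse positions into a 0/1 indicator vector."""
    dense = [0] * size
    for position in positions:
        dense[position] = 1
    return dense


income_index = build_index(INCOME_BUCKETS)
sparse_income = encode_sparse(["90K to 120K"], income_index)
print(sparse_income)                                 # [3]
print(to_dense(sparse_income, len(INCOME_BUCKETS)))  # [0, 0, 0, 1, 0]
```

Keeping only the positions matters at this scale: a user who watched a handful of programs needs a handful of integers, however large the program dictionary grows.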
  • 15. Connected Components Connected components are maximal subgraphs in which
 there is a path between any two vertices.
 Start with a node s, and do BFS. This gives one component of the graph.

 At each later stage, pick an unexplored node n, and
 do BFS. This finds another component.
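The repeated-BFS procedure on this slide can be sketched on a single machine as below. This is an illustration only: the node names are hypothetical, and a graph of the size quoted earlier would need a distributed implementation such as the one in GraphFrames.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def connected_components(
    nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> List[Set[Hashable]]:
    """Find connected components of an undirected graph by repeated BFS."""
    # Build an undirected adjacency map; isolated nodes get an empty set.
    adjacency: Dict[Hashable, Set[Hashable]] = {node: set() for node in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen: Set[Hashable] = set()
    components: List[Set[Hashable]] = []
    for start in adjacency:
        if start in seen:
            continue  # already placed in an earlier component
        # BFS from an unexplored node collects exactly one component.
        seen.add(start)
        queue = deque([start])
        component: Set[Hashable] = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical IDs: two linked pairs and one unlinked ID.
ids = ["idfa", "uid1", "uid2", "uid3", "uid4"]
links = [("idfa", "uid1"), ("uid2", "uid3")]
print([sorted(c) for c in connected_components(ids, links)])
# [['idfa', 'uid1'], ['uid2', 'uid3'], ['uid4']]
```

Each node is enqueued once and each edge is examined twice, so the whole pass is linear in the number of nodes plus edges.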
  • 16. Clustering Example with only Location (Lat / Long attributes)
 
 We utilize Location, IP address, Types (segments), Behaviors (websites visited, TV program viewed)
 
 Clustering in a very high dimensional space with
 Sparse vectors.
  • 17. Graph Inference Find all Users similar to User A.
 
 Fill in Missing Attributes. What is User B’s income level? Which users will like Brain Dead (new show)?
  • 18. Validation Ground Truth from Login Data
 
 e.g. Login to LinkedIn from Mobile, Tablet, Desktop at Work, Laptop at Home.
 Validation data is used for hold-out cross-validation, to learn Machine Learning
 parameters such as the edge distance threshold.
  • 19. Precision / Recall High Precision -> Devices assigned to a consumer
 belong to the consumer.

 High Recall -> All devices belonging to the consumer
 are correctly assigned.
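One way to turn these two definitions into numbers is to score device pairs against the login ground truth. The pairwise formulation, the function names and the sample data below are assumptions for illustration; the slides do not say how the metrics were computed.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def same_owner_pairs(assignment: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Return every pair of devices that share an owner label."""
    groups: Dict[str, list] = {}
    for device, owner in assignment.items():
        groups.setdefault(owner, []).append(device)
    pairs: Set[Tuple[str, str]] = set()
    for devices in groups.values():
        # Sort so each pair has one canonical ordering.
        pairs.update(combinations(sorted(devices), 2))
    return pairs


def pairwise_precision_recall(
    predicted: Dict[str, str], truth: Dict[str, str]
) -> Tuple[float, float]:
    """Pairwise precision and recall of a device-to-consumer assignment.

    Assumes both mappings cover the same devices. An empty pair set
    scores 0.0 by convention here.
    """
    predicted_pairs = same_owner_pairs(predicted)
    true_pairs = same_owner_pairs(truth)
    hits = len(predicted_pairs & true_pairs)
    precision = hits / len(predicted_pairs) if predicted_pairs else 0.0
    recall = hits / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Hypothetical ground truth from logins, and a predicted clustering.
truth = {"phone": "alice", "tablet": "alice", "laptop": "alice", "tv": "bob"}
predicted = {"phone": "c1", "tablet": "c1", "laptop": "c2", "tv": "c2"}
precision, recall = pairwise_precision_recall(predicted, truth)
print(precision)  # 0.5
```

In this example one of the two predicted pairs is correct (precision 0.5) and one of the three true pairs is recovered (recall one third), which shows the usual trade-off: merging more aggressively raises recall and risks precision.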
  • 20. TV Viewership Classification
 Data from TVs (ACR): TV ID generates TV program viewership.
 Dictionary is enumeration of ~10M Users.
 Sparse vector of Video Content (0 / 1 if they saw it).
 Learning embedding: (TV programs, Users) -> Lookalike Programs.
 How do we learn embeddings?

 Learn an underlying manifold ->
 Like word2vec where document is a set of users viewing the content.
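Before learning an embedding, lookalike programs can be approximated directly from the 0 / 1 viewer vectors. The sketch below uses cosine similarity between the sets of users who watched each program. It is a simpler stand-in for the word2vec-style embedding described on this slide, not that method, and the names and viewer sets are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine_similarity(viewers_a: Set[str], viewers_b: Set[str]) -> float:
    """Cosine similarity of two binary viewer vectors stored as sets.

    For 0/1 vectors the dot product is the overlap size and each norm
    is the square root of the set size.
    """
    if not viewers_a or not viewers_b:
        return 0.0
    overlap = len(viewers_a & viewers_b)
    return overlap / math.sqrt(len(viewers_a) * len(viewers_b))


def lookalike_programs(
    target: str, viewers_by_program: Dict[str, Set[str]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank the other programs by viewer overlap with the target program."""
    target_viewers = viewers_by_program[target]
    scores = [
        (program, cosine_similarity(target_viewers, viewers))
        for program, viewers in viewers_by_program.items()
        if program != target
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


# Hypothetical viewership: program -> set of user IDs who saw it.
viewers = {
    "Game of Thrones": {"u1", "u2", "u3"},
    "Walking Dead": {"u2", "u3", "u4"},
    "Silicon Valley": {"u5"},
}
top = lookalike_programs("Game of Thrones", viewers, top_k=1)
print(top[0][0])  # Walking Dead
```

A learned embedding would additionally place programs close together when their audiences are similar without overlapping directly, which raw overlap cannot capture.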
  • 21. Visualizing Embeddings
 t-distributed Stochastic Neighbor Embedding (t-SNE): "Visualizing Data Using t-SNE" by van der Maaten, https://www.youtube.com/watch?v=RJVL80Gg3lA
 a) IsoMap
 b) Locally Linear Embedding
 Implementations in R, Python: R package “tsne”