Wednesday, June 26, 13
Hadoop in Love
@petricek
The eHarmony Difference › Who we are
~15% Customer Care
~45% Tech
~10% Marketing
The eHarmony Difference › Compatibility Matching System®
1. Compatibility Matching
2. Affinity Matching
3. Match Distribution
The eHarmony Difference
150 questions
Personality
Values
Attributes
Beliefs
Compatibility Matching › Obstreperousness
Compatibility Matching › Romantic
Marital satisfaction
The eHarmony Difference › Compatibility Matching System®
Layers on Top of Compatibility Matching
61 21
3000
Affinity Matching ›
Affinity Matching › Distance
Prob( )
Affinity Matching › Height difference
Prob( )
4-8 in
cm
Affinity Matching › Zoom level
Affinity Matching › Semi-structured Text
Words shown: life, my smile, world, my, me, I
Affinity Matching ›
~40M registered users
~10^7 matches per day
~10^3 attributes
Prob( | data) ?
~10^8 daily
Prob( | features)
Affinity Matching › Model Training: Maestro
UserMatchCommunication
Protocol Buffers
feature expansion
Sparse ML format
vowpal wabbit, boosted trees
models
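The "feature expansion" to "Sparse ML format" step can be sketched as below, assuming the sparse format is Vowpal Wabbit's plain-text input format (label, then `|namespace` groups of `name:value` features). The slides do not show Maestro's actual code; the function name, namespaces and feature names are hypothetical.

```python
def to_vw_line(label, namespaces):
    """Format one training example in Vowpal Wabbit's text input format.

    label: -1 or 1 (the convention for VW's logistic loss)
    namespaces: dict of namespace name -> dict of feature name -> numeric value
    Feature names must not contain spaces, ':' or '|'.
    """
    parts = [str(label)]
    for ns, feats in sorted(namespaces.items()):
        tokens = []
        for name, value in sorted(feats.items()):
            if value == 0:
                continue  # sparse format: zero-valued features are omitted
            # a feature whose value is 1 may be written without an explicit value
            tokens.append(name if value == 1 else f"{name}:{value}")
        parts.append(f"|{ns} " + " ".join(tokens))
    return " ".join(parts)


# Hypothetical example: one pairing where communication happened (label 1).
example = to_vw_line(1, {
    "pairing": {"distance_miles": 12.5, "same_religion": 1},
    "user": {"age": 34, "has_photo": 1, "smoker": 0},
})
print(example)
```

Omitting zero-valued features is what keeps the representation sparse when there are on the order of 10^3 attributes per record.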
Affinity Matching › Production: Conductor
750M Compressed Protocol Buffers
Map-side joins (~TB)
Matching User Service
Pairings Browser Service
1+G Compressed Protocol Buffers
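A map-side join attaches user data to each pairing inside the mapper, so no shuffle or reduce phase is needed. The sketch below shows the simplest variant, in which one side is held in memory on every mapper and the other is streamed. Which side Conductor keeps in memory, and whether it uses this variant or a sorted merge, is not stated on the slide; all names here are hypothetical.

```python
def map_side_join(pairings, users_by_id):
    """Join each pairing with its user and candidate records inside the mapper.

    pairings: iterable of dicts, streamed one at a time (assumed large side)
    users_by_id: dict held in memory on every mapper (assumed small side)
    """
    for p in pairings:
        user = users_by_id.get(p["user_id"])
        cand = users_by_id.get(p["cand_id"])
        if user is None or cand is None:
            continue  # inner join: skip pairings with a missing side
        yield {"user": user, "cand": cand, "pairing": p}


users = {1: {"id": 1, "age": 34}, 2: {"id": 2, "age": 31}}
pairings = [
    {"user_id": 1, "cand_id": 2, "distance": 12.5},
    {"user_id": 1, "cand_id": 9, "distance": 3.0},  # candidate 9 is unknown
]
joined = list(map_side_join(pairings, users))
print(len(joined))
```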
Affinity Matching › Scorer
[User[Demographic][Photo][Activity][FX]]
[Cand[Demographic][Photo][Activity][FX]]
[Pairing[Distance][Flags]]
...
Prob( | data)
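One way to read the Scorer slide: each nested User / Cand / Pairing record is flattened into named features and passed through a model that returns a probability. The sketch below assumes a plain logistic model with hand-picked weights. The talk names vowpal wabbit and boosted trees as model types but does not show the scoring code, so the field names, weights and functions here are hypothetical.

```python
import math


def flatten(record, prefix=""):
    """Turn a nested record into a flat {dotted.name: value} feature dict."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = float(value)  # booleans become 0.0 or 1.0
    return flat


def score(record, weights, bias=0.0):
    """Logistic score: sigmoid of the weighted sum of the flattened features."""
    feats = flatten(record)
    z = bias + sum(weights.get(name, 0.0) * v for name, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


record = {
    "user": {"demographic": {"age": 34}, "photo": {"count": 3}},
    "cand": {"demographic": {"age": 31}, "photo": {"count": 0}},
    "pairing": {"distance": 12.5, "flags": {"same_religion": True}},
}
# Illustrative weights only; features without a weight contribute nothing.
weights = {"pairing.distance": -0.05, "pairing.flags.same_religion": 0.8}
p = score(record, weights)
print(round(p, 3))
```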
Affinity Matching › Scala DSL
"same_religion":"${user.profile.religion}=={cand.profile.religion}"
"cmp_drinking":"cmp(${user.profile.drinking},{cand.profile.drinking})"
<
"strict_distance_u":"${user.profile.accepted_distance}<={pairing.distance}"
60 miles
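The three feature definitions above can be mirrored as small functions over a user / candidate / pairing context. This is a Python sketch of the idea, not the Scala DSL itself. It assumes `cmp` is a three-way comparison, that drinking is encoded as an ordinal integer, and that distances are in miles; the helper names and sample values are hypothetical.

```python
def cmp(a, b):
    """Three-way comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    return (a > b) - (a < b)


def get(ctx, path):
    """Resolve a dotted path such as 'user.profile.religion' in nested dicts."""
    value = ctx
    for part in path.split("."):
        value = value[part]
    return value


# Feature name -> function of the context, following the slide's definitions.
FEATURES = {
    "same_religion": lambda c: get(c, "user.profile.religion") == get(c, "cand.profile.religion"),
    "cmp_drinking": lambda c: cmp(get(c, "user.profile.drinking"), get(c, "cand.profile.drinking")),
    "strict_distance_u": lambda c: get(c, "user.profile.accepted_distance") <= get(c, "pairing.distance"),
}


def expand(ctx):
    """Evaluate every feature definition against one context."""
    return {name: fn(ctx) for name, fn in FEATURES.items()}


ctx = {
    "user": {"profile": {"religion": "none", "drinking": 1, "accepted_distance": 60}},
    "cand": {"profile": {"religion": "none", "drinking": 2}},
    "pairing": {"distance": 75},
}
features = expand(ctx)
print(features)
```

Keeping the definitions in a name-to-expression table means a new feature is one more entry rather than new pipeline code.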
The eHarmony Difference › Compatibility Matching System®
Delivering the right matches at the right time to as many people as possible across the entire network.
Match Distribution › Graph optimization
2 2 1
Prob( | data)
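The slides do not say which optimization is used. As a rough sketch, assume the numbers on the graph are per-user limits on how many matches can be delivered and the edge weights are the estimated probabilities. A simple greedy heuristic then takes pairings in descending score order while both users still have capacity. The function and user names below are hypothetical.

```python
def greedy_distribution(scores, capacity):
    """Pick pairings in descending score order while both users have capacity.

    scores: dict mapping (user_a, user_b) -> estimated probability
    capacity: dict mapping user -> maximum number of matches to deliver
    """
    remaining = dict(capacity)
    chosen = []
    for (a, b), _p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if remaining.get(a, 0) > 0 and remaining.get(b, 0) > 0:
            chosen.append((a, b))
            remaining[a] -= 1
            remaining[b] -= 1
    return chosen


scores = {
    ("ann", "bob"): 0.9,
    ("ann", "carl"): 0.6,
    ("dana", "bob"): 0.5,
    ("dana", "carl"): 0.2,
}
capacity = {"ann": 2, "dana": 2, "bob": 1, "carl": 1}
matches = greedy_distribution(scores, capacity)
print(matches)
```

In this example the greedy pass gives both of ann's slots away and leaves dana with no match, although a different assignment would cover both. That is why a global formulation over the whole graph (for instance a maximum-weight matching with capacities) fits the goal of reaching as many people as possible better than a per-pair greedy pass.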
Resulting Customer Experience ›
Guided Communication
? !
Resulting Customer Experience › Success!
eHarmony Results › The eHarmony Impact
eHarmony Members Married Every Day
2005: 90
2007: 236
2009: 542
Proceedings of National Academy of Sciences
Press coverage
eHarmony Results › The eHarmony Impact
Since 2005, about 1/3 of couples who have married in the US have met online (35%)
* according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
Rates of breakup or divorce (chart; y-axis 0% to 8%; bars: All Online, Offline)
* according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
eHarmony Results › The eHarmony Impact
The largest number of marriages surveyed who met via online dating had met on eHarmony (25%)
* according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
Rates of breakup or divorce (chart; y-axis 0% to 8%; bars: eHarmony, All Other Online, Offline)
* according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
linkedin.com/in/petricek
bit.ly/jobateharmony
@petricek
