SlideShare a Scribd company logo
1 of 42
Download to read offline
Customer Behaviour Analytics:
Billions of Events to
one Customer-Product Graph
Strata London, 12th November 2013
Presented by Paul Lam
About Paul Lam
Joined uSwitch.com as first Data Scientist in 2010
• developed internal data products
• built distributed data architecture
• team of 3 with a developer and a statistician
Code contributor to various open source tools
• Cascalog, a big data processing library built on top of
Cascading (comparable to Apache Pig)
• Incanter, a statistical computing platform in Clojure
Author of Web Usage Mining: Data Mining Visitor Patterns
From Web Server Logs* to be published in late 2014
* tentative title
What is it
Customer-Product Graph
{:name “Bob”}
{:product “iPhone”}
{:name “Tom”}

{:name “Emily”}
{:product “American Express”}
{:name “Lily”}

[:BUY {:timestamp 2013-11-01 13:00:00}]
bought
viewed
Question: Who bought an iPhone?
{:name “Bob”}
{:product “iPhone”}
{:name “Tom”}

{:name “Emily”}
{:product “American Express”}
{:name “Lily”}
bought
viewed
Query: Who bought an iPhone?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}
START

x=node:node_auto_index(product='iPhone')Express”}
{:product “American
{:name “Lily”}
MATCH (person)-[:BUY]->(x)

RETURN person
Question: What else did they buy?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}

Y
{:product “American Express”}

{:name “Lily”}
bought
viewed
Query: What else did they buy?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

START x=node:node_auto_index(product='iPhone')
{:name
MATCH “Emily”}

{:product “American Express”}

(x)<-[:BUY]-(person)-[:BUY]->(y)
{:name “Lily”}

RETURN y
Hypothesis: People that buy X has interest in Y
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}

Y
{:product “American Express”}

{:name “Lily”}
bought
viewed
Query: Who to recommend Y
{:name “Bob”}

X

{:product “iPhone”}
START
{:name “Tom”}
x=node:node_auto_index(product='iPhone'),
y=node:node_auto_index(product='American
Express')
{:name “Emily”}
Y Looked
{:product “American Express”}
MATCH (p)-[:BUY]->(x),
at AE

(p)-[:VIEW]->(y)
WHERE NOT (p)-[:BUY]->(y)
{:name “Lily”}

RETURN p

Haven’t
bought AE
Product Recommendation by Reasoning Example

Interactive demo at http://bit.ly/customer_graph

1. Start with an idea
2. Trace to connected nodes
3. Identify patterns from viewpoint of those nodes
4. Repeat from #1 until discovering actionable item
5. Apply pattern
Challenge: Event Data to Graph Data

User ID

Product ID Action

Bob

iPhone

Bought

Tom

iPhone

Bought

Emily

iPhone

Bought

Bob

AE

Bought

Emily

AE

Viewed

Lily

AE

Bought
Why should you care
Customer Journey

14
Purchase Funnel

Interest

Desire

Action
15
Understanding Customer Intents

Interest

Desire

Action

16
Customer Experience as a Graph

17
http://insideintercom.io/criticism-and-two-way-streets/
18
A Feedback System to Drive Profit

19
Minimise effort between Q & A

Question
Answer

Data Interrogation
One Approach: Make data querying easier

Query = Function(Data) [1]
~ Function(Data Structure)

[1] Figure 1.3 from Big Data (preview v11) by Nathan Marz and James Warren
Data Structure: Relations versus Relations
a Edges
ak
Sale ID

User ID

Product ID Profit

1

1

1

£100

2

1

2

£50

{:profit £100}
{:profit £50}

User ID

Name

Product ID Name

1

Bob

1

iPhone

2

Emily

2

A.E.
Using the right database for the right task

RDBMS

Graph DB

Data

Attributes

Entities and relations

Model

Record-based

Associative

Relation

By-product of
normalisation

First class citizen

Example use

Reporting

Reasoning
How does it work
User actions as time-stamped records

time

HDFS

Paul Ingles, “User as Data”, Euroclojure 2012
Our User Event to Graph Data Pipeline

time

HDFS

Reshape

Graph Database
From Input to Output with Hadoop, aka ETL step
1. interface
3. output data

2. input data

4... The 80%
HDFS

Reshape

Graph Database
Hadoop interface to Neo4J
•
•
•
•

Cascading-Neo4j tap [1]
Faunus Hadoop binaries [2]
CSV files*
etc.
[1] http://github.com/pingles/cascading.neo4j
[2] http://thinkaurelius.github.io/faunus/

HDFS

Reshape

Graph Database
Input data stored on HDFS

time

1

2

3

4

User

1
2

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

4

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

3

Timestamp
Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Nodes and Edges CSVs to go into a property graph
Node ID

Properties

1

{:name “Paul”, email: “paul.lam@uswitch.com”}

2

{:domain “google.com”}

3

{:page “/homepage/”}

...

...

5

{:product “iPhone”}

From To

Type

Properties

1

2

:SOURCE

{:timestamp “2013-11-01 13:00”}

1

3

:VIEWED

{:timestamp “2013-11-01 13:00”}

...

...

...

...

1

5

:BOUGHT

{:timestamp “2013-11-01 13:05”}
Records to Graph in 3 Steps

^ impo
rtable C

SV

1. Design graph
2. Extract Nodes
3. Build Relations

31
Step 1: Designing your graph
{:name “Paul”
:email “paul.lam@uswitch.com}
:viewed {:timestamp ... }
{:page “homepage”}

:bought {:timestamp ... }
:viewed { ... }
{:product “iPhone”}

:viewed { ... }
{:page “blog”}

:source {:timestamp ... }

{:domain “google.com”}
32
Step 2: Extract list of entity nodes
User

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

Timestamp

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Step 3: Building node-to-node relations
User

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

Timestamp

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Do this across all customers and products
Use your data processing tool of choice:
Apache Hive
• Apache Pig
• Cascading
and more ...
• Scalding
• Cascalog
• Spark
• your favourite programming language
•

Paco Nathan, “The Workflow Abstraction”, Strata SC, 2013.
Cascalog code to build user nodes
145 lines of Cascalog code in production
• a couple hundred lines more of utility functions
• build entity nodes and meta nodes
• sink data into database with Cascading-Neo4j Tap
•

36
Cascalog code to build user nodes
145 lines of Cascalog code in production
• a couple hundred lines more of utility functions
• build entity nodes and meta nodes
• sink data into database with Cascading-Neo4j Tap
•

37
Code to build user to product click relations
160 lines of Cascalog code in production
• + utility functions
• build direct and categoric relations
• sink data with Cascading-Neo4j Tap
•

38
Code to build user to product click relations
160 lines of Cascalog code in production
• + utility functions
• build direct and categoric relations
• sink data with Cascading-Neo4j Tap
•

39
Our User Event to Graph Data Pipeline

time

HDFS

Reshape

Graph Database
Summary

HDFS

Reshape

Graph
Contact
Paul Lam, data scientist at uSwitch.com
Email: paul@quantisan.com
Twitter: @Quantisan

More Related Content

Similar to Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph

Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Krist Wongsuphasawat
 
AD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With AnalyticsAD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With AnalyticsVincent Burckhardt
 
Driving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConfDriving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConfHans Scharler
 
User Story Mapping in Practice
User Story Mapping in PracticeUser Story Mapping in Practice
User Story Mapping in PracticeSteve Rogalsky
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...predictionio
 
Eposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things eraEposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things eraTAEHOON KIM
 
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)Giles Greenway
 
Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery Aaron Lamphere
 
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?MongoDB
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and ConsumptionBrian Greig
 
Data Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchData Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchYury Lifshits
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingSingleStore
 
Opportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to CollaborateOpportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to CollaborateAnand Deshpande
 
Preparing for Release to the App Store
Preparing for Release to the App StorePreparing for Release to the App Store
Preparing for Release to the App StoreGeoffrey Goetz
 
Building Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and SalesforceBuilding Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and SalesforceRaymond Gao
 
IT-Security@Contemporary Life
IT-Security@Contemporary LifeIT-Security@Contemporary Life
IT-Security@Contemporary LifeOliver Pfaff
 
Introduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesIntroduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesChris Schalk
 

Similar to Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph (20)

Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
 
AD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With AnalyticsAD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With Analytics
 
Driving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConfDriving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConf
 
User Story Mapping in Practice
User Story Mapping in PracticeUser Story Mapping in Practice
User Story Mapping in Practice
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...
 
Eposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things eraEposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things era
 
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
 
Logs & Visualizations at Twitter
Logs & Visualizations at TwitterLogs & Visualizations at Twitter
Logs & Visualizations at Twitter
 
Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery
 
Tactical Information Gathering
Tactical Information GatheringTactical Information Gathering
Tactical Information Gathering
 
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and Consumption
 
Data Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchData Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! Research
 
yogesh CV
yogesh CVyogesh CV
yogesh CV
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
 
Opportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to CollaborateOpportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to Collaborate
 
Preparing for Release to the App Store
Preparing for Release to the App StorePreparing for Release to the App Store
Preparing for Release to the App Store
 
Building Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and SalesforceBuilding Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and Salesforce
 
IT-Security@Contemporary Life
IT-Security@Contemporary LifeIT-Security@Contemporary Life
IT-Security@Contemporary Life
 
Introduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesIntroduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologies
 

More from Paul Lam

Mozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and TechnologyMozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and TechnologyPaul Lam
 
When a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a barWhen a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a barPaul Lam
 
Evolution of Our Software Architecture
Evolution of Our Software ArchitectureEvolution of Our Software Architecture
Evolution of Our Software ArchitecturePaul Lam
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojurePaul Lam
 
Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Paul Lam
 
Clojure in US vs Europe
Clojure in US vs EuropeClojure in US vs Europe
Clojure in US vs EuropePaul Lam
 
2014 docker boston fig for developing microservices
2014 docker boston   fig for developing microservices2014 docker boston   fig for developing microservices
2014 docker boston fig for developing microservicesPaul Lam
 
Composing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopComposing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopPaul Lam
 
An agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log dataAn agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log dataPaul Lam
 

More from Paul Lam (9)

Mozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and TechnologyMozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and Technology
 
When a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a barWhen a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a bar
 
Evolution of Our Software Architecture
Evolution of Our Software ArchitectureEvolution of Our Software Architecture
Evolution of Our Software Architecture
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojure
 
Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)
 
Clojure in US vs Europe
Clojure in US vs EuropeClojure in US vs Europe
Clojure in US vs Europe
 
2014 docker boston fig for developing microservices
2014 docker boston   fig for developing microservices2014 docker boston   fig for developing microservices
2014 docker boston fig for developing microservices
 
Composing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopComposing re-useable ETL on Hadoop
Composing re-useable ETL on Hadoop
 
An agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log dataAn agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log data
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph

  • 1. Customer Behaviour Analytics: Billions of Events to one Customer-Product Graph Strata London, 12th November 2013 Presented by Paul Lam
  • 2. About Paul Lam Joined uSwitch.com as first Data Scientist in 2010 • developed internal data products • built distributed data architecture • team of 3 with a developer and a statistician Code contributor to various open source tools • Cascalog, a big data processing library built on top of Cascading (comparable to Apache Pig) • Incanter, a statistical computing platform in Clojure Author of Web Usage Mining: Data Mining Visitor Patterns From Web Server Logs* to be published in late 2014 * tentative title
  • 4. Customer-Product Graph {:name “Bob”} {:product “iPhone”} {:name “Tom”} {:name “Emily”} {:product “American Express”} {:name “Lily”} [:BUY {:timestamp 2013-11-01 13:00:00}] bought viewed
  • 5. Question: Who bought an iPhone? {:name “Bob”} {:product “iPhone”} {:name “Tom”} {:name “Emily”} {:product “American Express”} {:name “Lily”} bought viewed
  • 6. Query: Who bought an iPhone? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} START x=node:node_auto_index(product='iPhone')Express”} {:product “American {:name “Lily”} MATCH (person)-[:BUY]->(x) RETURN person
  • 7. Question: What else did they buy? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} Y {:product “American Express”} {:name “Lily”} bought viewed
  • 8. Query: What else did they buy? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} START x=node:node_auto_index(product='iPhone') {:name MATCH “Emily”} {:product “American Express”} (x)<-[:BUY]-(person)-[:BUY]->(y) {:name “Lily”} RETURN y
  • 9. Hypothesis: People that buy X has interest in Y {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} Y {:product “American Express”} {:name “Lily”} bought viewed
  • 10. Query: Who to recommend Y {:name “Bob”} X {:product “iPhone”} START {:name “Tom”} x=node:node_auto_index(product='iPhone'), y=node:node_auto_index(product='American Express') {:name “Emily”} Y Looked {:product “American Express”} MATCH (p)-[:BUY]->(x), at AE (p)-[:VIEW]->(y) WHERE NOT (p)-[:BUY]->(y) {:name “Lily”} RETURN p Haven’t bought AE
  • 11. Product Recommendation by Reasoning Example Interactive demo at http://bit.ly/customer_graph 1. Start with an idea 2. Trace to connected nodes 3. Identify patterns from viewpoint of those nodes 4. Repeat from #1 until discovering actionable item 5. Apply pattern
  • 12. Challenge: Event Data to Graph Data User ID Product ID Action Bob iPhone Bought Tom iPhone Bought Emily iPhone Bought Bob AE Bought Emily AE Viewed Lily AE Bought
  • 19. A Feedback System to Drive Profit 19
  • 20. Minimise effort between Q & A Question Answer Data Interrogation
  • 21. One Approach: Make data querying easier Query = Function(Data) [1] ~ Function(Data Structure) [1] Figure 1.3 from Big Data (preview v11) by Nathan Marz and James Warren
  • 22. Data Structure: Relations versus Relations a Edges ak Sale ID User ID Product ID Profit 1 1 1 £100 2 1 2 £50 {:profit £100} {:profit £50} User ID Name Product ID Name 1 Bob 1 iPhone 2 Emily 2 A.E.
  • 23. Using the right database for the right task RDBMS Graph DB Data Attributes Entities and relations Model Record-based Associative Relation By-product of normalisation First class citizen Example use Reporting Reasoning
  • 24. How does it work
  • 25. User actions as time-stamped records time HDFS Paul Ingles, “User as Data”, Euroclojure 2012
  • 26. Our User Event to Graph Data Pipeline time HDFS Reshape Graph Database
  • 27. From Input to Output with Hadoop, aka ETL step 1. interface 3. output data 2. input data 4... The 80% HDFS Reshape Graph Database
  • 28. Hadoop interface to Neo4J • • • • Cascading-Neo4j tap [1] Faunus Hadoop binaries [2] CSV files* etc. [1] http://github.com/pingles/cascading.neo4j [2] http://thinkaurelius.github.io/faunus/ HDFS Reshape Graph Database
  • 29. Input data stored on HDFS time 1 2 3 4 User 1 2 Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User 4 Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User 3 Timestamp Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 30. Nodes and Edges CSVs to go into a property graph Node ID Properties 1 {:name “Paul”, email: “paul.lam@uswitch.com”} 2 {:domain “google.com”} 3 {:page “/homepage/”} ... ... 5 {:product “iPhone”} From To Type Properties 1 2 :SOURCE {:timestamp “2013-11-01 13:00”} 1 3 :VIEWED {:timestamp “2013-11-01 13:00”} ... ... ... ... 1 5 :BOUGHT {:timestamp “2013-11-01 13:05”}
  • 31. Records to Graph in 3 Steps ^ impo rtable C SV 1. Design graph 2. Extract Nodes 3. Build Relations 31
  • 32. Step 1: Designing your graph {:name “Paul” :email “paul.lam@uswitch.com} :viewed {:timestamp ... } {:page “homepage”} :bought {:timestamp ... } :viewed { ... } {:product “iPhone”} :viewed { ... } {:page “blog”} :source {:timestamp ... } {:domain “google.com”} 32
  • 33. Step 2: Extract list of entity nodes User Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User Timestamp Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 34. Step 3: Building node-to-node relations User Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User Timestamp Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 35. Do this across all customers and products Use your data processing tool of choice: Apache Hive • Apache Pig • Cascading and more ... • Scalding • Cascalog • Spark • your favourite programming language • Paco Nathan, “The Workflow Abstraction”, Strata SC, 2013.
  • 36. Cascalog code to build user nodes 145 lines of Cascalog code in production • a couple hundred lines more of utility functions • build entity nodes and meta nodes • sink data into database with Cascading-Neo4j Tap • 36
  • 37. Cascalog code to build user nodes 145 lines of Cascalog code in production • a couple hundred lines more of utility functions • build entity nodes and meta nodes • sink data into database with Cascading-Neo4j Tap • 37
  • 38. Code to build user to product click relations 160 lines of Cascalog code in production • + utility functions • build direct and categoric relations • sink data with Cascading-Neo4j Tap • 38
  • 39. Code to build user to product click relations 160 lines of Cascalog code in production • + utility functions • build direct and categoric relations • sink data with Cascading-Neo4j Tap • 39
  • 40. Our User Event to Graph Data Pipeline time HDFS Reshape Graph Database
  • 42. Contact Paul Lam, data scientist at uSwitch.com Email: paul@quantisan.com Twitter: @Quantisan