SlideShare a Scribd company logo
Using Flink with MongoDB to enhance relevancy
in personalization
“How to use Flink with MongoDB?”
Marc Schwering
Sr. Solution Architect – EMEA
marc@mongodb.com
@m4rcsch
2
Agenda For This Session
•  Personalization Process Review
•  The Life of an Application
•  Separation of Concerns / Real World Architecture
•  Apache Spark and Flink Data Processing Projects
•  Clustering with Apache Flink
•  Next Steps
3
High Level Personalization Process
1.	
  Profile	
  created	
  
2.	
  Enrich	
  with	
  public	
  data	
  
3.	
  Capture	
  ac9vity	
  
4.	
  Clustering	
  analysis	
  	
  
5.	
  Define	
  Personas	
  
6.	
  Tag	
  with	
  
personas	
  
7.	
  Personalize	
  interac9ons	
  
Batch analytics
Public data
Common
technologies
•  R
•  Hadoop
•  Spark
•  Python
•  Java
•  Many other
options Personas
changed much
less often than
tagging
4
Evolution of a Profile (1)
{
"_id" : ObjectId("553ea57b588ac9ef066428e1"),
"ipAddress" : "216.58.219.238",
"referrer" : ”kay.com",
"firstName" : "John",
"lastName" : "Doe",
"email" : "johndoe@gmail.com"
}
5
Evolution of a Profile (n+1)
{
"_id" : ObjectId("553e7dca588ac9ef066428e0"),
"firstName" : "John",
"lastName" : "Doe",
"address" : "229 W. 43rd St.",
"city" : "New York",
"state" : "NY",
"zipCode" : "10036",
"age" : 30,
"email" : "john.doe@mongodb.com",
"twitterHandle" : "johndoe",
"gender" : "male",
"interests" : [
"electronics",
"basketball",
"weightlifting",
"ultimate frisbee",
"traveling",
"technology"
],
"visitedCounts" : {
"watches" : 3,
"shirts" : 1,
"sunglasses" : 1,
"bags" : 2
},
"purchases" : [
{
"id" : 1,
"desc" : "Power Oxford Dress Shoe",
"category" : "Mens shoes"
},
{
"id" : 2,
"desc" : "Striped Sportshirt",
"category" : "Mens shirts"
}
],
"persona" : "shoe-fanatic”
}
6
One size/document fits all?
•  Profile Data
–  Preferences
–  Personal information
•  Contact information
•  DOB, gender, ZIP...
•  Customer Data
–  Purchase History
–  Marketing History
•  „Session Data“
–  View History
–  Shopping Cart Data
–  Information Broker Data
•  Personalisation Data
–  Persona Vectors
–  Product and Category recommendations
Application
Batch analytics
7
Separation of Concerns
•  Profile Data
–  Preferences
–  Personal information
•  Contact information
•  DOB, gender, ZIP...
•  Customer Data
–  Purchase History
–  Marketing History
•  „Session Data“
–  View History
–  Shopping Cart Data
–  Information Broker Data
•  Personalisation Data
–  Persona Vectors
–  Product and Category recommendations
Batch analytics Layer
Frontend - System
Profile Service
Customer
Service
Session Service Persona Service
8
Benefits
•  Code does less, Document and Code stays focused
•  Split ability
– Different Teams
– New Languages
– Defined Dependencies
9
Advice for Developers (1)
•  Code does less, Document and Code stays focused
•  Split ability
– Different Teams
– New Languages
– Defined Dependencies
KISS
=> Keep it simple and save!
=> Clean Code <=
•  Robert C. Marten: https://cleancoders.com/
•  M. Fowler / B. Meyer. et. al.: Command Query Separation
Analytics and Personalization
From Query to Clustering
11
Separation of Concerns
•  Profile Data
–  Preferences
–  Personal information
•  Contact information
•  DOB, gender, ZIP...
•  Customer Data
–  Purchase History
–  Marketing History
•  „Session Data“
–  View History
–  Shopping Cart Data
–  Information Broker Data
•  Personalisation Data
–  Persona Vectors
–  Product and Category recommendations
Batch analytics Layer
Frontend – System
Profile Service
Customer
Service
Session Service Persona Service
12
Separation of Concerns
•  Profile Data
–  Preferences
–  Personal information
•  Contact information
•  DOB, gender, ZIP...
•  Customer Data
–  Purchase History
–  Marketing History
•  „Session Data“
–  View History
–  Shopping Cart Data
–  Information Broker Data
•  Personalisation Data
–  Persona Vectors
–  Product and Category recommendations
Batch analytics Layer
Frontend – System
Profile Service
Customer
Service
Session Service Persona Service
13
Architecture revised
Profile Service
Customer
Service
Session Service Persona Service
Frontend – System Backend– Systems
Data
Processing
14
Advice for Developers (2)
•  OWN YOUR DATA! (but only relevant Data)
•  Say no! (to direct Data ie. DB Access)
Data Processing
16
Hadoop in a Nutshell
•  An open source distributed storage and
distributed batch oriented processing framework
•  Hadoop Distributed File System (HDFS) to store data on
commodity hardware
•  Yarn as resource management platform
•  MapReduce as programming model working on top of HDFS
17
Spark in a Nutshell
•  Spark is a top-level Apache project
•  Can be run on top of YARN and can read any
Hadoop API data, including HDFS or MongoDB
•  Fast and general engine for large-scale data processing and
analytics
•  Advanced DAG execution engine with support for data locality
and in-memory computing
18
Flink in a Nutshell
•  Flink is a top-level Apache project
•  Can be run on top of YARN and can read any
Hadoop API data, including HDFS or MongoDB
•  A distributed streaming dataflow engine
•  Streaming and batch
•  Iterative in memory execution and handling
•  Cost based optimizer
19
Latency of query operations
Query Aggregation MapReduce Cluster Algorithms
time
MongoDB
Hadoop
Spark/Flink
Iterative Algorithms / Clustering
21
K-Means in Pictures
•  Source: Wikipedia K-Means
22
K-Means as a Process
23
Iterations in Hadoop and Spark
24
Iterations in Flink
•  Dedicated iteration operators
•  Tasks keep running for the iterations, not redeployed for each step
•  Caching and optimizations done automatically
Example
26
Result
27
More…?
28
Takeaways
•  Evolution is amazing and exiting!
–  Be ready to learn new things, ask questions across Silos!
•  Stay focused => Start and stay small
–  Evaluate with BigDocuments but do a PoC focussed on the topic
•  Extending functionality could be challenging
–  Evolution is outpacing help channels
–  A lot of options (Spark, Flink, Storm, Hadoop….)
–  More than just a binary
•  Extending functionality is easy
–  Aggregation, MapReduce
–  Connectors opening a new variety of Use Cases
29
Next Steps
•  Try out Flink
–  http://flink.apache.org/
–  https://github.com/mongodb/mongo-hadoop
–  https://github.com/m4rcsch/flink-mongodb-example
•  Participate and ask Questions!
–  @m4rcsch
–  marc@mongodb.com
•  We are hiring!! J
Thank you!
Marc Schwering
Sr. Solutions Architect – EMEA
marc@mongodb.com
@m4rcsch

More Related Content

What's hot

FRRouting Overview and Current Status
FRRouting Overview and Current StatusFRRouting Overview and Current Status
FRRouting Overview and Current Status
APNIC
 
Introduction to github slideshare
Introduction to github slideshareIntroduction to github slideshare
Introduction to github slideshare
Rakesh Sukumar
 
Write microservice in golang
Write microservice in golangWrite microservice in golang
Write microservice in golang
Bo-Yi Wu
 
Arquitetura de Internet das Coisas usando Google Cloud
Arquitetura de Internet das Coisas usando Google CloudArquitetura de Internet das Coisas usando Google Cloud
Arquitetura de Internet das Coisas usando Google Cloud
Alvaro Viebrantz
 
Go 語言基礎簡介
Go 語言基礎簡介Go 語言基礎簡介
Go 語言基礎簡介
Bo-Yi Wu
 
내가써본 nGrinder-SpringCamp 2015
내가써본 nGrinder-SpringCamp 2015내가써본 nGrinder-SpringCamp 2015
내가써본 nGrinder-SpringCamp 2015
Lim SungHyun
 
Git flow Introduction
Git flow IntroductionGit flow Introduction
Git flow Introduction
David Paluy
 
Infrastructure as Code (IaC)
Infrastructure as Code (IaC)Infrastructure as Code (IaC)
Infrastructure as Code (IaC)
Srinivas Kantipudi
 
Build and release in code with azure devops pipelines
Build and release in code with azure devops pipelinesBuild and release in code with azure devops pipelines
Build and release in code with azure devops pipelines
Gian Maria Ricci
 
Gitlab ci-cd
Gitlab ci-cdGitlab ci-cd
Gitlab ci-cd
Dan MAGIER
 
How Shopify Is Scaling Up Its Redis Message Queues
How Shopify Is Scaling Up Its Redis Message QueuesHow Shopify Is Scaling Up Its Redis Message Queues
How Shopify Is Scaling Up Its Redis Message Queues
Redis Labs
 
Gitlab CI/CD
Gitlab CI/CDGitlab CI/CD
Gitlab CI/CD
JEMLI Fathi
 
JIRA 업무 생산성 향상 및 프로젝트 관리
JIRA 업무 생산성 향상 및 프로젝트 관리JIRA 업무 생산성 향상 및 프로젝트 관리
JIRA 업무 생산성 향상 및 프로젝트 관리
KwangSeob Jeong
 
CI/CD Overview
CI/CD OverviewCI/CD Overview
CI/CD Overview
An Nguyen
 
이클립스 플랫폼
이클립스 플랫폼이클립스 플랫폼
이클립스 플랫폼
Kenu, GwangNam Heo
 
Primeiros passos com a API do Zabbix - Webinar JLCP
Primeiros passos com a API do Zabbix - Webinar JLCPPrimeiros passos com a API do Zabbix - Webinar JLCP
Primeiros passos com a API do Zabbix - Webinar JLCP
Robert Silva
 
GitHub Actions with Node.js
GitHub Actions with Node.jsGitHub Actions with Node.js
GitHub Actions with Node.js
Stefan Stölzle
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
Flink Forward
 
Cerberus Testing
Cerberus TestingCerberus Testing
Cerberus Testing
CIVEL Benoit
 
[AIS 2018] [Team Tools_Basic] Confluence는 어떻게 쓰나요 - 모우소프트
[AIS 2018] [Team Tools_Basic] Confluence는 어떻게 쓰나요 - 모우소프트[AIS 2018] [Team Tools_Basic] Confluence는 어떻게 쓰나요 - 모우소프트
[AIS 2018] [Team Tools_Basic] Confluence는 어떻게 쓰나요 - 모우소프트
Atlassian 대한민국
 

What's hot (20)

FRRouting Overview and Current Status
FRRouting Overview and Current StatusFRRouting Overview and Current Status
FRRouting Overview and Current Status
 
Introduction to github slideshare
Introduction to github slideshareIntroduction to github slideshare
Introduction to github slideshare
 
Write microservice in golang
Write microservice in golangWrite microservice in golang
Write microservice in golang
 
Arquitetura de Internet das Coisas usando Google Cloud
Arquitetura de Internet das Coisas usando Google CloudArquitetura de Internet das Coisas usando Google Cloud
Arquitetura de Internet das Coisas usando Google Cloud
 
Go 語言基礎簡介
Go 語言基礎簡介Go 語言基礎簡介
Go 語言基礎簡介
 
내가써본 nGrinder-SpringCamp 2015
내가써본 nGrinder-SpringCamp 2015내가써본 nGrinder-SpringCamp 2015
내가써본 nGrinder-SpringCamp 2015
 
Git flow Introduction
Git flow IntroductionGit flow Introduction
Git flow Introduction
 
Infrastructure as Code (IaC)
Infrastructure as Code (IaC)Infrastructure as Code (IaC)
Infrastructure as Code (IaC)
 
Build and release in code with azure devops pipelines
Build and release in code with azure devops pipelinesBuild and release in code with azure devops pipelines
Build and release in code with azure devops pipelines
 
Gitlab ci-cd
Gitlab ci-cdGitlab ci-cd
Gitlab ci-cd
 
How Shopify Is Scaling Up Its Redis Message Queues
How Shopify Is Scaling Up Its Redis Message QueuesHow Shopify Is Scaling Up Its Redis Message Queues
How Shopify Is Scaling Up Its Redis Message Queues
 
Gitlab CI/CD
Gitlab CI/CDGitlab CI/CD
Gitlab CI/CD
 
JIRA 업무 생산성 향상 및 프로젝트 관리
JIRA 업무 생산성 향상 및 프로젝트 관리JIRA 업무 생산성 향상 및 프로젝트 관리
JIRA 업무 생산성 향상 및 프로젝트 관리
 
CI/CD Overview
CI/CD OverviewCI/CD Overview
CI/CD Overview
 
이클립스 플랫폼
이클립스 플랫폼이클립스 플랫폼
이클립스 플랫폼
 
Primeiros passos com a API do Zabbix - Webinar JLCP
Primeiros passos com a API do Zabbix - Webinar JLCPPrimeiros passos com a API do Zabbix - Webinar JLCP
Primeiros passos com a API do Zabbix - Webinar JLCP
 
GitHub Actions with Node.js
GitHub Actions with Node.jsGitHub Actions with Node.js
GitHub Actions with Node.js
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Cerberus Testing
Cerberus TestingCerberus Testing
Cerberus Testing
 
[AIS 2018] [Team Tools_Basic] Confluence는 어떻게 쓰나요 - 모우소프트
[AIS 2018] [Team Tools_Basic] Confluence는 어떻게 쓰나요 - 모우소프트[AIS 2018] [Team Tools_Basic] Confluence는 어떻게 쓰나요 - 모우소프트
[AIS 2018] [Team Tools_Basic] Confluence는 어떻게 쓰나요 - 모우소프트
 

Viewers also liked

Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Till Rohrmann
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Flink Forward
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
Flink Forward
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Flink Forward
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
Flink Forward
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Flink Forward
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
Stephan Ewen
 
Matthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and StormsMatthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and Storms
Flink Forward
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Flink Forward
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
Flink Forward
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Flink Forward
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
Fabian Hueske
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
Flink Forward
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
Flink Forward
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Flink Forward
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
Flink Forward
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Flink Forward
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
Flink Forward
 
Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming
Flink Forward
 

Viewers also liked (20)

Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Matthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and StormsMatthias J. Sax – A Tale of Squirrels and Storms
Matthias J. Sax – A Tale of Squirrels and Storms
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
 
Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming
 

Similar to Marc Schwering – Using Flink with MongoDB to enhance relevancy in personalization

Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Pe...
Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Pe...Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Pe...
Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Pe...
MongoDB
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
MongoDB
 
MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB
 
Apache drill
Apache drillApache drill
Apache drill
MapR Technologies
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platforms
Hisham Arafat
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Lior Sidi
 
Neo4j GraphDay Seattle- Sept19- in the enterprise
Neo4j GraphDay Seattle- Sept19-  in the enterpriseNeo4j GraphDay Seattle- Sept19-  in the enterprise
Neo4j GraphDay Seattle- Sept19- in the enterprise
Neo4j
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
Bhupesh Bansal
 
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)DataPad Inc.
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesSpark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Anand Narayanan
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
Sarang Shravagi
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
Miguel Pastor
 
Proud to be polyglot
Proud to be polyglotProud to be polyglot
Proud to be polyglot
Tugdual Grall
 
Presentacion day f-core v1.2.1.2-technical - english
Presentacion day f-core v1.2.1.2-technical - englishPresentacion day f-core v1.2.1.2-technical - english
Presentacion day f-core v1.2.1.2-technical - english
Jose Luis Sanchez del Coso
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova Generazione
MongoDB
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
Uwe Printz
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 

Similar to Marc Schwering – Using Flink with MongoDB to enhance relevancy in personalization (20)

Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Pe...
Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Pe...Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Pe...
Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Pe...
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
 
MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013
 
Apache drill
Apache drillApache drill
Apache drill
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platforms
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Neo4j GraphDay Seattle- Sept19- in the enterprise
Neo4j GraphDay Seattle- Sept19-  in the enterpriseNeo4j GraphDay Seattle- Sept19-  in the enterprise
Neo4j GraphDay Seattle- Sept19- in the enterprise
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesSpark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika Technologies
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
 
Proud to be polyglot
Proud to be polyglotProud to be polyglot
Proud to be polyglot
 
Presentacion day f-core v1.2.1.2-technical - english
Presentacion day f-core v1.2.1.2-technical - englishPresentacion day f-core v1.2.1.2-technical - english
Presentacion day f-core v1.2.1.2-technical - english
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova Generazione
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 

Marc Schwering – Using Flink with MongoDB to enhance relevancy in personalization

  • 1. Using Flink with MongoDB to enhance relevancy in personalization “How to use Flink with MongoDB?” Marc Schwering Sr. Solution Architect – EMEA marc@mongodb.com @m4rcsch
  • 2. 2 Agenda For This Session •  Personalization Process Review •  The Life of an Application •  Separation of Concerns / Real World Architecture •  Apache Spark and Flink Data Processing Projects •  Clustering with Apache Flink •  Next Steps
  • 3. 3 High Level Personalization Process 1.  Profile  created   2.  Enrich  with  public  data   3.  Capture  ac9vity   4.  Clustering  analysis     5.  Define  Personas   6.  Tag  with   personas   7.  Personalize  interac9ons   Batch analytics Public data Common technologies •  R •  Hadoop •  Spark •  Python •  Java •  Many other options Personas changed much less often than tagging
  • 4. 4 Evolution of a Profile (1) { "_id" : ObjectId("553ea57b588ac9ef066428e1"), "ipAddress" : "216.58.219.238", "referrer" : ”kay.com", "firstName" : "John", "lastName" : "Doe", "email" : "johndoe@gmail.com" }
  • 5. 5 Evolution of a Profile (n+1) { "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", "lastName" : "Doe", "address" : "229 W. 43rd St.", "city" : "New York", "state" : "NY", "zipCode" : "10036", "age" : 30, "email" : "john.doe@mongodb.com", "twitterHandle" : "johndoe", "gender" : "male", "interests" : [ "electronics", "basketball", "weightlifting", "ultimate frisbee", "traveling", "technology" ], "visitedCounts" : { "watches" : 3, "shirts" : 1, "sunglasses" : 1, "bags" : 2 }, "purchases" : [ { "id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes" }, { "id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts" } ], "persona" : "shoe-fanatic” }
  • 6. 6 One size/document fits all? •  Profile Data –  Preferences –  Personal information •  Contact information •  DOB, gender, ZIP... •  Customer Data –  Purchase History –  Marketing History •  „Session Data“ –  View History –  Shopping Cart Data –  Information Broker Data •  Personalisation Data –  Persona Vectors –  Product and Category recommendations Application Batch analytics
  • 7. 7 Separation of Concerns •  Profile Data –  Preferences –  Personal information •  Contact information •  DOB, gender, ZIP... •  Customer Data –  Purchase History –  Marketing History •  „Session Data“ –  View History –  Shopping Cart Data –  Information Broker Data •  Personalisation Data –  Persona Vectors –  Product and Category recommendations Batch analytics Layer Frontend - System Profile Service Customer Service Session Service Persona Service
  • 8. 8 Benefits •  Code does less, Document and Code stays focused •  Split ability – Different Teams – New Languages – Defined Dependencies
  • 9. 9 Advice for Developers (1) •  Code does less, Document and Code stays focused •  Split ability – Different Teams – New Languages – Defined Dependencies KISS => Keep it simple and save! => Clean Code <= •  Robert C. Marten: https://cleancoders.com/ •  M. Fowler / B. Meyer. et. al.: Command Query Separation
  • 10. Analytics and Personalization From Query to Clustering
  • 11. 11 Separation of Concerns •  Profile Data –  Preferences –  Personal information •  Contact information •  DOB, gender, ZIP... •  Customer Data –  Purchase History –  Marketing History •  „Session Data“ –  View History –  Shopping Cart Data –  Information Broker Data •  Personalisation Data –  Persona Vectors –  Product and Category recommendations Batch analytics Layer Frontend – System Profile Service Customer Service Session Service Persona Service
  • 12. 12 Separation of Concerns •  Profile Data –  Preferences –  Personal information •  Contact information •  DOB, gender, ZIP... •  Customer Data –  Purchase History –  Marketing History •  „Session Data“ –  View History –  Shopping Cart Data –  Information Broker Data •  Personalisation Data –  Persona Vectors –  Product and Category recommendations Batch analytics Layer Frontend – System Profile Service Customer Service Session Service Persona Service
  • 13. 13 Architecture revised Profile Service Customer Service Session Service Persona Service Frontend – System Backend– Systems Data Processing
  • 14. 14 Advice for Developers (2) •  OWN YOUR DATA! (but only relevant Data) •  Say no! (to direct Data ie. DB Access)
  • 16. 16 Hadoop in a Nutshell •  An open source distributed storage and distributed batch oriented processing framework •  Hadoop Distributed File System (HDFS) to store data on commodity hardware •  Yarn as resource management platform •  MapReduce as programming model working on top of HDFS
  • 17. 17 Spark in a Nutshell •  Spark is a top-level Apache project •  Can be run on top of YARN and can read any Hadoop API data, including HDFS or MongoDB •  Fast and general engine for large-scale data processing and analytics •  Advanced DAG execution engine with support for data locality and in-memory computing
  • 18. 18 Flink in a Nutshell •  Flink is a top-level Apache project •  Can be run on top of YARN and can read any Hadoop API data, including HDFS or MongoDB •  A distributed streaming dataflow engine •  Streaming and batch •  Iterative in memory execution and handling •  Cost based optimizer
  • 19. 19 Latency of query operations Query Aggregation MapReduce Cluster Algorithms time MongoDB Hadoop Spark/Flink
  • 21. 21 K-Means in Pictures •  Source: Wikipedia K-Means
  • 22. 22 K-Means as a Process
  • 24. 24 Iterations in Flink •  Dedicated iteration operators •  Tasks keep running for the iterations, not redeployed for each step •  Caching and optimizations done automatically
  • 28. 28 Takeaways •  Evolution is amazing and exiting! –  Be ready to learn new things, ask questions across Silos! •  Stay focused => Start and stay small –  Evaluate with BigDocuments but do a PoC focussed on the topic •  Extending functionality could be challenging –  Evolution is outpacing help channels –  A lot of options (Spark, Flink, Storm, Hadoop….) –  More than just a binary •  Extending functionality is easy –  Aggregation, MapReduce –  Connectors opening a new variety of Use Cases
  • 29. 29 Next Steps •  Try out Flink –  http://flink.apache.org/ –  https://github.com/mongodb/mongo-hadoop –  https://github.com/m4rcsch/flink-mongodb-example •  Participate and ask Questions! –  @m4rcsch –  marc@mongodb.com •  We are hiring!! J
  • 30. Thank you! Marc Schwering Sr. Solutions Architect – EMEA marc@mongodb.com @m4rcsch