OneHippo @ Goto
follow the Hippo trail
Building a relevance platform with Couchbase and Elasticsearch
@jreijn | Hippo
#gotoams, June 18
About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com
About Hippo
Relevance?
“The capability of a search engine or function to retrieve data appropriate to a user's needs.”
http://www.thefreedictionary.com/relevance
How we deliver relevant content @Hippo
Registration
Visitor - the entity making HTTP requests
Collector - records data about a visitor or their behavior
  Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
  Example: IP address is located in Amsterdam
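
The registration step could be sketched roughly as follows. This is a minimal illustration, not Hippo's actual collector API: the `Request` class, the `collect` method, the `register` function and the hard-coded lookup table are all hypothetical, and only the names GeoIPCollector and "geo" come from the slides.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Request:
    """Hypothetical stand-in for an incoming HTTP request."""
    visitor_id: str
    remote_addr: str


class GeoIPCollector:
    """Toy location collector backed by a hard-coded table (an assumption)."""
    collector_id = "geo"

    def __init__(self, table: Dict[str, Dict[str, str]]):
        self._table = table

    def collect(self, request: Request) -> Dict[str, str]:
        # Unknown addresses yield empty strings, mirroring the sample documents.
        return self._table.get(request.remote_addr, {"city": "", "country": ""})


def register(request, collectors, targeting_data):
    """Run every collector and store its result under the collector's id."""
    visitor = targeting_data.setdefault(request.visitor_id, {})
    for collector in collectors:
        visitor[collector.collector_id] = collector.collect(request)
    return visitor


geo = GeoIPCollector({"203.0.113.7": {"city": "Amsterdam", "country": "NL"}})
store = {}
result = register(Request("v1", "203.0.113.7"), [geo], store)
```

Keying the targeting data by collector id keeps each collector's output independent, so adding a collector does not disturb the data of the others.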
Matching
Characteristic - a type of fact about visitors
  Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
  Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
  Example: "Jim, the European urban consumer", "Alice, the Pet owner"
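
One way to picture the matching concepts is the sketch below. The class names follow the slide terms, but the scoring rule (the fraction of a persona's target groups that match) and the list of European cities are assumptions made for illustration; the slides do not say how scores are computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

TargetingData = Dict[str, dict]

# Assumed sample data, not taken from the talk.
EUROPEAN_CITIES = {"Amsterdam", "Berlin", "Paris"}


@dataclass
class TargetGroup:
    name: str
    characteristic: str  # the type of fact this group specifies, e.g. "city"
    matches: Callable[[TargetingData], bool]


@dataclass
class Persona:
    name: str
    target_groups: List[TargetGroup]

    def score(self, data: TargetingData) -> float:
        """Fraction of this persona's target groups the visitor matches."""
        if not self.target_groups:
            return 0.0
        hits = sum(1 for group in self.target_groups if group.matches(data))
        return hits / len(self.target_groups)


def city_of(data: TargetingData):
    return data.get("geo", {}).get("city")


european = TargetGroup("comes from a European city", "city",
                       lambda d: city_of(d) in EUROPEAN_CITIES)
amsterdam = TargetGroup("comes from Amsterdam", "city",
                        lambda d: city_of(d) == "Amsterdam")
jim = Persona("Jim, the European urban consumer", [european, amsterdam])
```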
What do we store?
Request log
Targeting data
Statistics
Averages, e.g. how many visitors became which persona
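
A statistic such as "how many visitors became which persona" can be derived with a simple count. The sketch below assumes a hypothetical mapping from visitor id to the persona that visitor ended up with; the function names and sample ids are illustrative only.

```python
from collections import Counter


def persona_counts(persona_by_visitor):
    """Count how many visitors were assigned to each persona."""
    return Counter(persona_by_visitor.values())


def persona_shares(counts):
    """Turn absolute counts into fractions of all visitors."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {persona: n / total for persona, n in counts.items()}


assigned = {"v1": "jim", "v2": "alice", "v3": "jim", "v4": "jim"}
counts = persona_counts(assigned)
shares = persona_shares(counts)
```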
Real-time analysis
Architecture
Diagram labels: RDBMS, Hippo Repository, Hippo Delivery Tier, App server; output formats: XML, JSON, (X)HTML
Delivery Tier
Request → URL Matching → Fetch content → Compose output → Response
Delivery Tier
Diagram labels: Request, URL Matching, Targeting Data Collection, Scoring, Fetch content, Compose output, Response
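
The request flow with targeting added might look like the sketch below. The order of the stages is an assumption (the slide text does not preserve the arrows of the diagram), and every function name, the `CONTENT` table and the toy scoring rule are hypothetical.

```python
# Hypothetical content variants per page: None is the default variant.
CONTENT = {
    "/news": {None: "General news", "jim": "City news for Jim"},
}


def url_matching(ctx):
    ctx["page"] = ctx["request"]["path"]


def collect_targeting_data(ctx):
    ctx["targeting"] = {"city": ctx["request"].get("city", "")}


def scoring(ctx):
    # Toy rule: visitors from Amsterdam are treated as persona "jim".
    ctx["persona"] = "jim" if ctx["targeting"]["city"] == "Amsterdam" else None


def fetch_content(ctx):
    variants = CONTENT.get(ctx["page"], {})
    ctx["content"] = variants.get(ctx["persona"], variants.get(None, ""))


def compose_output(ctx):
    ctx["response"] = "<h1>" + ctx["content"] + "</h1>"


# Assumed stage order; the slide only lists the stage names.
STAGES = [url_matching, collect_targeting_data, scoring,
          fetch_content, compose_output]


def handle_request(request, stages=STAGES):
    """Pass a shared context through each stage and return the response."""
    ctx = {"request": request}
    for stage in stages:
        stage(ctx)
    return ctx["response"]
```

Scoring runs before content is fetched in this sketch so that the persona can select the content variant.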
Scaling
Scaling out
Diagram: two app servers, each running a Hippo Delivery Tier and a Hippo Repository, with one RDBMS.
Scaling out
Diagram: two app servers, each running a Delivery Tier and a Repository, with one RDBMS and a Targeting Datastore.
What kind of ‘storage’?
Distributed Cache?
We have a winner!
Requirements change!
NoSQL to the rescue
Suitable types
• Key-value store
• Document database
Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support
Selection Criteria
• Performance!
• Scalability
• Schema flexibility
• Simplicity
• Monitoring
• Support
Performance !!
Scalability
Schema flexibility
{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}
Request log document
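
To illustrate why schema flexibility matters here, the sketch below builds a request log document with the same field names as the sample above and adds data from a new collector without any schema change. The helper name `request_log_document` and the "weather" collector payload are hypothetical.

```python
import json
import time


def request_log_document(visitor_id, page_url, path_info, remote_addr,
                         referer, collector_data, timestamp_ms=None):
    """Build a request log document shaped like the sample on the slide."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {
        "visitorId": visitor_id,
        "pageUrl": page_url,
        "pathInfo": path_info,
        "remoteAddr": remote_addr,
        "referer": referer,
        "timestamp": timestamp_ms,
        # Whatever the active collectors produced; no fixed schema is assumed.
        "collectorData": dict(collector_data),
        "personaIdScores": [],
        "globalPersonaIdScores": [],
    }


doc = request_log_document(
    "7a1c7e75-8539-40",
    "http://localhost:8080/site/news",
    "/news",
    "127.0.0.1",
    "http://localhost:8080/site/",
    {
        "channel": "English Website",
        # Hypothetical extra collector: older documents simply lack this key.
        "weather": {"type": "rain"},
    },
    timestamp_ms=1371419505909,
)
serialized = json.dumps(doc)
```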
{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [
      "English Website"
    ],
    "lastVisitedChannel": "English Website"
  }
}
Visitor document
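
A visitor document like the one above could be updated on each request roughly as follows. The update rule (append unseen channels, always overwrite `lastVisitedChannel`) is an assumption inferred from the field names, and `record_channel_visit` and "Dutch Website" are hypothetical.

```python
def record_channel_visit(visitor_doc, channel):
    """Update the "channel" collector entry of a visitor document in place."""
    entry = visitor_doc.setdefault("channel", {
        "collectorId": "channel",
        "channels": [],
        "lastVisitedChannel": None,
    })
    if channel not in entry["channels"]:
        entry["channels"].append(channel)
    entry["lastVisitedChannel"] = channel
    return visitor_doc


visitor = {}
record_channel_visit(visitor, "English Website")
record_channel_visit(visitor, "Dutch Website")
record_channel_visit(visitor, "English Website")
```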
Simplicity
Monitoring
Support
Couchbase
Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easy scalability
• Schema flexibility
• Low latency
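
The read/write-through idea in the list above can be shown with a generic in-memory sketch. It illustrates the caching pattern only, not Couchbase internals; the class name and the dictionary standing in for the backing store are assumptions.

```python
class ReadWriteThroughCache:
    """Generic read-through / write-through cache over a dict-like store."""

    def __init__(self, backing_store):
        self._cache = {}
        self._store = backing_store
        self.store_reads = 0  # counts how often the backing store is read

    def get(self, key):
        # Read-through: on a miss, load from the store and keep a copy.
        if key not in self._cache:
            self.store_reads += 1
            self._cache[key] = self._store[key]
        return self._cache[key]

    def set(self, key, value):
        # Write-through: update the store and the cache together.
        self._store[key] = value
        self._cache[key] = value


disk = {"visitor::1": {"city": "Amsterdam"}}
cache = ReadWriteThroughCache(disk)
first = cache.get("visitor::1")
second = cache.get("visitor::1")   # served from memory
cache.set("visitor::2", {"city": ""})
```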
Couchbase
• Open Source
• Document-oriented
• Easily scalable
• Consistent High Performance
Performance
• Object-managed cache
• Write queue to disk
• Avoids a cold cache
Easily scalable
• Auto-sharding
• Cross datacenter replication (XDCR)
• Master-master replication
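
Auto-sharding can be pictured as hashing each key to a fixed number of buckets and mapping buckets to nodes. The sketch below is a simplified illustration under that assumption and is not Couchbase's exact algorithm; the node names, the round-robin bucket map and the bucket count of 1024 are chosen for the example.

```python
import zlib

NUM_BUCKETS = 1024  # illustrative bucket count


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Hash a document key to a bucket number (simplified)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def build_bucket_map(nodes, num_buckets=NUM_BUCKETS):
    """Assign buckets to nodes round-robin."""
    return [nodes[i % len(nodes)] for i in range(num_buckets)]


def node_for(key, bucket_map):
    """Find the node that owns a key under the given bucket map."""
    return bucket_map[bucket_for(key, len(bucket_map))]


two_nodes = build_bucket_map(["node-a", "node-b"])
three_nodes = build_bucket_map(["node-a", "node-b", "node-c"])
owner = node_for("visitor::7a1c7e75", two_nodes)
```

Because a key's bucket never changes, adding a node only means handing some buckets to the new node; clients just need the updated bucket map.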
Flexible data model
• Native JSON support
• Incremental MapReduce
• Gives power to the developer
How we run Couchbase @Hippo
Diagram: a Load Balancer, the Hippo Delivery Tier, a Database cluster and a Couchbase cluster. The Couchbase cluster holds:
• Request log data
• Targeting data
• Statistics data
Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on MapReduce
• Lacks some advanced query capabilities
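
The view model can be approximated in a few lines. Real Couchbase views are written as JavaScript map and reduce functions and are maintained incrementally; the Python below only simulates the map, group and reduce steps over request log documents. The function names and the "Dutch Website" sample are hypothetical.

```python
from collections import defaultdict


def map_by_channel(doc):
    """Map step: emit (channel, 1) for every document that has a channel."""
    channel = doc.get("collectorData", {}).get("channel")
    if channel:
        yield channel, 1


def reduce_count(values):
    """Reduce step: add up the emitted values for one key."""
    return sum(values)


def query_view(docs, map_fn, reduce_fn):
    """Run map over all docs, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


request_log = [
    {"pathInfo": "/news", "collectorData": {"channel": "English Website"}},
    {"pathInfo": "/home", "collectorData": {"channel": "Dutch Website"}},
    {"pathInfo": "/news", "collectorData": {"channel": "English Website"}},
    {"pathInfo": "/ping", "collectorData": {}},
]
requests_per_channel = query_view(request_log, map_by_channel, reduce_count)
```

The emitted key acts as a secondary index: results are grouped and ordered by it, which is also why ad hoc queries on other fields need another view.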
Elasticsearch
• Built on Apache Lucene
• Designed to be distributed
• Schema-free
• Apache 2 licensed
• RESTful API
Added value of ES
• Full text search
• Faceted search
• Geospatial search
• All in (near) real-time
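
A single Elasticsearch request can combine the features listed above. The sketch below only builds a request body in current Query DSL syntax, which is newer than what was available at the time of the talk (faceting is now done with aggregations). The `location` field is an assumption: it would have to be mapped as a `geo_point`, whereas the sample documents store latitude and longitude separately. The function name and search terms are hypothetical.

```python
import json


def nearby_requests_query(text, lat, lon, distance="50km"):
    """Build a search body: full text match, geo filter, and channel counts."""
    return {
        "query": {
            "bool": {
                # Full text search on the requested path.
                "must": {"match": {"pathInfo": text}},
                # Geospatial filter on an assumed geo_point field.
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
        # Faceting, expressed as a terms aggregation.
        "aggs": {
            "by_channel": {"terms": {"field": "collectorData.channel"}}
        },
    }


body = nearby_requests_query("news", 52.37, 4.89)
payload = json.dumps(body)
```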
Replicating to ES
Diagram: the Hippo Delivery Tier writes and reads through the Java API; the Couchbase Server Cluster replicates over XDCR, via the Couchbase ES Transport plugin, to the Elasticsearch Server Cluster.
Demo time!
What’s Next?
Advanced analytics
Thank you!
Questions?
j.reijn@onehippo.com
@jreijn
ps. We’re hiring!

More Related Content

What's hot

Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Data Con LA
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherMongoDB
 
Scrapinghub Deck for Startups
Scrapinghub Deck for StartupsScrapinghub Deck for Startups
Scrapinghub Deck for StartupsScrapinghub
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyJosh Baer
 
ElasticSearch - Suche im Zeitalter der Clouds
ElasticSearch - Suche im Zeitalter der CloudsElasticSearch - Suche im Zeitalter der Clouds
ElasticSearch - Suche im Zeitalter der Cloudsinovex GmbH
 
IBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyIBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyJason Plurad
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...NoSQLmatters
 
Real time ads personalization @ Spotify
Real time ads personalization @ SpotifyReal time ads personalization @ Spotify
Real time ads personalization @ SpotifyKinshuk Mishra
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioWinston Chen
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
 
Real Time Big Data
Real Time Big DataReal Time Big Data
Real Time Big DataInfoFarm
 
Insight Data Engineering project
Insight Data Engineering projectInsight Data Engineering project
Insight Data Engineering projectHoa Nguyen
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Zekeriya Besiroglu
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Zhenxiao Luo
 
How Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyJosh Baer
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopJason Plurad
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...MongoDB
 

What's hot (20)

Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
 
Dynamic sitemaps
Dynamic sitemapsDynamic sitemaps
Dynamic sitemaps
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
 
Scrapinghub Deck for Startups
Scrapinghub Deck for StartupsScrapinghub Deck for Startups
Scrapinghub Deck for Startups
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
 
ElasticSearch - Suche im Zeitalter der Clouds
ElasticSearch - Suche im Zeitalter der CloudsElasticSearch - Suche im Zeitalter der Clouds
ElasticSearch - Suche im Zeitalter der Clouds
 
IBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyIBM Open by Design: Graph Technology
IBM Open by Design: Graph Technology
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
 
Pig on Spark
Pig on SparkPig on Spark
Pig on Spark
 
Real time ads personalization @ Spotify
Real time ads personalization @ SpotifyReal time ads personalization @ Spotify
Real time ads personalization @ Spotify
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudio
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
Real Time Big Data
Real Time Big DataReal Time Big Data
Real Time Big Data
 
Insight Data Engineering project
Insight Data Engineering projectInsight Data Engineering project
Insight Data Engineering project
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 
How Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At Spotify
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
 

Similar to Building a relevance platform with Couchbase and Elasticsearch

How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...MongoDB
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with RBarbara Fusinska
 
Hadoop at Twitter (Hadoop Summit 2010)
Hadoop at Twitter (Hadoop Summit 2010)Hadoop at Twitter (Hadoop Summit 2010)
Hadoop at Twitter (Hadoop Summit 2010)Kevin Weil
 
How Search Works
How Search WorksHow Search Works
How Search WorksAhrefs
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Toolsm_richardson
 
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭台灣資料科學年會
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkFlink Forward
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Miguel Pastor
 
The what, how and why of scaling git repositories
The what, how and why of scaling git repositoriesThe what, how and why of scaling git repositories
The what, how and why of scaling git repositoriesJohan Abildskov
 
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...State of Search Conference
 
Data Pipelines - Big Data meets Salesforce
Data Pipelines - Big Data meets SalesforceData Pipelines - Big Data meets Salesforce
Data Pipelines - Big Data meets Salesforceagarciaodeian
 
Start Building SEO Efficiencies with Automation - MNSearch Summit 2018
Start Building SEO Efficiencies with Automation - MNSearch Summit 2018Start Building SEO Efficiencies with Automation - MNSearch Summit 2018
Start Building SEO Efficiencies with Automation - MNSearch Summit 2018Paul Shapiro
 
Balancing Act of Caching LoopConf 2018
Balancing Act of Caching LoopConf 2018Balancing Act of Caching LoopConf 2018
Balancing Act of Caching LoopConf 2018Maura Teal
 
Search Intelligently - Liferay Symposium North America 2016, Chicago, USA
Search Intelligently - Liferay Symposium North America 2016, Chicago, USASearch Intelligently - Liferay Symposium North America 2016, Chicago, USA
Search Intelligently - Liferay Symposium North America 2016, Chicago, USAAndré Ricardo Barreto de Oliveira
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 PresentationsAna Rebelo
 
Shortening the feedback loop
Shortening the feedback loopShortening the feedback loop
Shortening the feedback loopJosh Baer
 
Mongodb, our Swiss Army Knife Database
Mongodb, our Swiss Army Knife DatabaseMongodb, our Swiss Army Knife Database
Mongodb, our Swiss Army Knife DatabaseMathieu Poumeyrol
 
QCon SP - recommended for you
QCon SP - recommended for youQCon SP - recommended for you
QCon SP - recommended for youTatiana Al-Chueyr
 

Similar to Building a relevance platform with Couchbase and Elasticsearch (20)

How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
Hadoop at Twitter (Hadoop Summit 2010)
Hadoop at Twitter (Hadoop Summit 2010)Hadoop at Twitter (Hadoop Summit 2010)
Hadoop at Twitter (Hadoop Summit 2010)
 
How Search Works
How Search WorksHow Search Works
How Search Works
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Tools
 
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
 
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache FlinkSuneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
Suneel Marthi – BigPetStore Flink: A Comprehensive Blueprint for Apache Flink
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
 
The what, how and why of scaling git repositories
The what, how and why of scaling git repositoriesThe what, how and why of scaling git repositories
The what, how and why of scaling git repositories
 
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
 
Data Pipelines - Big Data meets Salesforce
Data Pipelines - Big Data meets SalesforceData Pipelines - Big Data meets Salesforce
Data Pipelines - Big Data meets Salesforce
 
MnSearch Summit 2018 - Paul Shapiro – Start Building SEO Efficiencies with Au...
MnSearch Summit 2018 - Paul Shapiro – Start Building SEO Efficiencies with Au...MnSearch Summit 2018 - Paul Shapiro – Start Building SEO Efficiencies with Au...
MnSearch Summit 2018 - Paul Shapiro – Start Building SEO Efficiencies with Au...
 
Start Building SEO Efficiencies with Automation - MNSearch Summit 2018
Start Building SEO Efficiencies with Automation - MNSearch Summit 2018Start Building SEO Efficiencies with Automation - MNSearch Summit 2018
Start Building SEO Efficiencies with Automation - MNSearch Summit 2018
 
Balancing Act of Caching LoopConf 2018
Balancing Act of Caching LoopConf 2018Balancing Act of Caching LoopConf 2018
Balancing Act of Caching LoopConf 2018
 
Search Intelligently - Liferay Symposium North America 2016, Chicago, USA
Search Intelligently - Liferay Symposium North America 2016, Chicago, USASearch Intelligently - Liferay Symposium North America 2016, Chicago, USA
Search Intelligently - Liferay Symposium North America 2016, Chicago, USA
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 Presentations
 
Shortening the feedback loop
Shortening the feedback loopShortening the feedback loop
Shortening the feedback loop
 
Mongodb, our Swiss Army Knife Database
Mongodb, our Swiss Army Knife DatabaseMongodb, our Swiss Army Knife Database
Mongodb, our Swiss Army Knife Database
 
QCon SP - recommended for you
QCon SP - recommended for youQCon SP - recommended for you
QCon SP - recommended for you
 

Recently uploaded

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 

Recently uploaded (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

Building a relevance platform with Couchbase and Elasticsearch

  • 1. OneHippo @ Goto follow the Hippo trail Building a relevance platform with Couchbase and Elasticsearch @jreijn | Hippo #gotoams, June 18
  • 2. follow the Hippo trail OneHippo @ Goto About me • Architect @ Hippo • DevOps guy • Blogger @ http://blog.jeroenreijn.com
  • 3. follow the Hippo trail OneHippo @ Goto About Hippo
  • 4. follow the Hippo trail OneHippo @ Goto OneHippo @ Goto Relevance?
  • 5. follow the Hippo trail OneHippo @ Goto OneHippo @ Goto “The capability of a search engine or function to retrieve data appropriate to a user's needs.” http://www.thefreedictionary.com/relevance
  • 6. follow the Hippo trail OneHippo @ Goto OneHippo @ Goto
  • 7. follow the Hippo trail OneHippo @ Goto OneHippo @ Goto How we deliver relevant content @Hippo
  • 8. Registration
      • Visitor - entity making HTTP requests
      • Collector - records data about a visitor or his behavior
        Example: location collector (GeoIPCollector)
      • Targeting Data - all data about a specific visitor
        Example: IP address is located in Amsterdam
  • 9. Matching
      • Characteristic - a type of fact about visitors
        Example: "comes from a city", "experiences a type of weather"
      • Target Group - the specification of a Characteristic
        Example: "comes from a European city", "comes from Amsterdam"
      • Persona - one or more target groups that describe a certain type of visitor
        Example: "Jim, the European urban consumer", "Alice, the Pet owner"
  • 10. What do we store?
      • Request log
      • Targeting data
      • Statistics
      • Averages, e.g. how many visitors became which persona
  • 11. Real-time analysis
  • 12. Architecture
  • 13. Diagram labels: RDBMS, Hippo Delivery Tier, Hippo Repository, App server, XML, JSON, (X)HTML
  • 14. Delivery Tier (diagram labels): Request, URL Matching, Fetch content, Compose output, Response
  • 15. Delivery Tier (diagram labels): Request, URL Matching, Targeting Data Collection, Scoring, Fetch content, Compose output, Response
  • 16. Scaling
  • 17. Scaling out (diagram labels): RDBMS, and two app servers, each labeled Hippo Delivery Tier and Hippo Repository
  • 18. Scaling out (diagram labels): RDBMS, Targeting Datastore, and two app servers, each labeled Delivery Tier and Repository
  • 19. What kind of ‘storage’?
  • 20. Distributed Cache?
  • 21. We have a winner!
  • 22. Requirements change!
  • 23. NoSQL to the rescue
  • 24. Suitable types
      • Key-value store
      • Document database
  • 25. Assessment Criteria: Maturity, Data model, Consistency model, Performance, Replication, Caching model, Query model, Monitoring, Scalability, Reliability, Support
  • 26. Selection Criteria
      • Performance!
      • Scalability
      • Schema flexibility
      • Simplicity
      • Monitoring
      • Support
  • 27. Performance!!
  • 28. Scalability
  • 29. Schema flexibility
  • 30. Request log document
      {
        "visitorId": "7a1c7e75-8539-40",
        "pageUrl": "http://localhost:8080/site/news",
        "pathInfo": "/news",
        "remoteAddr": "127.0.0.1",
        "referer": "http://localhost:8080/site/",
        "timestamp": 1371419505909,
        "collectorData": {
          "geo": {
            "country": "",
            "city": "",
            "latitude": 0,
            "longitude": 0
          },
          "returningvisitor": false,
          "channel": "English Website"
        },
        "personaIdScores": [],
        "globalPersonaIdScores": []
      }
  • 31. Visitor document
      {
        "geo": {
          "collectorId": "geo",
          "city": "",
          "country": "",
          "latitude": 0,
          "longitude": 0
        },
        "channel": {
          "collectorId": "channel",
          "channels": [
            "English Website"
          ],
          "lastVisitedChannel": "English Website"
        }
      }
  • 32. Simplicity
  • 33. Monitoring
  • 34. Support
  • 35. Couchbase
  • 36. Why Couchbase?
      • Drop-in replacement for memcached
      • Read/Write-through cache
      • High throughput
      • Easy scalability
      • Schema flexibility
      • Low latency
  • 37. Couchbase
      • Open Source
      • Document-oriented
      • Easily scalable
      • Consistent high performance
  • 38. Performance
      • Object managed cache
      • Write queue to disk
      • Avoids cold cache
  • 39. Easily scalable
      • Auto sharding
      • Cross cluster replication (XDCR)
      • Master-master replication
  • 40. Flexible data model
      • Native JSON support
      • Incremental Map Reduce
      • Gives power to the developer
  • 41. How we run Couchbase @Hippo
  • 42. Diagram labels: Load Balancer, Hippo Delivery Tier, Database cluster, Couchbase cluster
      Couchbase cluster:
      • Request log data
      • Targeting data
      • Statistics data
  • 43. Query capabilities
      • Querying via views
      • Secondary indexes via views
      • Views based on Map-Reduce
      • Lacks some advanced query capabilities
  • 44. Elasticsearch
      • Apache Lucene
      • Designed to be distributed
      • Schema free
      • Apache 2 licensed
      • RESTful API
  • 45. Added value of ES
      • Full text search
      • Faceted search
      • Geo spatial search
      • All in (near) real-time
  • 46. Replicating to ES (diagram labels): Hippo Delivery Tier, Java API, Write, Read, Couchbase Server Cluster, XDCR, Couchbase ES Transport plugin, Elasticsearch Server Cluster
  • 47. Demo time!
  • 48. What’s Next?
  • 49. Advanced analytics
  • 50. Thank you! Questions?
      j.reijn@onehippo.com | @jreijn
      ps. We’re hiring!
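
The Registration and Matching slides (8 and 9) describe how collector data about a visitor is compared against target groups and personas. The deck does not show how Hippo implements this, so the following is only a minimal Python sketch of the idea under stated assumptions: the class and function names are hypothetical, the targeting data follows the shape of the visitor document on slide 31, the "visits the English website" target group and the sample city and country values are invented for illustration, and the scoring rule (fraction of a persona's target groups that match) is an assumption, not the product's algorithm.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Targeting data for one visitor: collector id -> data recorded by that collector.
# The shape mirrors the visitor document on slide 31.
TargetingData = Dict[str, Dict[str, Any]]


@dataclass(frozen=True)
class TargetGroup:
    """A concrete specification of a characteristic, e.g. 'comes from Amsterdam'."""
    name: str
    collector_id: str
    matches: Callable[[Dict[str, Any]], bool]


@dataclass(frozen=True)
class Persona:
    """One or more target groups that together describe a type of visitor."""
    name: str
    target_groups: List[TargetGroup]


def score(persona: Persona, data: TargetingData) -> float:
    """Assumed scoring rule: fraction of the persona's target groups that match."""
    if not persona.target_groups:
        return 0.0
    hits = sum(
        1
        for group in persona.target_groups
        if group.collector_id in data and group.matches(data[group.collector_id])
    )
    return hits / len(persona.target_groups)


# Hypothetical target groups built on the "geo" and "channel" collectors.
from_amsterdam = TargetGroup(
    "comes from Amsterdam", "geo", lambda d: d.get("city") == "Amsterdam"
)
english_site = TargetGroup(
    "visits the English website",
    "channel",
    lambda d: "English Website" in d.get("channels", []),
)
jim = Persona("Jim, the European urban consumer", [from_amsterdam, english_site])

# Sample targeting data; the city and country values are made up for the example.
visitor: TargetingData = {
    "geo": {"collectorId": "geo", "city": "Amsterdam", "country": "Netherlands"},
    "channel": {"collectorId": "channel", "channels": ["English Website"]},
}
# Same visitor without a resolved location, as in the slide 31 document.
unknown_location: TargetingData = {
    "geo": {"collectorId": "geo", "city": "", "country": ""},
    "channel": {"collectorId": "channel", "channels": ["English Website"]},
}

print(score(jim, visitor))           # both target groups match
print(score(jim, unknown_location))  # only the channel target group matches
```

With this rule the first visitor scores 1.0 for the persona and the second scores 0.5. Scores of this kind are what the `personaIdScores` field in the request log document on slide 30 would presumably hold, although the deck does not specify its format.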