Building a relevance platform with Couchbase and Elasticsearch
Hippo GetTogether, 21 June 2013
Jeroen Reijn | @jreijn | #hgt2013
Hippo GetTogether 2013
follow the Hippo trail
About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com
OneHippo @ Goto
Relevance?
“The capability of a search engine or function to retrieve data appropriate to a user's needs.”
http://www.thefreedictionary.com/relevance
How we deliver relevant content @Hippo
Registration
Visitor - an entity making HTTP requests
Collector - records data about a visitor or their behavior
  Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
  Example: IP address is located in Amsterdam
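The registration concepts above can be sketched as follows. This is a minimal illustration and not Hippo's actual API: the `Collector` protocol, the `register` function, and the static lookup table inside `GeoIPCollector` are hypothetical, and the IP address is a documentation-range placeholder.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Visitor:
    """An entity making HTTP requests, identified by a visitor id."""
    visitor_id: str
    targeting_data: dict = field(default_factory=dict)


class Collector(Protocol):
    """Hypothetical interface: a collector records data about a visitor."""
    collector_id: str

    def collect(self, remote_addr: str) -> dict: ...


class GeoIPCollector:
    """Location collector backed by a static table (an assumption for the demo)."""
    collector_id = "geo"

    def __init__(self, table: dict):
        self._table = table

    def collect(self, remote_addr: str) -> dict:
        # Unknown addresses yield empty values, like an unresolved lookup.
        return self._table.get(remote_addr, {"city": "", "country": ""})


def register(visitor: Visitor, remote_addr: str, collectors: list) -> Visitor:
    """Run every collector and store its result under its collector id."""
    for collector in collectors:
        visitor.targeting_data[collector.collector_id] = collector.collect(remote_addr)
    return visitor


geo = GeoIPCollector({"203.0.113.7": {"city": "Amsterdam", "country": "NL"}})
visitor = register(Visitor("7a1c7e75"), "203.0.113.7", [geo])
print(visitor.targeting_data)
```

Keying the targeting data by collector id keeps each collector's output independent, so a new collector can be added without touching the others.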
Matching
Characteristic - a type of fact about visitors
  Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
  Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
  Example: "Jim, the European urban consumer", "Alice, the Pet owner"
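One way to picture how characteristics, target groups, and personas fit together is the sketch below. The class names mirror the terms on the slide, but the scoring rule (the fraction of a persona's target groups that a visitor matches) and the `EUROPEAN` country set are assumptions made for illustration; the slides do not say how Hippo computes scores.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TargetGroup:
    """A concrete specification of a characteristic."""
    name: str
    characteristic: str                 # key into the visitor's targeting data
    matches: Callable[[dict], bool]     # predicate over that collector's data


@dataclass(frozen=True)
class Persona:
    """One or more target groups describing a type of visitor."""
    name: str
    target_groups: tuple

    def score(self, targeting_data: dict) -> float:
        """Assumed rule: fraction of target groups the visitor matches."""
        hits = sum(
            1
            for group in self.target_groups
            if group.characteristic in targeting_data
            and group.matches(targeting_data[group.characteristic])
        )
        return hits / len(self.target_groups)


EUROPEAN = {"NL", "DE", "FR"}  # illustrative subset only

from_europe = TargetGroup(
    "comes from a European city", "geo", lambda geo: geo.get("country") in EUROPEAN
)
from_amsterdam = TargetGroup(
    "comes from Amsterdam", "geo", lambda geo: geo.get("city") == "Amsterdam"
)
jim = Persona("Jim, the European urban consumer", (from_europe, from_amsterdam))

print(jim.score({"geo": {"city": "Amsterdam", "country": "NL"}}))
print(jim.score({"geo": {"city": "Berlin", "country": "DE"}}))
```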
What do we store?
Request log
Targeting data
Statistics
Averages, e.g. how many visitors became which persona
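As a rough illustration of the statistics mentioned above, the sketch below counts how many visitors ended up as each persona. The input shape (a mapping from visitor id to persona name, with `None` for unmatched visitors) and both function names are assumptions, not the stored format.

```python
from collections import Counter


def persona_counts(visitor_personas: dict) -> Counter:
    """Count visitors per persona; visitors without a persona are skipped."""
    return Counter(p for p in visitor_personas.values() if p is not None)


def persona_shares(visitor_personas: dict) -> dict:
    """Share of all visitors (matched or not) that became each persona."""
    total = len(visitor_personas)
    if total == 0:
        return {}
    return {p: n / total for p, n in persona_counts(visitor_personas).items()}


sample = {"v1": "Jim", "v2": "Alice", "v3": "Jim", "v4": None}
print(persona_counts(sample))
print(persona_shares(sample))
```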
BIG DATA !!
Real-time analysis
Architecture
RDBMS
Hippo Delivery Tier
Hippo Repository
App server
XML, JSON, (X)HTML
Delivery Tier
URL Matching
Fetch content
Compose output
Request
Response
Delivery Tier
URL Matching
Targeting Data Collection
Compose output
Request
Response
Fetch content
Scoring
Scaling
RDBMS
Hippo Delivery Tier
Hippo Repository
App server
Hippo Delivery Tier
Hippo Repository
App server
Scaling out
RDBMS
Delivery Tier
Repository
App server
Delivery Tier
Repository
App server
Scaling out
Targeting Datastore
What kind of ‘storage’?
Question?
Distributed Cache?
We have a winner!
Requirements change!
NoSQL to the rescue
Suitable types
• Key-value store
• Document database
Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support
Selection Criteria
• Performance
• Scalability
• Schema flexibility
• Simplicity
• Monitoring
• Support
Performance !!
Performance !!!!
Scalability
Schema flexibility
{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}
Request log document
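To show what schema flexibility means in practice for a document like the request log entry above, here is a small sketch that reads it while tolerating fields that may be absent. The `RAW` string is an abbreviated copy of the slide's document; `collector_value`, `request_time`, and the `weather` collector name are hypothetical, and treating `timestamp` as milliseconds since the Unix epoch is an assumption based on its magnitude.

```python
import json
from datetime import datetime, timezone

RAW = """
{
  "visitorId": "7a1c7e75-8539-40",
  "pathInfo": "/news",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": []
}
"""


def collector_value(doc: dict, name: str, default=None):
    """Read one collector's data, tolerating documents that lack it."""
    return doc.get("collectorData", {}).get(name, default)


def request_time(doc: dict) -> datetime:
    """Convert the (assumed) millisecond epoch timestamp to a UTC datetime."""
    return datetime.fromtimestamp(doc["timestamp"] / 1000, tz=timezone.utc)


doc = json.loads(RAW)
print(request_time(doc).isoformat())
print(collector_value(doc, "channel"))
# A collector that was never configured simply has no entry.
print(collector_value(doc, "weather", default="unknown"))
```

Because each collector writes its own sub-object, adding a collector later changes only new documents; older ones are still readable through the default.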
{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [
      "English Website"
    ],
    "lastVisitedChannel": "English Website"
  }
}
Visitor document
Simplicity
Monitoring
Support
Couchbase
Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easy scalability
• Schema flexibility
• Low latency
Couchbase
• Open Source
• Document-oriented
• Easily scalable
• Consistently high performance
• Apache license
Performance
• Object-managed cache
• Write queue to disk
• Avoids a cold cache
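The three points above describe a cache that answers from memory, persists writes asynchronously, and is warm again after a restart. The toy model below illustrates that idea only; it is not how Couchbase is implemented, and the class and method names are hypothetical. A plain dict stands in for the disk.

```python
from collections import deque


class WriteBehindCache:
    """Toy write-behind cache: memory first, disk later."""

    def __init__(self):
        self._memory = {}       # working set, served without disk access
        self._queue = deque()   # keys waiting to be persisted
        self._disk = {}         # stand-in for durable storage

    def set(self, key, value):
        """Acknowledge the write from memory and queue it for persistence."""
        self._memory[key] = value
        self._queue.append(key)

    def get(self, key):
        if key not in self._memory:
            # Cache miss: load from disk (raises KeyError if unknown).
            self._memory[key] = self._disk[key]
        return self._memory[key]

    def flush(self, max_items=None):
        """Drain the write queue to disk; returns the number of writes."""
        written = 0
        while self._queue and (max_items is None or written < max_items):
            key = self._queue.popleft()
            self._disk[key] = self._memory[key]
            written += 1
        return written

    def pending(self):
        return len(self._queue)

    def restart(self):
        """Simulate a restart: unflushed writes are lost, then memory is
        rebuilt from disk so the first reads are not cold."""
        self._queue.clear()
        self._memory = dict(self._disk)


cache = WriteBehindCache()
cache.set("visitor::1", {"channel": "English Website"})
print(cache.pending())
print(cache.flush())
```

The trade-off in this model is visible in `restart`: anything still in the queue at that moment is gone, which is the price of acknowledging writes before they reach disk.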
Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase
Copyright © Altoros Systems, Inc.
Easily scalable
• Auto sharding
• Cross datacenter replication (XDCR)
• Master-master replication
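Auto sharding can be pictured as a two-step lookup: a document key hashes to a fixed-size set of partitions (Couchbase calls them vBuckets), and a map assigns each partition to a node. The sketch below is a simplification under stated assumptions: the count of 1024, the plain CRC32-modulo hash, the round-robin map, and the node names are all illustrative, and the real client libraries differ in detail.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for(key: str) -> int:
    """Hash a document key to a partition id (simplified)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(nodes: list) -> list:
    """Assign partitions to nodes round-robin (an assumption, not the
    cluster manager's actual placement logic)."""
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def node_for(key: str, vbucket_map: list) -> str:
    """Resolve the node that owns a key via its partition."""
    return vbucket_map[vbucket_for(key)]


nodes = ["node-a", "node-b", "node-c"]
vbucket_map = build_vbucket_map(nodes)
print(vbucket_for("visitor::7a1c7e75"), node_for("visitor::7a1c7e75", vbucket_map))
```

The point of the indirection is that a key's partition never changes; adding a node only changes the map, so whole partitions move rather than individual keys being rehashed.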
Flexible data model
• Native JSON support
• Incremental MapReduce
• Gives power to the developer
How we run Couchbase @Hippo
Load Balancer
Database cluster
Hippo Delivery Tier
Couchbase cluster
• Request log data
• Targeting data
• Statistics data
Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on MapReduce
• Lacks some advanced query capabilities
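Couchbase views are defined as JavaScript map and reduce functions that run on the server. To keep the examples here in one language, the sketch below emulates the same idea in Python: a map function emits key/value pairs per document and a reduce function folds the values for each key. `run_view` and `map_by_channel` are hypothetical helpers, and the documents are trimmed-down request log entries.

```python
from collections import defaultdict


def map_by_channel(doc: dict):
    """Map step: emit (channel, 1) for every request that has a channel."""
    channel = doc.get("collectorData", {}).get("channel")
    if channel:
        yield channel, 1


def run_view(docs, map_fn, reduce_fn) -> dict:
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


docs = [
    {"pathInfo": "/news", "collectorData": {"channel": "English Website"}},
    {"pathInfo": "/about", "collectorData": {"channel": "English Website"}},
    {"pathInfo": "/nieuws", "collectorData": {"channel": "Dutch Website"}},
    {"pathInfo": "/ping"},  # no collector data: the map step emits nothing
]
print(run_view(docs, map_by_channel, sum))
```

A view like this answers one predefined question efficiently. Ad hoc questions such as free-text or geographic filtering need a new view each time, which is the gap the last bullet points at.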
Elasticsearch
• Apache Lucene
• Designed to be distributed
• Schema free
• Apache license
• RESTful API
Added value of ES
• Full text search
• Faceted search
• Geospatial search
• All in (near) real-time
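As a sketch of how the three search types can combine in one request, the function below builds an Elasticsearch query body as a plain Python dict. Everything about the index is assumed: the field names are borrowed from the request log document, `collectorData.geo.location` is a hypothetical geo_point field (the slide's document stores latitude and longitude separately), and the channel field is assumed to be mapped so that it can be aggregated. The faceting is written with `aggs`, the aggregation syntax that later replaced the facets API available in 2013.

```python
import json


def build_search_body(text: str, lat: float, lon: float, distance_km: int) -> dict:
    """Full-text match, geo-distance filter, and a per-channel breakdown."""
    return {
        "query": {
            "bool": {
                # Full-text search on the page URL (assumed analyzed field).
                "must": [{"match": {"pageUrl": text}}],
                # Geospatial search around a point (assumed geo_point field).
                "filter": [
                    {
                        "geo_distance": {
                            "distance": f"{distance_km}km",
                            "collectorData.geo.location": {"lat": lat, "lon": lon},
                        }
                    }
                ],
            }
        },
        # Faceted search: number of matching requests per channel.
        "aggs": {"by_channel": {"terms": {"field": "collectorData.channel"}}},
        "size": 10,
    }


body = build_search_body("news", 52.37, 4.89, 50)
print(json.dumps(body, indent=2))
```

The resulting body would be sent to an index's `_search` endpoint through the RESTful API; no client library is used here so that the example stays runnable without a cluster.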
Replicating to ES
Hippo Delivery Tier: Write and Read via the Java API
Couchbase Server Cluster
XDCR via the Couchbase ES Transport plugin
Elasticsearch Server Cluster
What’s Next?
Advanced analytics
Demo time!
Thank you!
Questions?
j.reijn@onehippo.com | @jreijn
ps. We’re hiring!
