Building a relevance platform with Couchbase and Elasticsearch
Hippo GetTogether, 21 June 2013
Jeroen Reijn | @jreijn | #hgt2013
Hippo GetTogether 2013
follow the Hippo trail
About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com
OneHippo @ Goto
Relevance?
“The capability of a search engine or function to retrieve data appropriate to a user's needs.”
http://www.thefreedictionary.com/relevance
How we deliver relevant content @Hippo
Registration
Visitor - entity making HTTP requests
Collector - records data about a visitor or their behavior
Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
Example: IP address is located in Amsterdam
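The registration step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hippo's actual API: `Request`, `Collector`, `GeoIPCollector`, `TargetingData` and `register` are hypothetical names, and the GeoIP database is replaced by an in-memory lookup table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Request:
    """Minimal stand-in for an HTTP request made by a visitor (hypothetical)."""
    visitor_id: str
    remote_addr: str


class Collector(Protocol):
    """Records one kind of data about a visitor or their behavior."""
    collector_id: str

    def collect(self, request: Request) -> dict: ...


class GeoIPCollector:
    """Toy location collector: looks the IP up in an in-memory table
    instead of a real GeoIP database (an assumption of this sketch)."""
    collector_id = "geo"

    def __init__(self, ip_table: Dict[str, dict]):
        self.ip_table = ip_table

    def collect(self, request: Request) -> dict:
        unknown = {"city": "", "country": ""}
        return dict(self.ip_table.get(request.remote_addr, unknown))


@dataclass
class TargetingData:
    """All data collected about one visitor, keyed by collector id."""
    visitor_id: str
    data: Dict[str, dict] = field(default_factory=dict)


def register(request: Request, collectors: List[Collector],
             store: Dict[str, TargetingData]) -> TargetingData:
    """Run every collector on the request and update the visitor's targeting data."""
    targeting = store.setdefault(request.visitor_id, TargetingData(request.visitor_id))
    for collector in collectors:
        targeting.data[collector.collector_id] = collector.collect(request)
    return targeting


# 203.0.113.7 is a documentation-only address; the mapping is made up.
geo = GeoIPCollector({"203.0.113.7": {"city": "Amsterdam", "country": "NL"}})
store: Dict[str, TargetingData] = {}
td = register(Request("visitor-1", "203.0.113.7"), [geo], store)
print(td.data["geo"]["city"])  # Amsterdam
```

Keying the targeting data by collector id means a new collector only adds a new entry; nothing else in the visitor's data has to change.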
Matching
Characteristic - a type of fact about visitors
Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
Example: "Jim, the European urban consumer", "Alice, the Pet owner"
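The three matching concepts can be modelled as small value objects. The sketch below is hypothetical: the class names mirror the slide's terms, but the scoring rule (the fraction of a persona's target groups that match) is an assumption made for illustration, not necessarily how Hippo scores personas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

TargetingData = Dict[str, dict]  # collector id -> data that collector recorded


@dataclass(frozen=True)
class Characteristic:
    """A type of fact about visitors, e.g. "comes from a city"."""
    name: str
    extract: Callable[[TargetingData], str]


@dataclass(frozen=True)
class TargetGroup:
    """A specification of a Characteristic: the values that qualify."""
    characteristic: Characteristic
    accepted: frozenset

    def matches(self, data: TargetingData) -> bool:
        return self.characteristic.extract(data) in self.accepted


@dataclass(frozen=True)
class Persona:
    """One or more target groups that describe a type of visitor."""
    name: str
    groups: Tuple[TargetGroup, ...]

    def score(self, data: TargetingData) -> float:
        # Assumed scoring rule: the fraction of target groups that match.
        return sum(g.matches(data) for g in self.groups) / len(self.groups)


city = Characteristic("comes from a city",
                      lambda d: d.get("geo", {}).get("city", ""))
weather = Characteristic("experiences a type of weather",
                         lambda d: d.get("weather", {}).get("type", ""))

european_city = TargetGroup(city, frozenset({"Amsterdam", "Berlin", "Paris"}))
amsterdam = TargetGroup(city, frozenset({"Amsterdam"}))
rainy = TargetGroup(weather, frozenset({"rain"}))

jim = Persona("Jim, the European urban consumer", (european_city,))
wet_amsterdammer = Persona("hypothetical: Amsterdam visitor in the rain", (amsterdam, rainy))

visitor = {"geo": {"city": "Amsterdam", "country": "NL"}}
print(jim.score(visitor))               # 1.0
print(wet_amsterdammer.score(visitor))  # 0.5 (no weather data collected)
```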
What do we store?
Request log
Targeting data
Statistics
Averages, e.g. how many visitors became which persona
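The persona statistic can be derived from the request log. In this sketch "became a persona" is assumed to mean "the persona with the highest score in the visitor's latest request", and the shape of the `personaIdScores` entries (`{"id": ..., "score": ...}`) is an assumption, since the example document later in this deck only shows an empty list.

```python
from collections import Counter
from typing import Dict, Iterable, List


def persona_distribution(request_log: Iterable[dict]) -> Dict[str, float]:
    """Share of distinct visitors whose best-scoring persona is each persona.

    Assumes each entry has visitorId, timestamp and a personaIdScores list of
    {"id": ..., "score": ...} objects; the latest entry per visitor wins.
    """
    latest: Dict[str, List[dict]] = {}
    for entry in sorted(request_log, key=lambda e: e["timestamp"]):
        latest[entry["visitorId"]] = entry["personaIdScores"]

    winners: Counter = Counter()
    for scores in latest.values():
        if scores:  # visitors without any persona score count only in the total
            best = max(scores, key=lambda s: s["score"])
            winners[best["id"]] += 1

    total = len(latest)
    return {pid: n / total for pid, n in winners.items()} if total else {}


log = [
    {"visitorId": "a", "timestamp": 1, "personaIdScores": [{"id": "jim", "score": 0.4}]},
    {"visitorId": "a", "timestamp": 2, "personaIdScores": [{"id": "alice", "score": 0.9},
                                                           {"id": "jim", "score": 0.5}]},
    {"visitorId": "b", "timestamp": 3, "personaIdScores": [{"id": "jim", "score": 1.0}]},
    {"visitorId": "c", "timestamp": 4, "personaIdScores": []},
]
print(persona_distribution(log))  # alice and jim each get 1/3 of the three visitors
```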
BIG DATA !!
Real-time analysis
Architecture
[Diagram: the Hippo Delivery Tier and the Hippo Repository run in an app server backed by an RDBMS; the delivery tier serves XML, JSON and (X)HTML]
Delivery Tier
Request → URL Matching → Fetch content → Compose output → Response
Delivery Tier
Request → URL Matching → Targeting Data Collection → Scoring → Fetch content → Compose output → Response
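The extended request pipeline can be pictured as an ordered list of stages that each enrich a shared context. Everything in this sketch (stage functions, the sitemap entry, the single persona, the content variant naming) is hypothetical and only illustrates where targeting data collection and scoring slot in between URL matching and content fetching.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Stage = Callable[[Context], Context]


def url_matching(ctx: Context) -> Context:
    # Map the request path to a (hypothetical) sitemap item.
    ctx["sitemap_item"] = {"/news": "news-overview"}.get(ctx["path"], "pagenotfound")
    return ctx


def collect_targeting_data(ctx: Context) -> Context:
    # Stand-in for running the registered collectors on the request.
    ctx["targeting"] = {"geo": {"city": ctx.get("city", "")}}
    return ctx


def scoring(ctx: Context) -> Context:
    # Stand-in scoring with a single hypothetical persona keyed on the city.
    city = ctx["targeting"]["geo"]["city"]
    ctx["persona_scores"] = {"urban-european": 1.0 if city == "Amsterdam" else 0.0}
    return ctx


def fetch_content(ctx: Context) -> Context:
    # Fetch the variant for the best-scoring persona, or the default variant.
    scores = ctx["persona_scores"]
    best = max(scores, key=scores.get)
    variant = best if scores[best] > 0 else "default"
    ctx["content"] = f"{ctx['sitemap_item']}:{variant}"
    return ctx


def compose_output(ctx: Context) -> Context:
    ctx["response"] = f"<html><body>{ctx['content']}</body></html>"
    return ctx


PIPELINE: List[Stage] = [url_matching, collect_targeting_data, scoring,
                         fetch_content, compose_output]


def handle(request: Context) -> str:
    """Run a request through every stage in order and return the response body."""
    ctx = dict(request)
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx["response"]


print(handle({"path": "/news", "city": "Amsterdam"}))
# <html><body>news-overview:urban-european</body></html>
print(handle({"path": "/news", "city": "Lima"}))
# <html><body>news-overview:default</body></html>
```

Because scoring runs before content is fetched, every request pays for collection and scoring, which is why the storage behind the targeting data has to be fast.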
Scaling
Scaling out
[Diagram: two app servers, each running a Hippo Delivery Tier and a Hippo Repository, share one RDBMS]
Scaling out
[Diagram: two app servers, each running a Delivery Tier and a Repository, share one RDBMS and one Targeting Datastore]
What kind of ‘storage’?
Question?
Distributed Cache?
We have a winner!
Requirements change!
NoSQL to the rescue
Suitable types
• Key-value store
• Document database
Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support
Selection Criteria
• Performance
• Scalability
• Schema flexibility
• Simplicity
• Monitoring
• Support
Performance !!
Performance !!!!
Scalability
Schema flexibility
{
"visitorId": "7a1c7e75-8539-40",
"pageUrl": "http://localhost:8080/site/news",
"pathInfo": "/news",
"remoteAddr": "127.0.0.1",
"referer": "http://localhost:8080/site/",
"timestamp": 1371419505909,
"collectorData": {
"geo": {
"country": "",
"city": "",
"latitude": 0,
"longitude": 0
},
"returningvisitor": false,
"channel": "English Website"
},
"personaIdScores": [],
"globalPersonaIdScores": []
}
Request log document
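Building a request log entry needs no schema: it is a plain nested map serialized as JSON. The field names below are copied from the example document; the builder function and the extra `weather` collector are hypothetical additions that show how a new collector simply contributes another key under `collectorData`.

```python
import json
import time
import uuid


def request_log_document(visitor_id, page_url, path_info, remote_addr, referer,
                         collector_data, timestamp_ms=None):
    """Build a request log entry shaped like the example document."""
    return {
        "visitorId": visitor_id,
        "pageUrl": page_url,
        "pathInfo": path_info,
        "remoteAddr": remote_addr,
        "referer": referer,
        # Epoch milliseconds, like the example's timestamp field.
        "timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
        # Free-form per-collector data: a new collector adds a key, no migration needed.
        "collectorData": collector_data,
        "personaIdScores": [],
        "globalPersonaIdScores": [],
    }


doc = request_log_document(
    visitor_id=str(uuid.uuid4()),
    page_url="http://localhost:8080/site/news",
    path_info="/news",
    remote_addr="127.0.0.1",
    referer="http://localhost:8080/site/",
    collector_data={
        "geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
        "returningvisitor": False,
        "channel": "English Website",
    },
    timestamp_ms=1371419505909,
)
# A hypothetical new collector simply contributes another key.
doc["collectorData"]["weather"] = {"type": "rain"}
print(json.dumps(doc, indent=2))
```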
{
"geo": {
"collectorId": "geo",
"city": "",
"country": "",
"latitude": 0,
"longitude": 0
},
"channel": {
"collectorId": "channel",
"channels": [
"English Website"
],
"lastVisitedChannel": "English Website"
}
}
Visitor document
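The visitor document holds one entry per collector, each tagged with its `collectorId`. A minimal sketch of how such a document might be updated follows; the update helper and the channel-collector logic (keep every channel seen in first-visit order, plus the most recent one) are assumptions inferred from the field names, and "Dutch Website" is a made-up channel.

```python
import copy


def update_visitor_document(visitor_doc: dict, collector_id: str, collected: dict) -> dict:
    """Return a new visitor document with one collector's entry replaced."""
    updated = copy.deepcopy(visitor_doc)
    updated[collector_id] = {"collectorId": collector_id, **collected}
    return updated


def channel_entry(previous: dict, channel: str) -> dict:
    """Hypothetical channel-collector logic: remember every channel seen
    (in first-visit order) plus the most recent one."""
    channels = list(previous.get("channels", []))
    if channel not in channels:
        channels.append(channel)
    return {"channels": channels, "lastVisitedChannel": channel}


visitor = {}
visitor = update_visitor_document(
    visitor, "geo", {"city": "", "country": "", "latitude": 0, "longitude": 0})
visitor = update_visitor_document(
    visitor, "channel", channel_entry(visitor.get("channel", {}), "English Website"))
visitor = update_visitor_document(
    visitor, "channel", channel_entry(visitor.get("channel", {}), "Dutch Website"))
print(visitor["channel"])
# {'collectorId': 'channel', 'channels': ['English Website', 'Dutch Website'], 'lastVisitedChannel': 'Dutch Website'}
```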
Simplicity
Monitoring
Support
Couchbase
Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easy scalability
• Schema flexibility
• Low latency
Couchbase
• Open Source
• Document-oriented
• Easily scalable
• Consistent high performance
• Apache License
Performance
• Object-managed cache
• Write queue to disk
• Avoids a cold cache
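The three performance points can be illustrated with a toy write-behind cache: writes are acknowledged once they are in memory, queued for asynchronous persistence, and a restarted instance warms up from what was persisted instead of starting cold. This is a simplified model for intuition only and is not Couchbase's actual implementation; the class and the `visitor::v1` key format are hypothetical.

```python
from collections import OrderedDict


class WriteBehindCache:
    """Toy model of a managed cache with a disk write queue (not Couchbase internals)."""

    def __init__(self, disk: dict):
        self.disk = disk              # stands in for persistent storage
        self.memory = dict(disk)      # warm-up: load persisted items, no cold start
        self.queue = OrderedDict()    # pending writes, deduplicated by key

    def set(self, key, value):
        self.memory[key] = value      # acknowledged once it is in memory
        self.queue.pop(key, None)
        self.queue[key] = value       # only the latest value per key gets persisted

    def get(self, key):
        return self.memory.get(key)   # reads never wait for the disk

    def flush(self, max_items=None):
        """Persist up to max_items queued writes (all of them by default)."""
        written = 0
        while self.queue and (max_items is None or written < max_items):
            key, value = self.queue.popitem(last=False)
            self.disk[key] = value
            written += 1
        return written


disk = {}
cache = WriteBehindCache(disk)
cache.set("visitor::v1", {"city": "Amsterdam"})
cache.set("visitor::v1", {"city": "Utrecht"})
print(cache.get("visitor::v1"), disk)  # {'city': 'Utrecht'} {}
print(cache.flush())                   # 1
restarted = WriteBehindCache(disk)
print(restarted.get("visitor::v1"))    # {'city': 'Utrecht'}
```

Deduplicating the queue by key means a frequently updated visitor document costs one disk write per flush, not one per request.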
[Benchmark chart. Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase, Copyright © Altoros Systems, Inc.]
Easily scalable
• Auto-sharding
• Cross data center replication (XDCR)
• Master-master replication
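Auto-sharding in Couchbase works by hashing each key to one of a fixed number of vBuckets (1024 by default) and mapping vBuckets to nodes; adding a node changes only that map. The sketch below assumes a CRC32-based key hash and a simple "move just enough vBuckets" rebalance; the exact bit manipulation and the rebalance strategy are illustrative assumptions, not Couchbase's real algorithm.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # Couchbase's default vBucket count


def vbucket_for(key: str) -> int:
    """Map a document key to a vBucket with a CRC32 hash (simplified)."""
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS


def initial_map(nodes):
    """Spread vBuckets evenly over the nodes (round-robin)."""
    return [nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)]


def rebalance(vb_map, new_node):
    """Add a node by moving just enough vBuckets to it to even out the load.

    Keys never change vBucket; only the vBucket-to-node map is updated.
    """
    vb_map = list(vb_map)
    nodes = sorted(set(vb_map)) + [new_node]
    target = NUM_VBUCKETS // len(nodes)
    counts = Counter(vb_map)
    for vb, node in enumerate(vb_map):
        if counts[new_node] >= target:
            break
        if counts[node] > target:
            vb_map[vb] = new_node
            counts[node] -= 1
            counts[new_node] += 1
    return vb_map


before = initial_map(["node1", "node2"])
after = rebalance(before, "node3")
moved = sum(1 for a, b in zip(before, after) if a != b)
print(dict(Counter(after)))  # {'node3': 341, 'node2': 342, 'node1': 341}
print(moved)                 # 341
print(after[vbucket_for("visitor::v1")])  # node that now owns this hypothetical key
```

Only about a third of the vBuckets move when going from two to three nodes, so most keys stay where they were during the rebalance.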
Flexible data model
• Native JSON support
• Incremental MapReduce
• Gives power to the developer
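"Incremental" means the map function only runs for documents that changed, and the reduced result is patched rather than recomputed. The toy index below shows that idea with a count-per-key reduce; it is written in the spirit of Couchbase views (which use JavaScript map functions server-side), and the `IncrementalView` class and `by_persona` map function are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Tuple[object, object]  # (key, value) emitted by a map function


class IncrementalView:
    """Toy incremental map/reduce index in the spirit of Couchbase views.

    Rows from a document's previous version are retracted before its new rows
    are added, so the reduced counts stay current without a full re-scan.
    """

    def __init__(self, map_fn: Callable[[dict], Iterable[Row]]):
        self.map_fn = map_fn
        self.rows_by_doc: Dict[str, List[Row]] = {}
        self.counts: Counter = Counter()

    def on_change(self, doc_id: str, doc: Optional[dict] = None):
        """Apply a create/update (doc given) or a delete (doc is None)."""
        for key, _ in self.rows_by_doc.pop(doc_id, []):
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]
        if doc is not None:
            rows = list(self.map_fn(doc))
            self.rows_by_doc[doc_id] = rows
            for key, _ in rows:
                self.counts[key] += 1


def by_persona(doc):
    # Hypothetical map function: emit the visitor's current persona id.
    if doc.get("persona"):
        yield doc["persona"], None


view = IncrementalView(by_persona)
view.on_change("v1", {"persona": "jim"})
view.on_change("v2", {"persona": "alice"})
view.on_change("v1", {"persona": "alice"})  # only v1 is re-mapped
view.on_change("v2")                        # v2 is deleted
print(dict(view.counts))  # {'alice': 1}
```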
How we run Couchbase @Hippo
[Diagram: a Load Balancer in front of the Hippo Delivery Tier, which uses a Database cluster and a Couchbase cluster]
Couchbase cluster:
• Request log data
• Targeting data
• Statistics data
Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on MapReduce
• Lacks some advanced query capabilities
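A view becomes a secondary index because its emitted rows are kept sorted by key, which makes key and key-range lookups cheap. The sketch models that with a sorted list and binary search; `startkey`/`endkey` echo the parameters of a Couchbase view query, while `by_city`, `build_index` and `query` are hypothetical helpers. It also hints at the limitation on the slide: each new access pattern needs its own view.

```python
import bisect
from typing import Dict, Iterator, List, Tuple


def by_city(doc: dict) -> Iterator[str]:
    # Map function: emit the visitor's city as the index key.
    city = doc.get("geo", {}).get("city")
    if city:
        yield city


def build_index(docs: Dict[str, dict], map_fn) -> List[Tuple[str, str]]:
    """Run the map function over every document and sort the emitted rows by key;
    the sorted rows are what make the view usable as a secondary index."""
    return sorted((key, doc_id) for doc_id, doc in docs.items() for key in map_fn(doc))


def query(rows: List[Tuple[str, str]], startkey: str, endkey: str) -> List[str]:
    """Doc ids whose key lies in [startkey, endkey], in the spirit of the
    startkey/endkey parameters of a Couchbase view query."""
    keys = [key for key, _ in rows]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return [doc_id for _, doc_id in rows[lo:hi]]


visitors = {
    "v1": {"geo": {"city": "Amsterdam"}},
    "v2": {"geo": {"city": "Berlin"}},
    "v3": {"geo": {"city": "Utrecht"}},
    "v4": {},  # no city collected, so the map function emits nothing
}
index = build_index(visitors, by_city)
print(query(index, "Amsterdam", "Amsterdam"))  # ['v1']
print(query(index, "A", "C"))                  # ['v1', 'v2']
```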
Elasticsearch
• Built on Apache Lucene
• Designed to be distributed
• Schema-free
• Apache License
• RESTful API
Added value of ES
• Full text search
• Faceted search
• Geospatial search
• All in (near) real-time
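To make the added value concrete, the toy in-memory function below computes what a combined text, geo-distance and faceted query returns: the matching documents plus a count per facet value. It does not use Elasticsearch or its query DSL; a plain substring match stands in for full-text search, distances use the haversine formula, and the sample documents and coordinates are made up.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def search(docs, text=None, near=None, within_km=float("inf"), facet_field=None):
    """Filter docs by a text term and/or distance, then count facet values.

    A toy illustration of full-text, geo-distance and terms-facet queries;
    a substring match stands in for real full-text search.
    """
    hits = []
    for doc in docs:
        if text and text.lower() not in doc.get("pageUrl", "").lower():
            continue
        if near is not None:
            lat, lon = doc["geo"]["latitude"], doc["geo"]["longitude"]
            if haversine_km(near[0], near[1], lat, lon) > within_km:
                continue
        hits.append(doc)
    facets = Counter(doc[facet_field] for doc in hits) if facet_field else Counter()
    return hits, facets


docs = [
    {"pageUrl": "/site/news", "channel": "English Website",
     "geo": {"latitude": 52.37, "longitude": 4.90}},   # Amsterdam
    {"pageUrl": "/site/news", "channel": "Dutch Website",
     "geo": {"latitude": 52.09, "longitude": 5.12}},   # Utrecht
    {"pageUrl": "/site/about", "channel": "English Website",
     "geo": {"latitude": 51.51, "longitude": -0.13}},  # London
]
hits, facets = search(docs, text="news", near=(52.37, 4.90), within_km=50,
                      facet_field="channel")
print(len(hits), dict(facets))  # 2 {'English Website': 1, 'Dutch Website': 1}
```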
Replicating to ES
[Diagram: the Hippo Delivery Tier writes and reads through the Java API; the Couchbase Server Cluster replicates to the Elasticsearch Server Cluster via XDCR and the Couchbase ES Transport plugin]
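The replication path can be reduced to one rule: the target applies a mutation only if it is newer than what it already holds, so retries and out-of-order deliveries are harmless and deletions propagate. The sketch is a simplified model under that assumption; `Mutation`, `SearchIndexTarget` and `replicate` are hypothetical and do not reflect the actual XDCR protocol or the transport plugin's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mutation:
    """A change on the source cluster; doc is None for a deletion."""
    doc_id: str
    rev: int
    doc: Optional[dict]


class SearchIndexTarget:
    """Stand-in for the Elasticsearch side of the replication."""

    def __init__(self):
        self.docs: Dict[str, dict] = {}
        self.revs: Dict[str, int] = {}

    def apply(self, m: Mutation) -> bool:
        # Ignore mutations that are not newer than what is already indexed.
        if self.revs.get(m.doc_id, -1) >= m.rev:
            return False
        self.revs[m.doc_id] = m.rev
        if m.doc is None:
            self.docs.pop(m.doc_id, None)
        else:
            self.docs[m.doc_id] = m.doc
        return True


def replicate(changes: List[Mutation], target: SearchIndexTarget) -> int:
    """Push a batch of source mutations to the target; return how many applied."""
    return sum(target.apply(m) for m in changes)


es = SearchIndexTarget()
batch = [
    Mutation("v1", 1, {"city": "Amsterdam"}),
    Mutation("v1", 2, {"city": "Utrecht"}),
    Mutation("v1", 1, {"city": "Amsterdam"}),  # stale retry, skipped
    Mutation("v2", 1, {"city": "Berlin"}),
    Mutation("v2", 2, None),                   # deletion
]
applied = replicate(batch, es)
print(applied, es.docs)  # 4 {'v1': {'city': 'Utrecht'}}
```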
What’s Next?
Advanced analytics
Demo time!
Thank you!
Questions?
j.reijn@onehippo.com | @jreijn
P.S. We’re hiring!

Mais conteúdo relacionado

Mais procurados

Real Time Big Data
Real Time Big DataReal Time Big Data
Real Time Big DataInfoFarm
 
The (IPv6) Internet in Romania - RIPE NCC Data and Tools
The (IPv6) Internet in Romania - RIPE NCC Data and ToolsThe (IPv6) Internet in Romania - RIPE NCC Data and Tools
The (IPv6) Internet in Romania - RIPE NCC Data and ToolsRIPE NCC
 
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Data Con LA
 
Drupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP WebinarDrupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP Webinarscorlosquet
 
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...Charlie Hull
 
Librecon 2016 bilbao: kappa architecture IoT of the cars
Librecon 2016 bilbao:   kappa architecture IoT of the carsLibrecon 2016 bilbao:   kappa architecture IoT of the cars
Librecon 2016 bilbao: kappa architecture IoT of the carsJuantomás García Molina
 
IBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyIBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyJason Plurad
 
ElasticSearch - Suche im Zeitalter der Clouds
ElasticSearch - Suche im Zeitalter der CloudsElasticSearch - Suche im Zeitalter der Clouds
ElasticSearch - Suche im Zeitalter der Cloudsinovex GmbH
 
Kafka and GraphQL: Misconceptions and Connections | Gerard Klijs, Open Web
Kafka and GraphQL: Misconceptions and Connections | Gerard Klijs, Open WebKafka and GraphQL: Misconceptions and Connections | Gerard Klijs, Open Web
Kafka and GraphQL: Misconceptions and Connections | Gerard Klijs, Open WebHostedbyConfluent
 
Turning search upside down with powerful open source search software
Turning search upside down with powerful open source search softwareTurning search upside down with powerful open source search software
Turning search upside down with powerful open source search softwareCharlie Hull
 
Janus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardJanus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardDemai Ni
 
APNIC Hackathon The Lord of IPv6
APNIC Hackathon The Lord of IPv6APNIC Hackathon The Lord of IPv6
APNIC Hackathon The Lord of IPv6Siena Perry
 
Boosting big data with apache spark
Boosting big data with apache sparkBoosting big data with apache spark
Boosting big data with apache sparkInfoFarm
 
Real time ads personalization @ Spotify
Real time ads personalization @ SpotifyReal time ads personalization @ Spotify
Real time ads personalization @ SpotifyKinshuk Mishra
 
Why contribute to open source projects
Why contribute to open source projectsWhy contribute to open source projects
Why contribute to open source projectsKranti Parisa
 
What's the story with Open Source?
What's the story with Open Source? What's the story with Open Source?
What's the story with Open Source? Charlie Hull
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at AirbnbHao Wang
 
Graph Computing with JanusGraph
Graph Computing with JanusGraphGraph Computing with JanusGraph
Graph Computing with JanusGraphJason Plurad
 
The Future of Search and SEO in Drupal
The Future of Search and SEO in DrupalThe Future of Search and SEO in Drupal
The Future of Search and SEO in Drupalscorlosquet
 

Mais procurados (20)

Real Time Big Data
Real Time Big DataReal Time Big Data
Real Time Big Data
 
The (IPv6) Internet in Romania - RIPE NCC Data and Tools
The (IPv6) Internet in Romania - RIPE NCC Data and ToolsThe (IPv6) Internet in Romania - RIPE NCC Data and Tools
The (IPv6) Internet in Romania - RIPE NCC Data and Tools
 
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
 
Drupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP WebinarDrupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP Webinar
 
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
 
Librecon 2016 bilbao: kappa architecture IoT of the cars
Librecon 2016 bilbao:   kappa architecture IoT of the carsLibrecon 2016 bilbao:   kappa architecture IoT of the cars
Librecon 2016 bilbao: kappa architecture IoT of the cars
 
IBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyIBM Open by Design: Graph Technology
IBM Open by Design: Graph Technology
 
ElasticSearch - Suche im Zeitalter der Clouds
ElasticSearch - Suche im Zeitalter der CloudsElasticSearch - Suche im Zeitalter der Clouds
ElasticSearch - Suche im Zeitalter der Clouds
 
Kafka and GraphQL: Misconceptions and Connections | Gerard Klijs, Open Web
Kafka and GraphQL: Misconceptions and Connections | Gerard Klijs, Open WebKafka and GraphQL: Misconceptions and Connections | Gerard Klijs, Open Web
Kafka and GraphQL: Misconceptions and Connections | Gerard Klijs, Open Web
 
Turning search upside down with powerful open source search software
Turning search upside down with powerful open source search softwareTurning search upside down with powerful open source search software
Turning search upside down with powerful open source search software
 
Janus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforwardJanus graph lookingbackwardreachingforward
Janus graph lookingbackwardreachingforward
 
APNIC Hackathon The Lord of IPv6
APNIC Hackathon The Lord of IPv6APNIC Hackathon The Lord of IPv6
APNIC Hackathon The Lord of IPv6
 
Boosting big data with apache spark
Boosting big data with apache sparkBoosting big data with apache spark
Boosting big data with apache spark
 
Real time ads personalization @ Spotify
Real time ads personalization @ SpotifyReal time ads personalization @ Spotify
Real time ads personalization @ Spotify
 
Why contribute to open source projects
Why contribute to open source projectsWhy contribute to open source projects
Why contribute to open source projects
 
What's the story with Open Source?
What's the story with Open Source? What's the story with Open Source?
What's the story with Open Source?
 
Apache Bahir
Apache BahirApache Bahir
Apache Bahir
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at Airbnb
 
Graph Computing with JanusGraph
Graph Computing with JanusGraphGraph Computing with JanusGraph
Graph Computing with JanusGraph
 
The Future of Search and SEO in Drupal
The Future of Search and SEO in DrupalThe Future of Search and SEO in Drupal
The Future of Search and SEO in Drupal
 

Destaque

Real-time visitor analysis with Couchbase and Elastichsearch
Real-time visitor analysis with Couchbase and ElastichsearchReal-time visitor analysis with Couchbase and Elastichsearch
Real-time visitor analysis with Couchbase and ElastichsearchJeroen Reijn
 
Basic web application development with Apache Cocoon 2.1
Basic web application development with  Apache Cocoon 2.1Basic web application development with  Apache Cocoon 2.1
Basic web application development with Apache Cocoon 2.1Jeroen Reijn
 
Continuous Delivery in a content centric world
Continuous Delivery in a content centric worldContinuous Delivery in a content centric world
Continuous Delivery in a content centric worldJeroen Reijn
 
Introducing Hippo CMS 10.2
Introducing Hippo CMS 10.2Introducing Hippo CMS 10.2
Introducing Hippo CMS 10.2Hippo
 
Account based marketing - targeting key accounts with 1-2-1 marketing programmes
Account based marketing - targeting key accounts with 1-2-1 marketing programmesAccount based marketing - targeting key accounts with 1-2-1 marketing programmes
Account based marketing - targeting key accounts with 1-2-1 marketing programmesThe Marketing Practice
 
Shootout! Template engines for the JVM
Shootout! Template engines for the JVMShootout! Template engines for the JVM
Shootout! Template engines for the JVMJeroen Reijn
 

Destaque (6)

Real-time visitor analysis with Couchbase and Elastichsearch
Real-time visitor analysis with Couchbase and ElastichsearchReal-time visitor analysis with Couchbase and Elastichsearch
Real-time visitor analysis with Couchbase and Elastichsearch
 
Basic web application development with Apache Cocoon 2.1
Basic web application development with  Apache Cocoon 2.1Basic web application development with  Apache Cocoon 2.1
Basic web application development with Apache Cocoon 2.1
 
Continuous Delivery in a content centric world
Continuous Delivery in a content centric worldContinuous Delivery in a content centric world
Continuous Delivery in a content centric world
 
Introducing Hippo CMS 10.2
Introducing Hippo CMS 10.2Introducing Hippo CMS 10.2
Introducing Hippo CMS 10.2
 
Account based marketing - targeting key accounts with 1-2-1 marketing programmes
Account based marketing - targeting key accounts with 1-2-1 marketing programmesAccount based marketing - targeting key accounts with 1-2-1 marketing programmes
Account based marketing - targeting key accounts with 1-2-1 marketing programmes
 
Shootout! Template engines for the JVM
Shootout! Template engines for the JVMShootout! Template engines for the JVM
Shootout! Template engines for the JVM
 

Semelhante a Hippo GetTogether: The architecture behind Hippos relevance platform

使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭台灣資料科學年會
 
Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Tugdual Grall
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with RBarbara Fusinska
 
Generative AI Study Group_2ndSesssion_20230620.pdf
Generative AI Study Group_2ndSesssion_20230620.pdfGenerative AI Study Group_2ndSesssion_20230620.pdf
Generative AI Study Group_2ndSesssion_20230620.pdfKunihiroSugiyama1
 
How Search Works
How Search WorksHow Search Works
How Search WorksAhrefs
 
Data Pipelines - Big Data meets Salesforce
Data Pipelines - Big Data meets SalesforceData Pipelines - Big Data meets Salesforce
Data Pipelines - Big Data meets Salesforceagarciaodeian
 
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...MongoDB
 
QCon SP - recommended for you
QCon SP - recommended for youQCon SP - recommended for you
QCon SP - recommended for youTatiana Al-Chueyr
 
Data Pipelines: Big Data Meets Salesforce
Data Pipelines: Big Data Meets SalesforceData Pipelines: Big Data Meets Salesforce
Data Pipelines: Big Data Meets SalesforceSalesforce Developers
 
Data Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceData Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceCarolEnLaNube
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Toolsm_richardson
 
PageSpeed and SPDY
PageSpeed and SPDYPageSpeed and SPDY
PageSpeed and SPDYBlake Crosby
 
Balancing Act of Caching LoopConf 2018
Balancing Act of Caching LoopConf 2018Balancing Act of Caching LoopConf 2018
Balancing Act of Caching LoopConf 2018Maura Teal
 
Altic's big analytics stack, Charly Clairmont, Altic.
Altic's big analytics stack, Charly Clairmont, Altic.Altic's big analytics stack, Charly Clairmont, Altic.
Altic's big analytics stack, Charly Clairmont, Altic.OW2
 
Linked Data and Search: Thomas Steiner (Google Inc, Germany)
Linked Data and Search:  Thomas Steiner (Google Inc, Germany)Linked Data and Search:  Thomas Steiner (Google Inc, Germany)
Linked Data and Search: Thomas Steiner (Google Inc, Germany)FIA2010
 
Productionizing Data Science at Experience
Productionizing Data Science at ExperienceProductionizing Data Science at Experience
Productionizing Data Science at ExperienceMatt Mills
 
The what, how and why of scaling git repositories
The what, how and why of scaling git repositoriesThe what, how and why of scaling git repositories
The what, how and why of scaling git repositoriesJohan Abildskov
 
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...State of Search Conference
 

Semelhante a Hippo GetTogether: The architecture behind Hippos relevance platform (20)

使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
 
Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
Generative AI Study Group_2ndSesssion_20230620.pdf
Generative AI Study Group_2ndSesssion_20230620.pdfGenerative AI Study Group_2ndSesssion_20230620.pdf
Generative AI Study Group_2ndSesssion_20230620.pdf
 
How Search Works
How Search WorksHow Search Works
How Search Works
 
Data Pipelines - Big Data meets Salesforce
Data Pipelines - Big Data meets SalesforceData Pipelines - Big Data meets Salesforce
Data Pipelines - Big Data meets Salesforce
 
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
 
QCon SP - recommended for you
QCon SP - recommended for youQCon SP - recommended for you
QCon SP - recommended for you
 
Data Pipelines: Big Data Meets Salesforce
Data Pipelines: Big Data Meets SalesforceData Pipelines: Big Data Meets Salesforce
Data Pipelines: Big Data Meets Salesforce
 
Data Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceData Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets Salesforce
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Tools
 
PageSpeed and SPDY
PageSpeed and SPDYPageSpeed and SPDY
PageSpeed and SPDY
 
Balancing Act of Caching LoopConf 2018
Balancing Act of Caching LoopConf 2018Balancing Act of Caching LoopConf 2018
Balancing Act of Caching LoopConf 2018
 
Altic's big analytics stack, Charly Clairmont, Altic.
Altic's big analytics stack, Charly Clairmont, Altic.Altic's big analytics stack, Charly Clairmont, Altic.
Altic's big analytics stack, Charly Clairmont, Altic.
 
Dave's Wellcome Library digitisation presentation
Dave's Wellcome Library digitisation presentationDave's Wellcome Library digitisation presentation
Dave's Wellcome Library digitisation presentation
 
Linked Data and Search: Thomas Steiner (Google Inc, Germany)
Linked Data and Search:  Thomas Steiner (Google Inc, Germany)Linked Data and Search:  Thomas Steiner (Google Inc, Germany)
Linked Data and Search: Thomas Steiner (Google Inc, Germany)
 
Productionizing Data Science at Experience
Productionizing Data Science at ExperienceProductionizing Data Science at Experience
Productionizing Data Science at Experience
 
The what, how and why of scaling git repositories
The what, how and why of scaling git repositoriesThe what, how and why of scaling git repositories
The what, how and why of scaling git repositories
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
Working Smarter: SEO Automation to Increase Efficiency and Effectiveness - Pa...
 

Último

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Último (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...

Hippo GetTogether: The architecture behind Hippo's relevance platform

  • 1. Building a relevance platform with Couchbase and Elasticsearch. Hippo GetTogether, 21 June 2013. Jeroen Reijn | @jreijn | #hgt2013. Hippo GetTogether 2013: follow the Hippo trail
  • 2. About me • Architect @ Hippo • DevOps guy • Blogger @ http://blog.jeroenreijn.com
  • 3. Relevance?
  • 4. “The capability of a search engine or function to retrieve data appropriate to a user's needs.” (http://www.thefreedictionary.com/relevance)
  • 6. How we deliver relevant content @Hippo
  • 7. Registration. Visitor: an entity making HTTP requests. Collector: records data about a visitor or their behavior (example: a location collector, GeoIPCollector). Targeting data: all data about a specific visitor (example: the visitor's IP address is located in Amsterdam).
  • 8. Matching. Characteristic: a type of fact about visitors (examples: "comes from a city", "experiences a type of weather"). Target group: the specification of a characteristic (examples: "comes from a European city", "comes from Amsterdam"). Persona: one or more target groups that describe a certain type of visitor (examples: "Jim, the European urban consumer", "Alice, the pet owner").
  • 9. What do we store? The request log, targeting data, and statistics (averages, e.g. how many visitors became which persona).
  • 10. BIG DATA!!
  • 11. Real-time analysis
  • 12. Architecture
  • 13. Architecture diagram: an app server running the Hippo Delivery Tier and Hippo Repository, backed by an RDBMS; output as XML, JSON, or (X)HTML
  • 14. Delivery Tier request flow: Request → URL matching → fetch content → compose output → Response
  • 15. Delivery Tier request flow with targeting: Request → URL matching → targeting data collection → scoring → fetch content → compose output → Response
  • 16. Scaling
  • 17. Scaling out: two app servers, each running a Hippo Delivery Tier and Hippo Repository, sharing one RDBMS
  • 18. Scaling out with a targeting datastore: two app servers (Delivery Tier and Repository) sharing the RDBMS, plus a Targeting Datastore
  • 19. What kind of ‘storage’?
  • 20. Question?
  • 21. Distributed cache?
  • 22. We have a winner!
  • 23. Requirements change!
  • 24. NoSQL to the rescue
  • 25. Suitable types • Key-value store • Document database
  • 26. Assessment criteria • Maturity • Data model • Consistency model • Performance • Replication • Caching model • Query model • Monitoring • Scalability • Reliability • Support
  • 27. Selection criteria • Performance • Scalability • Schema flexibility • Simplicity • Monitoring • Support
  • 28. Performance!! Performance!!!!
  • 29. Scalability
  • 30. Schema flexibility
  • 31. Request log document: { "visitorId": "7a1c7e75-8539-40", "pageUrl": "http://localhost:8080/site/news", "pathInfo": "/news", "remoteAddr": "127.0.0.1", "referer": "http://localhost:8080/site/", "timestamp": 1371419505909, "collectorData": { "geo": { "country": "", "city": "", "latitude": 0, "longitude": 0 }, "returningvisitor": false, "channel": "English Website" }, "personaIdScores": [], "globalPersonaIdScores": [] }
  • 32. Visitor document: { "geo": { "collectorId": "geo", "city": "", "country": "", "latitude": 0, "longitude": 0 }, "channel": { "collectorId": "channel", "channels": [ "English Website" ], "lastVisitedChannel": "English Website" } }
  • 33. Simplicity
  • 34. Monitoring
  • 35. Support
  • 36. Couchbase
  • 37. Why Couchbase? • Drop-in replacement for memcached • Read/write-through cache • High throughput • Easy scalability • Schema flexibility • Low latency
  • 38. Couchbase • Open source • Document-oriented • Easily scalable • Consistent high performance • Apache license
  • 39. Performance • Object-managed cache • Write queue to disk • Avoids cold cache
  • 40. Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase (Copyright © Altoros Systems, Inc.)
  • 41. Easily scalable • Auto-sharding • Cross datacenter replication (XDCR) • Master-master replication
  • 42. Flexible data model • Native JSON support • Incremental MapReduce • Gives power to the developer
  • 43. How we run Couchbase @Hippo
  • 44. Deployment diagram: Load balancer; Hippo Delivery Tier; Database cluster; Couchbase cluster holding • Request log data • Targeting data • Statistics data
  • 45. Query capabilities • Querying via views • Secondary indexes via views • Views based on MapReduce • Lacks some advanced query capabilities
  • 46. Elasticsearch • Built on Apache Lucene • Designed to be distributed • Schema-free • Apache license • RESTful API
  • 47. Added value of ES • Full-text search • Faceted search • Geospatial search • All in (near) real time
  • 48. Replicating to ES. Diagram: the Hippo Delivery Tier writes and reads through the Java API; the Couchbase Server cluster replicates via XDCR, using the Couchbase ES transport plugin, to the Elasticsearch Server cluster
  • 49. What’s next?
  • 50. What’s next?
  • 51. Advanced analytics
  • 52. Demo time!
  • 53. Thank you! Questions? j.reijn@onehippo.com | @jreijn. P.S. We’re hiring!
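
Slide 7 describes registration: collectors record facts about a visitor, and together those facts form the visitor's targeting data. The Java sketch below shows that relationship under stated assumptions. `Request`, `Collector`, `GeoIpCollector` and `TargetingStore` are hypothetical names, not Hippo's actual API, and the GeoIP lookup is faked with an in-memory table.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical request abstraction holding only the fields this sketch needs.
record Request(String visitorId, String remoteAddr, String pathInfo) {}

// A collector records one kind of data about a visitor (or their behavior).
interface Collector {
    String id();

    Object collect(Request request);
}

// Hypothetical GeoIP collector: a fixed IP-to-city table stands in for a real GeoIP database.
final class GeoIpCollector implements Collector {
    private final Map<String, String> ipToCity;

    GeoIpCollector(Map<String, String> ipToCity) {
        this.ipToCity = ipToCity;
    }

    @Override
    public String id() {
        return "geo";
    }

    @Override
    public Object collect(Request request) {
        return Map.of("city", ipToCity.getOrDefault(request.remoteAddr(), ""));
    }
}

// Targeting data: everything the collectors recorded per visitor, keyed by collector id.
final class TargetingStore {
    private final Map<String, Map<String, Object>> byVisitor = new HashMap<>();

    void register(Request request, List<Collector> collectors) {
        Map<String, Object> data = byVisitor.computeIfAbsent(request.visitorId(), id -> new HashMap<>());
        for (Collector collector : collectors) {
            data.put(collector.id(), collector.collect(request));
        }
    }

    Map<String, Object> targetingData(String visitorId) {
        return byVisitor.getOrDefault(visitorId, Map.of());
    }
}
```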
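
Slide 8 layers matching on top of that data. A minimal sketch, assuming a target group is a required value for one characteristic and that a persona's score is the fraction of its target groups a visitor matches. The slides do not describe Hippo's actual scoring rule, although the request log on slide 31 does carry `personaIdScores`; all names below are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// A target group specifies a characteristic, e.g. characteristic "city" with value "Amsterdam".
record TargetGroup(String characteristic, String value) implements Predicate<Map<String, String>> {
    @Override
    public boolean test(Map<String, String> facts) {
        return value.equals(facts.get(characteristic));
    }
}

// A persona bundles one or more target groups.
record Persona(String id, List<TargetGroup> targetGroups) {
    Persona {
        if (targetGroups.isEmpty()) {
            throw new IllegalArgumentException("a persona needs at least one target group");
        }
    }

    // Assumed scoring rule: the fraction of this persona's target groups the visitor matches.
    double score(Map<String, String> facts) {
        long matched = targetGroups.stream().filter(g -> g.test(facts)).count();
        return (double) matched / targetGroups.size();
    }
}

final class PersonaMatcher {
    // Returns the best-scoring persona, or empty if no persona scores above zero.
    static Optional<Persona> bestMatch(List<Persona> personas, Map<String, String> facts) {
        return personas.stream()
                .filter(p -> p.score(facts) > 0)
                .max(Comparator.comparingDouble(p -> p.score(facts)));
    }
}
```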
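
Slide 15 extends the plain request flow of slide 14 with targeting data collection and scoring. The toy Java class below walks through those stages in order, assuming collection and scoring run after URL matching and before content is fetched. The sitemap, the content variants and the Amsterdam scoring rule are all made up for illustration and do not reflect Hippo's Delivery Tier internals.

```java
import java.util.Map;

// Hypothetical, highly simplified version of the targeting-enabled request flow.
final class DeliveryTier {
    private final Map<String, String> sitemap;               // path -> content item id
    private final Map<String, Map<String, String>> variants; // item id -> (persona id -> body)

    DeliveryTier(Map<String, String> sitemap, Map<String, Map<String, String>> variants) {
        this.sitemap = sitemap;
        this.variants = variants;
    }

    String handle(String pathInfo, String city) {
        // 1. URL matching: resolve the path to a content item.
        String itemId = sitemap.get(pathInfo);
        if (itemId == null) {
            return "404";
        }
        // 2. Targeting data collection: here just the visitor's city (assumed already resolved).
        Map<String, String> targetingData = Map.of("city", city == null ? "" : city);
        // 3. Scoring: pick a persona from the targeting data.
        String persona = score(targetingData);
        // 4. Fetch content: the persona's variant, falling back to the default variant.
        Map<String, String> itemVariants = variants.get(itemId);
        String body = itemVariants.getOrDefault(persona, itemVariants.get("default"));
        // 5. Compose output.
        return "<html><body>" + body + "</body></html>";
    }

    // Toy scoring rule (an assumption): visitors from Amsterdam become the "urban" persona.
    private static String score(Map<String, String> targetingData) {
        return "Amsterdam".equals(targetingData.get("city")) ? "urban" : "default";
    }
}
```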
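
The visitor document on slide 32 keeps one sub-document per collector, keyed by `collectorId`. This is where schema flexibility pays off: a new collector adds a new entry without a schema change. Here is a sketch of how such a document might be updated, using plain maps instead of a JSON library. The update rules (geo data replaced, channels accumulated) are assumptions inferred from the field names, and `VisitorDocument` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory form of the visitor document: collectorId -> that collector's data.
final class VisitorDocument {
    private final Map<String, Map<String, Object>> entries = new HashMap<>();

    // Geo data: the latest observation replaces the previous one.
    void updateGeo(String country, String city, double latitude, double longitude) {
        Map<String, Object> geo = new HashMap<>();
        geo.put("collectorId", "geo");
        geo.put("country", country);
        geo.put("city", city);
        geo.put("latitude", latitude);
        geo.put("longitude", longitude);
        entries.put("geo", geo);
    }

    // Channel data: accumulate distinct channels and remember the last one visited.
    @SuppressWarnings("unchecked")
    void updateChannel(String channel) {
        Map<String, Object> entry = entries.computeIfAbsent("channel", key -> {
            Map<String, Object> fresh = new HashMap<>();
            fresh.put("collectorId", "channel");
            fresh.put("channels", new ArrayList<String>());
            return fresh;
        });
        List<String> channels = (List<String>) entry.get("channels");
        if (!channels.contains(channel)) {
            channels.add(channel);
        }
        entry.put("lastVisitedChannel", channel);
    }

    Map<String, Object> get(String collectorId) {
        return entries.get(collectorId);
    }
}
```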
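
Slide 37 lists a read/write-through cache among the reasons for choosing Couchbase. The generic pattern is sketched below in plain Java. It is not the Couchbase client API, and `InMemoryStore` is a hypothetical stand-in for the persistent layer that counts how often it is read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the persistent store behind the cache.
interface BackingStore<K, V> {
    Optional<V> load(K key);

    void store(K key, V value);
}

// Hypothetical in-memory "disk" used to demonstrate the cache; counts loads.
final class InMemoryStore<K, V> implements BackingStore<K, V> {
    final Map<K, V> data = new HashMap<>();
    int loads = 0;

    @Override
    public Optional<V> load(K key) {
        loads++;
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public void store(K key, V value) {
        data.put(key, value);
    }
}

final class ReadWriteThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final BackingStore<K, V> store;

    ReadWriteThroughCache(BackingStore<K, V> store) {
        this.store = store;
    }

    // Read-through: on a miss, load from the backing store and populate the cache.
    Optional<V> get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<V> loaded = store.load(key);
        loaded.ifPresent(value -> cache.put(key, value));
        return loaded;
    }

    // Write-through: update the backing store synchronously, then the cache.
    void put(K key, V value) {
        store.store(key, value);
        cache.put(key, value);
    }
}
```

Reads only reach the backing store on a miss, and writes update the store before the cache, so after a successful `put` the two agree.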
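
Slide 39 credits part of Couchbase's performance to acknowledging writes from memory and persisting them through a write queue, and to avoiding a cold cache after a restart. Below is a single-threaded sketch of both ideas. The class name, the de-duplication of repeated writes to one key, and the explicit `flush()` call (a real system flushes in the background) are assumptions of this example, not a description of Couchbase internals.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical memory-first store with a queue of keys waiting to be persisted.
final class WriteQueueStore {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> disk;                       // stand-in for persistent storage
    private final Set<String> writeQueue = new LinkedHashSet<>(); // dirty keys, in first-write order

    WriteQueueStore(Map<String, String> disk) {
        this.disk = disk;
    }

    // A write is acknowledged once it is in memory; persistence happens later.
    void set(String key, String value) {
        memory.put(key, value);
        writeQueue.add(key); // repeated writes to one key are queued only once
    }

    String get(String key) {
        return memory.get(key);
    }

    int queued() {
        return writeQueue.size();
    }

    // Drains the queue to disk, writing each key's latest in-memory value.
    int flush() {
        int written = 0;
        for (String key : writeQueue) {
            disk.put(key, memory.get(key));
            written++;
        }
        writeQueue.clear();
        return written;
    }

    // Warmup: after a restart, load persisted data into memory so the cache is not cold.
    static WriteQueueStore warmUp(Map<String, String> disk) {
        WriteQueueStore store = new WriteQueueStore(disk);
        store.memory.putAll(disk);
        return store;
    }
}
```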
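
Auto-sharding (slide 41) means the client, not the application, decides where a key lives. Couchbase hashes each key into one of a fixed number of vBuckets, and a cluster map assigns vBuckets to nodes. The sketch below shows the key-to-vBucket step with `java.util.zip.CRC32`. The 1024-vBucket count and the shift-and-mask are, as far as we know, what Couchbase clients use, but treat them as assumptions. The round-robin vBucket-to-node assignment is purely illustrative, and `VBucketMap` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

final class VBucketMap {
    // Assumed vBucket count (Couchbase's usual default); fixed for the life of a bucket.
    static final int NUM_VBUCKETS = 1024;

    private final List<String> nodes;

    VBucketMap(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    // Key -> vBucket: CRC32 of the key bytes, reduced to the vBucket range.
    static int vBucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) ((crc.getValue() >> 16) & 0x7fff) % NUM_VBUCKETS;
    }

    // vBucket -> node: round-robin here, standing in for the real cluster map.
    String nodeFor(String key) {
        return nodes.get(vBucketFor(key) % nodes.size());
    }
}
```

Because the number of vBuckets stays fixed, adding a node only means moving whole vBuckets between nodes; keys never have to be re-hashed.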
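
Slides 42 and 45 describe Couchbase views as incremental MapReduce, which suits statistics such as how many visitors became which persona (slide 9). The sketch below illustrates the idea only, not Couchbase's B-tree implementation. When a document changes, only that document is re-mapped and the reduced counts are adjusted, rather than recomputing over every document. Class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical incremental "view": a map function emits keys per document, and a
// count reduce is kept up to date as documents change.
final class IncrementalCountView<D> {
    private final Function<D, List<String>> map;                  // like a view's map function
    private final Map<String, List<String>> emittedByDoc = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();  // reduced value per key

    IncrementalCountView(Function<D, List<String>> map) {
        this.map = map;
    }

    // Called for each created or updated document; only that document is re-mapped.
    void onUpdate(String docId, D doc) {
        onDelete(docId);
        List<String> keys = map.apply(doc);
        emittedByDoc.put(docId, keys);
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
    }

    // Removes a document's previous contribution; a count that reaches zero is dropped.
    void onDelete(String docId) {
        List<String> old = emittedByDoc.remove(docId);
        if (old == null) {
            return;
        }
        for (String key : old) {
            counts.computeIfPresent(key, (k, c) -> c == 1 ? null : c - 1);
        }
    }

    int count(String key) {
        return counts.getOrDefault(key, 0);
    }
}
```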
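
Slide 45 notes that secondary indexes come from views. Conceptually, a view index is a sorted map from emitted key to document ids, which makes range queries (a view's `startkey`/`endkey`) cheap. Below is a minimal sketch that indexes request-log documents by their `timestamp` field. `SecondaryIndex` is a hypothetical name, not a Couchbase class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical secondary index: emitted key -> ids of the documents that emitted it,
// kept sorted so that range queries are cheap.
final class SecondaryIndex {
    private final NavigableMap<Long, List<String>> index = new TreeMap<>();

    void add(long key, String docId) {
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(docId);
    }

    // Inclusive range query over the emitted keys, returned in key order.
    List<String> range(long startKey, long endKey) {
        List<String> result = new ArrayList<>();
        if (startKey > endKey) {
            return result; // empty range; TreeMap.subMap would throw here
        }
        for (List<String> ids : index.subMap(startKey, true, endKey, true).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

A structure like this handles one sorted key well, but it does not serve queries that combine conditions on several fields at once, for example a time range plus a city. Ad-hoc queries of that kind are where Elasticsearch (slides 46 to 48) comes in.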
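
Geospatial search (slide 47) can use the latitude and longitude that the geo collector stores (slides 31 and 32). Elasticsearch provides this as a built-in geo distance query. The plain-Java sketch below only illustrates the underlying distance computation (the haversine formula on a spherical Earth) and is not the Elasticsearch API. `GeoPoint` and `GeoDistance` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

record GeoPoint(double latitude, double longitude) {}

final class GeoDistance {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Great-circle distance between two points using the haversine formula.
    static double km(GeoPoint a, GeoPoint b) {
        double dLat = Math.toRadians(b.latitude() - a.latitude());
        double dLon = Math.toRadians(b.longitude() - a.longitude());
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(a.latitude())) * Math.cos(Math.toRadians(b.latitude()))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }

    // Keeps only the points within radiusKm of the center, like a geo distance filter.
    static List<GeoPoint> within(List<GeoPoint> points, GeoPoint center, double radiusKm) {
        return points.stream()
                .filter(p -> km(center, p) <= radiusKm)
                .collect(Collectors.toList());
    }
}
```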