Riak Search
Performance Wins
 How we got > 100x improvement
            in query throughput


            Gary Flake, Founder
           gary@clipboard.com
Demo


       Introduction
Architecture
• web-01, web-02, web-03: Node.js + Nginx
• riak-01, riak-02, riak-03, riak-04, riak-05
• cache-01, cache-02, cache-03
• redis-01, redis-02
• thumb-01, thumb-02
• job-01, job-02
• admin-01
Riak

An awesome noSQL data store:

• Super easy to scale up AND down
• Fault tolerant – no SPoF
• Flexible schema
• Full-text search out of the box
• Can be fixed and improved in Erlang (the
  Basho folks awesomely take our commits)
Riak – Basics

• Data in Riak is grouped into buckets
  (effectively namespaces)
• Basic operations are:
    •   Get, save, delete, search, map, reduce
• Eventual consistency managed through
  N, R, and W bucket parameters.
• Everything we put in Riak is JSON
• We talk to Riak through the excellent riak-js
  node library by Francisco Treacy
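
As a rough illustration of how N, R, and W interact, the sketch below checks whether a read quorum and a write quorum must share at least one replica. It assumes strict quorums and uses a hypothetical helper name; it does not call Riak.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when any R replicas read must include one of the W replicas written.

    Assumes strict quorums over N replicas (an idealisation for illustration).
    """
    return r + w > n


# With N=3, reading 2 and writing 2 always overlap; reading 1 and writing 1 may not.
overlap_strong = quorums_overlap(3, 2, 2)
overlap_weak = quorums_overlap(3, 1, 1)
```

Lower R and W trade that overlap for latency, which is where eventual consistency comes in.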
Data Model – Clips
Clip fields:

• title
• ctime
• domain
• author
• mentions
• annotation
• tags
Data Model - Clips
Clips are the gateway to all of our data

• Clip (key: abc)
• Blob (key: abc): <html> … </html>
• Comment Cache (comments on clip ‘abc’): “F1rst”, “Nice clip yo!”, “Saw this on Reddit…”
Other Buckets

• Users
• Blobs
• Comments
• Templates
• Counts
• Search Caches
• Transactions
Riak Search

• Gets many things out of Riak by something
  other than the primary key.
• You specify a schema (the types for the
  fields within a JSON object).
• Works great but with one big gotcha:
  – Index uses term-based partitioning instead
    of document-based partitioning
  – Implication: joins + sort + pagination sucks
  – We know how to work around this
Riak Search – Querying

• Query syntax based on Lucene
• Basic Query
   text:funny
• Compound Query
   login:greg OR (login:gary AND tags:riak)
• Range Query
   ctime:[98685879630026 TO 98686484430026]
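
The three query shapes above can be assembled with a few string helpers. This is a minimal sketch; the helper names are hypothetical, and no escaping of special characters is attempted.

```python
def term(field, value):
    """Basic query: a single field:value clause."""
    return f"{field}:{value}"


def all_of(*clauses):
    """Compound query: every clause must match."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """Compound query: at least one clause must match."""
    return "(" + " OR ".join(clauses) + ")"


def between(field, low, high):
    """Range query over an inclusive [low TO high] interval."""
    return f"{field}:[{low} TO {high}]"


compound = any_of(
    term("login", "greg"),
    all_of(term("login", "gary"), term("tags", "riak")),
)
```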
Clipboard App Flow
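Client ↔ node.js ↔ Riak

1. Client → node.js: go to clipboard.com/home
2. node.js → Riak: search clips bucket, query = login:greg
3. Riak → node.js: top 20 results
4. node.js → Client: top 20 results; client starts rendering
5. For each clip, Client → node.js: API request for blob
6. node.js → Riak: GET from blobs bucket
7. node.js → Client: return blob; client renders blob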
Clipboard Queries


                 login:greg



               mentions:greg



  ctime:[98685879630026 TO 98686484430026]

                                             (Search)
Clipboard Queries cont.



            login:greg AND tags:riak




  login:greg AND text:node AND text:javascript


                                                 (Search)
Uh oh


login:greg AND private:false
• login:greg matches only my clips
• private:false matches 20% of all clips!

login:greg AND text:iPhone



                                                              (Search)
Index Partitioning Schemes
Doc Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)
2. On Each node:
    1. Perform x AND y
    2. Sort on z
    3. Slice [ 0 .. 1000 ]
    4. Send to aggregator
3. On aggregator
    1. Merge all results (N x 1000)
    2. Slice [ 990 .. 1000 ]
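
The steps above can be modelled in memory. This is a sketch, not search-engine code: each node is a plain list of documents, and the document shape (a "terms" set plus a sort value "z") is an assumption made for illustration.

```python
def doc_partition_query(nodes, x, y, start, count):
    """nodes: list of per-node document lists; each doc is {"terms": set, "z": sortable}."""
    limit = start + count
    partials = []
    for docs in nodes:
        # Each node resolves x AND y locally, sorts, and keeps only its top `limit` hits.
        hits = [d for d in docs if x in d["terms"] and y in d["terms"]]
        hits.sort(key=lambda d: d["z"])
        partials.append(hits[:limit])
    # The aggregator merges at most N x limit results, then takes the requested page.
    merged = sorted((d for part in partials for d in part), key=lambda d: d["z"])
    return merged[start:limit]
```

The point of the model is that no node ever ships more than start + count results, however many documents match.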
Term Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)
2. On x node: search for x (and send all)
3. On y node: search for y (and send all)
4. On aggregator:
    1. Do x AND y
    2. Sort on z
    3. Slice to [ 990 .. 1000 ]
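
The same query under term partitioning, again as an in-memory sketch. The index and sort-field dictionaries are assumed shapes for illustration; the `shipped` counter shows how many postings travel to the aggregator.

```python
def term_partition_query(index, sort_field, x, y, start, count):
    """index: term -> set of doc ids (one term per node); sort_field: doc id -> z."""
    x_hits = index.get(x, set())  # sent in full from the x node
    y_hits = index.get(y, set())  # sent in full from the y node
    shipped = len(x_hits) + len(y_hits)
    # Only the aggregator can do the AND, the sort, and the slice.
    matches = x_hits & y_hits
    ordered = sorted(matches, key=lambda doc_id: sort_field[doc_id])
    return ordered[start:start + count], shipped
```

A rare term ANDed with a common one still ships the whole posting list of the common term.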
Riak Search Issues

1. For any singular term, all results must be
   sent back to aggregator.
2. Performs sort and slice in the wrong order
   (does slice then sort)
3. ANDs take time O(MAX(|x|, |y|)) instead
   of O(MIN(|x|, |y|)).
4. All matches must be read to get sort field.
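
To make the bound in issue 3 concrete, here is a sketch of an intersection that does work proportional to the smaller result set. It assumes both sets are in one process with constant-time membership tests, which is an idealisation and not a description of Riak Search internals.

```python
def intersect_probe_smaller(x_ids, y_ids):
    """Intersect two sets of doc ids by iterating only the smaller one."""
    small, large = (x_ids, y_ids) if len(x_ids) <= len(y_ids) else (y_ids, x_ids)
    probes = 0
    out = set()
    for doc_id in small:
        probes += 1  # one membership test per element of the smaller set
        if doc_id in large:
            out.add(doc_id)
    return out, probes
```

With 3 ids on one side and 1000 on the other, this does 3 probes; streaming both lists to an aggregator touches all 1003 entries.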
Riak Search Fixes

1. Inline fields for short and common
   attributes.
2. Dynamic fields for precomputed ANDs.
3. PRESORT option for sorting without
   document reads.
Inline Fields

Nifty feature added recently to Riak Search


Fields only used to prune result set can be
made inline for a big perf win


Normal query applied first – then results filtered
quickly with inline “filter” query


High storage cost – only viable for small fields!

                                               (Search)
Riak Search – Inline Fields cont.


             login:greg AND private:false

                       becomes
                   Query - login:greg
              Filter Query – private:false

 private:false is efficiently applied only to results of
 login:greg. Hooray!
                                                       (Search)
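
The query-then-filter split can be sketched as follows. The model assumes that each posting carries its inline field values alongside it, so filtering needs no second index lookup; the data shapes and function name are hypothetical.

```python
def search_with_filter(index, inline, query_term, filter_field, filter_value):
    """index: term -> list of doc ids; inline: doc id -> dict of inline field values."""
    # The normal query runs first and yields the (small) candidate list.
    candidates = index.get(query_term, [])
    # The filter is then checked against inline values of those candidates only.
    return [d for d in candidates if inline[d].get(filter_field) == filter_value]
```

The posting list for private:false, which covers a large share of all clips, is never fetched.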
Fixing ANDs

But what about login:greg AND text:iPhone?



text field is too large to inline!



We had to get creative.


                                         (Search)
Dynamic Fields
Our Solution: Create a new field - text_u
   (u for user)


The field name in text_u has the user’s name appended


In greg’s clip
 text:iPhone → text_greg:iPhone
In bob’s clip
 text:iPhone → text_bob:iPhone

                                            (Search)
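
A sketch of the dynamic-field idea at index time and at query time. The whitespace tokenizer and the function names are assumptions for illustration; only the text_<login> naming comes from the slide.

```python
def index_fields(clip):
    """Return (field, token) pairs for one clip, including the per-user field."""
    pairs = []
    for token in clip["text"].split():
        pairs.append(("text", token))
        # Precomputed AND: the login is folded into the field name.
        pairs.append((f"text_{clip['login']}", token))
    return pairs


def rewrite(login, token):
    """Rewrite login:<login> AND text:<token> into a single-term query."""
    return f"text_{login}:{token}"
```

The rewritten query hits one posting list that is already restricted to that user's clips.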
Presort on Keys

• Our addition to Riak code base.
• Does sort before slice
• If PRESORT=key, then never reads the docs
• Tremendous win (> 100x compared to M/R
  approaches)
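
A minimal model of sorting on keys before slicing. It assumes keys are strings whose lexicographic order is the desired sort order; the `newest_first` flag is a hypothetical parameter, not a Riak option.

```python
def page_by_key(matching_keys, start, count, newest_first=True):
    """Sort the matching keys, then slice; no document is read at any point."""
    ordered = sorted(matching_keys, reverse=newest_first)
    return ordered[start:start + count]
```

Because the sort field lives in the key, only the documents on the requested page need to be fetched afterwards.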
Clip Keys

<Time (ms)><User (guid)><SHA1 of Value>


• Base-64 encode each component
• Only use first 4 characters of user & content
• Only 16 bytes


Collisions? 1 in 17M if clipped the same thing
at same time.
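
The key layout and the 17M figure can be checked with a short sketch. Several details are assumptions: the time is packed into 6 bytes (which encodes to 8 characters), the URL-safe Base-64 alphabet is used, and the GUID is passed as raw bytes. The standard Base-64 alphabet does not preserve numeric order under string comparison, so a real implementation that sorts on keys would need an order-preserving alphabet.

```python
import base64
import hashlib


def b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def clip_key(time_ms: int, user_guid: bytes, value: bytes) -> str:
    time_part = b64(time_ms.to_bytes(6, "big"))           # 6 bytes -> 8 characters
    user_part = b64(user_guid)[:4]                        # first 4 characters only
    content_part = b64(hashlib.sha1(value).digest())[:4]  # first 4 characters only
    return time_part + user_part + content_part


key = clip_key(1_330_000_000_000, b"\x01" * 16, b"<html>...</html>")

# Four Base-64 characters carry 24 bits, hence the "1 in 17M" figure.
suffix_space = 64 ** 4
```

Here `len(key)` is 16 and `suffix_space` is 16,777,216.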
Our Query Processing

1. w AND (x AND y)
   (sort z, start = 990, count = 10)
2. On w_x node: search and send all w_x
3. On w_y node: search and send all w_y
4. On aggregator:
    1. Do w_x AND w_y
    2. Sort on z
    3. Slice to [ 990 .. 1000 ]
Summary

• Use inline fields for short and common bits
• Use dynamic fields for prebuilt ANDs
• Use keys that imply sort order
• Use same techniques for pagination


• Our approach yields search throughput
  that is 100x better than out of the box (and
  better as you scale outward).
Questions?
We’re hiring!


       www.clipboard.com/register
          Invitation Code: just4u


        www.clipboard.com/jobs
         Or talk to us right now!



                                    Thanks!
