Scaling at Showyou
      John Muellerleile (@jrecursive, john@remixation.com)
      September 26, 2011




Tuesday, September 27, 2011
John Muellerleile

      • Basho Technologies: Jan. 2008 - Dec. 2010
            • Riak
            • Riak Search
            • Automated research, NLP, spidering
      • Consulting: 2003 - 2008
            • E-commerce, AdSense, AdWords, ...
      • Infrastructure:
                  • Messaging
                  • Riak
                  • Solr



Agenda

      • Who am I?


      • What is Showyou?


      • The Nature of “Social Data”


      • Showyou’s Data: Today & Tomorrow


      • Data Management: Technology Stack


      • Riak: The Awesome & Sub-Awesome, Integration Patterns & Observations


      • Not Bob’s Riak: The “Mecha” Backend & Query System


What is Showyou?




A minute about the nature of “Social Data”



Hi!




Showyou’s Data: Today

      • Riak with Bitcask


      • Solr with replication


      • Often ever-growing blobs of JSON


      • No useful way to “find” data based on anything other than compound primary
        keys :(


            • Primary keys crafted for specific access patterns :(


      • This will not last forever
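
A minimal sketch of the limitation above, assuming a hypothetical key layout of "user:kind:timestamp" (the layout, field names, and values are illustrative and not taken from Showyou's schema). A compound key serves the one access pattern it was built for; any other question falls back to scanning every object.

```python
# Hypothetical key layout: "<user_id>:<kind>:<zero-padded timestamp>".
# None of these names come from Showyou's schema; they only illustrate the idea.

def make_key(user_id: str, kind: str, timestamp: int) -> str:
    """Build a compound primary key for one known access pattern."""
    return f"{user_id}:{kind}:{timestamp:012d}"

store = {}  # stand-in for a key/value bucket
store[make_key("u42", "share", 1317000000)] = {"video": "abc"}
store[make_key("u42", "share", 1317000300)] = {"video": "def"}

# Works: the caller already knows every part of the key.
hit = store.get(make_key("u42", "share", 1317000300))

# Needs a full scan: "which keys hold video 'abc'?" is not answerable by key.
scan = [k for k, v in store.items() if v["video"] == "abc"]
```

The scan on the last line is the part that stops scaling as the bucket grows, which is the motivation for a real index.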




Showyou’s Data: Tomorrow

      • Significant data growth per additional user


      • Find & aggregate data about our users & their videos


      • Derive useful “signal” from this data


            • Better search: disambiguation, “more like this”, performance


            • “Collective Intelligence”: trending, “smart collections”


            • Spam & De-duplication


            • & more: #hashtags, auto-complete, statistics, usage, ...


Data Management: Technology Stack

      • Erlang/OTP


            • Riak, Bitcask


      • Java


            • JInterface


            • Hazelcast


            • Solr




Riak: The Awesome Parts

      • By far the best operational story in its class


      • Shared-nothing


      • No single point of failure


      • Masterless multi-site replication


      • Support via EnterpriseDS Startup Program


      • I helped write it & turned it inside-out to design Riak Search while working at
        Basho -- this helps




Riak: The Sub-Awesome Parts

      • Riak Search as it exists today does not perform well for us & lacks features


      • Bitcask keeps all keys in memory


      • Listing keys for a bucket will take your cluster down


      • Map/Reduce is virtually useless (for us) other than as “multi-get”


      • Pre-1.0 cluster membership changes are “at your peril”


      • Usable/useful built-in monitoring is non-existent


      • If you are not intimately familiar with Riak, it’s very hard to debug!


Riak: Integration Patterns

      • api_riak_node: “main” Riak cluster node


      • sidecar: post-commit talks to local
        Java-based Erlang node using JInterface


      • fabric: distributed data structures &
        utilities via Hazelcast - very similar in
        spirit & implementation to Riak!


      • indexer: pull from fabric queue &
        index record in Solr


      • Identical deployment on every node
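
The flow described on this slide (write, post-commit hook, queue, indexer) can be sketched in a single process. This is only an illustration: TinyStore, sidecar_hook, and run_indexer are hypothetical names, a standard-library queue stands in for the Hazelcast-backed fabric, and a dict stands in for Solr.

```python
import queue

class TinyStore:
    """Stand-in for a key/value store that fires hooks after each write."""
    def __init__(self):
        self.data = {}
        self.post_commit_hooks = []

    def put(self, bucket, key, value):
        self.data[(bucket, key)] = value
        for hook in self.post_commit_hooks:
            hook(bucket, key, value)  # runs after the write is stored

fabric_queue = queue.Queue()   # stand-in for the distributed "fabric" queue
search_index = {}              # stand-in for Solr

def sidecar_hook(bucket, key, value):
    # The "sidecar" role: turn the commit into a change event.
    fabric_queue.put({"bucket": bucket, "key": key, "doc": value})

def run_indexer():
    # The "indexer" role: drain the queue and index each record.
    while not fabric_queue.empty():
        event = fabric_queue.get()
        search_index[(event["bucket"], event["key"])] = event["doc"]

store = TinyStore()
store.post_commit_hooks.append(sidecar_hook)
store.put("videos", "v1", {"title_t": "hello"})
run_indexer()
```

Because the hook only enqueues, the write path stays short and the indexer can fall behind or restart without blocking writes.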



Riak: Integration Patterns: The Big Picture

      • backend_node: nginx, redis; logs


      • spider: finds, extracts,
        stores & indexes further
        information on users & videos


      • log_indexer: aggregate & index
        interesting parts of our access logs


      • research_riak_node: a special-purpose
        Riak cluster node to support Showyou
        data “Tomorrow”




Riak: Integration Patterns: Observations

      • Riak post-commit hooks


      • Why not pre-commit hooks?


      • Riak as a “virtual memory”


      • Post-commit hooks as “change events”


      • Wishlist: pre- & post-delete hook (I realize this is tricky - do it anyway)


      • Wishlist: pre- & post-create hook (Less tricky - do it anyway)




Riak: Integration Patterns: Search

      • Monolithic replicated Solr won’t last forever & sharding is a faceted multi-value
        “shitshow” [1]


      • We’re doing fine, for now, with lots of RAM, SSDs, etc. but...


      • I don’t want to find out “the hard way” where the joyride ends and the hellride
        starts [2]


      • Clearly the answer is to kill every bird in nearby airspace by writing a
        Riak storage backend and integrated query mechanism! [2,3,4]

          [1] Cliff Moon suggested the use of this word (for emphasis)
          [2] It’s okay, I have done this several dozen times
          [3] Yes, really: BDB, BDBJE, Innostore, sqlite, hsql, mysql, postgres, etc.
          [4] This is not a pride thing: it was a hard, lonely, unforgiving road




Not Bob’s Riak: A Moment of Weakness




                              “My way of joking is to tell the truth;
                               it’s the funniest joke in the world”
                                      George Bernard Shaw




Not Bob’s Riak: Introducing “Mecha”

      • Bob?




Mecha: Goals

      • Birds to kill:


            • Tight, purposed Riak integration


            • Efficient & feature-complete indexing


            • Fast sequential & range object access


            • Flexible distributed query mechanism


            • Query parallelism where appropriate




Mecha: What Stays The Same

      • Works with “stock” (unmodified) Riak 1.0 (pre-release & release)


      • Little/no difference from the “Riak side” -- everything works “as it should”


      • Differences:


            • All objects you “put” into Riak must be JSON objects (this will change by
              release to respect content type)


            • Any fields without a specific “indexed field type” (e.g., _t, _s, _dt, etc.) are
              simply stored along with the rest of the fields (i.e. “stored field”)




Mecha: At a glance

      • Written in Java, uses many, many third-party libraries:
        LevelDB (JNI), Solr, JInterface, Jetlang, Netty, Protobufs, Commons, ...


      • Riak backend module written in Erlang; beaten into submission for reliable
        interaction with JInterface-driven Java node


      • LevelDB instance per partition, per bucket; “ulimit -n 90000” :)


      • One Solr instance per node (covers all partitions, buckets)


      • Objects stored as Riak Objects in JSON form in LevelDB


      • Standardized schema covering common data types using name suffixes



Mecha: Riak Integration




Mecha: Index Field Types

      • Supported index field types (by suffix):


      • _t, _tt - full-text (optionally w/ term vectors, ...)


      • _s, _s_mv - exact string, multi-value exact string


      • _i, _l, _d, _f - trie-based integer, long, double, float


      • _dt - trie-based date (YYYY-MM-DDTHH:MM:SS)


      • _b - boolean


      • _xy, _ll, _geo - point, lat/lon & geohash
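
A rough sketch of how suffix-based typing could work, using only the suffixes listed above. The function names and the type labels are hypothetical and this is not Mecha's code; fields with no recognized suffix are treated as stored-only, as described on the earlier "What Stays The Same" slide.

```python
SUFFIX_TYPES = {
    "_t": "full-text", "_tt": "full-text",
    "_s": "exact string", "_s_mv": "multi-value exact string",
    "_i": "integer", "_l": "long", "_d": "double", "_f": "float",
    "_dt": "date", "_b": "boolean",
    "_xy": "point", "_ll": "lat/lon", "_geo": "geohash",
}

# Try longer suffixes first so "_s_mv" is never mistaken for a shorter one.
_ORDERED = sorted(SUFFIX_TYPES, key=len, reverse=True)

def index_type(field_name):
    """Return the index type implied by the field's suffix, or None."""
    for suffix in _ORDERED:
        if field_name.endswith(suffix):
            return SUFFIX_TYPES[suffix]
    return None  # no recognized suffix: the field is stored but not indexed

def split_fields(obj):
    """Separate a JSON object's fields into indexed and stored-only."""
    indexed = {k: index_type(k) for k in obj if index_type(k) is not None}
    stored_only = [k for k in obj if index_type(k) is None]
    return indexed, stored_only
```

For example, split_fields({"content_t": "hi", "note": "x"}) reports content_t as full-text and leaves note as a stored-only field.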


Mecha: Example Object

          { "content_t": "NoSQL is a ghetto",
            "lol_count_i": 47,
            "lol_dt": "2009-04-13T07:01:43.000Z",
            "rating_f": 4.111111164093018,
            "tags_s_mv": [
              "funny",
              "lol",
              "nosql",
              "ghetto"]}


Mecha: Fast Sequential & Range Object Access

      • Every bucket gets own instance of LevelDB per partition


      • No “multiplexing” buckets or partitions per LevelDB instance


      • Keys are literal, no encoded Erlang terms; simple ranges, smaller values


      • Values stored as JSON-ized Riak objects (why? you’ll see)


      • LevelDB JNI binaries shipped with Snappy compression built-in
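
Why literal keys make ranges simple: LevelDB keeps keys in sorted order, so a range read is a seek to the start key followed by a sequential walk. The sketch below imitates that with a sorted list in memory; SortedStore and the key names are hypothetical stand-ins, not the LevelDB API.

```python
import bisect

class SortedStore:
    """In-memory stand-in for an ordered key/value store."""
    def __init__(self):
        self._keys = []     # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, start, end):
        """Yield (key, value) for start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

db = SortedStore()
for n in (3, 1, 4, 2):
    db.put(f"user:{n:04d}", {"n": n})

selected = [k for k, _ in db.range("user:0002", "user:0004")]
```

With keys stored as encoded Erlang terms, the byte order would not necessarily match the order a caller expects, which is one reason literal keys keep range bounds easy to reason about.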




Mecha: Flexible Distributed Querying

      • Exact, prefix, suffix, & wildcard filtering on multiple index fields


      • Ultra-fast list_keys, list_bucket, list_buckets replacements


      • Equally fast bucket count operation


      • Ridiculous range query performance (any trie-based type; sane datetime
        functions)


      • Faceting, group counts


      • Spatial (bounding box/bowl, geofilt, Haversine, distance faceting)
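
For reference, the Haversine distance mentioned in the spatial bullet is the great-circle distance between two lat/lon points. The sketch below is a plain implementation of the formula plus a hypothetical radius filter; it assumes a spherical Earth with a mean radius of 6371 km and says nothing about how Mecha or Solr evaluate spatial queries internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_km(points, center, radius_km):
    """Names of (name, lat, lon) points within radius_km of center (lat, lon)."""
    return [name for name, lat, lon in points
            if haversine_km(center[0], center[1], lat, lon) <= radius_km]

points = [("a", 0.0, 0.0), ("b", 0.0, 1.0), ("c", 0.0, 90.0)]
nearby = within_km(points, (0.0, 0.0), 200.0)
```

A bounding-box check is cheaper and is commonly used as a first pass before an exact distance test like this one.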




Mecha: Query Parallelism




Mecha: Coverage

      • Coverage is the set of nodes and respectively owned partitions you must
        process to cover all objects in a given bucket.

        [Diagram: partitions owned by Node 1 and Node 2 (for n=2)]
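
A simplified sketch of computing a coverage plan. It assumes a model in which each object is replicated on n consecutive partitions of the ring, so visiting every n-th partition touches at least one replica of every object. The ring size, node names, and function names are hypothetical, and real coverage planning also has to handle node failures and uneven ownership.

```python
def preflist(partition, ring_size, n):
    """Partitions holding replicas of data whose first replica is on `partition`."""
    return [(partition + i) % ring_size for i in range(n)]

def coverage_plan(owners, n):
    """owners[i] is the node owning partition i. Returns {node: [partitions]}."""
    plan = {}
    for p in range(0, len(owners), n):  # every n-th partition, starting at 0
        plan.setdefault(owners[p], []).append(p)
    return plan

# Hypothetical 8-partition ring spread round-robin over three nodes.
owners = [f"node{(i % 3) + 1}" for i in range(8)]
plan = coverage_plan(owners, n=2)
chosen = {p for parts in plan.values() for p in parts}

# Every replica set contains at least one chosen partition.
covered = all(set(preflist(p, len(owners), 2)) & chosen
              for p in range(len(owners)))
```

A query then runs only on the planned partitions, and the per-node lists show how the work spreads across the cluster.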



Mecha: Examples: Count & Faceting

      • Count the number of records in the "research" bucket modified within the last
        10 seconds




      • Top search terms by count for the last hour
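
The query examples shown on this slide did not survive in the transcript. As a stand-in, here is what the two questions compute, written against in-memory records. The field names (last_modified, term_s, ts) and the sample data are hypothetical, and this is not Mecha's query syntax.

```python
from collections import Counter

def count_modified_since(records, now, window_seconds):
    """Count records modified within the last window_seconds."""
    cutoff = now - window_seconds
    return sum(1 for r in records if r["last_modified"] >= cutoff)

def top_terms(search_log, now, window_seconds, limit=3):
    """Facet-style count: most frequent search terms within the window."""
    cutoff = now - window_seconds
    counts = Counter(e["term_s"] for e in search_log if e["ts"] >= cutoff)
    return counts.most_common(limit)

now = 10_000
records = [{"last_modified": now - 5},
           {"last_modified": now - 9},
           {"last_modified": now - 20}]
search_log = [{"term_s": "cats", "ts": now - 60},
              {"term_s": "cats", "ts": now - 120},
              {"term_s": "dogs", "ts": now - 30},
              {"term_s": "birds", "ts": now - 4000}]  # older than one hour

recent = count_modified_since(records, now, 10)
popular = top_terms(search_log, now, 3600, limit=2)
```

In the indexed version, the time filter becomes a range query on a date field and the term count becomes a facet on a string field.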




Mecha: Examples: Multi-value String Fields

      • See which field values occur with other values, and how often (using a
        multi-value string field)
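
The original example for this slide is missing from the transcript. The co-occurrence idea itself can be sketched as counting value pairs within each record's multi-value field. The function name and sample records are hypothetical; tags_s_mv is borrowed from the example object earlier in the deck.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records, field):
    """Count how often each pair of values appears together in `field`."""
    pairs = Counter()
    for record in records:
        # Sort and de-duplicate so each pair is counted once, in a stable order.
        values = sorted(set(record.get(field, [])))
        pairs.update(combinations(values, 2))
    return pairs

records = [
    {"tags_s_mv": ["funny", "lol", "nosql"]},
    {"tags_s_mv": ["lol", "nosql"]},
    {"tags_s_mv": ["funny"]},
]
pairs = cooccurrence(records, "tags_s_mv")
```

Here ("lol", "nosql") appears twice and every other pair once; a search engine would typically answer the same question with facet counts filtered by one value at a time.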




Mecha: Next Steps

      • Code cleanup, test & polish current functionality


      • Sane build & deployment (yay, Maven)


      • Simplify configuration


      • Embed Solr (currently running standalone for debugging)


      • Extend & improve Solr “standard configuration”


      • “Out of Band” Map/reduce with “direct bucket-level” LevelDB instance


      • After that? Join operators, ... :)


Mecha: Availability

      • Right now it is a complex system


      • It will be worth waiting for tighter integration & polish


      • This is me not answering your question :)


      • As soon as responsible, sane, & possible :)




Thank you!

      • Basho Technologies, especially Andy Gross (@argv0) & Kelly McLaughlin
        (@_klm) specifically for help with Riak 1.0 changes


      • Of course, without Erlang/OTP, LevelDB, Solr, Java, JInterface, and a host of
        other open source projects, this would have never even gotten started --
        thank you.


      • Last but not least, thank you to my Showyou teammates for encouragement
        & support!


      • Contact:
        John Muellerleile / http://twitter.com/jrecursive
        john@remixation.com, jmuellerleile@gmail.com
        http://github.com/jrecursive



Tuesday, September 27, 2011

Mais conteúdo relacionado

Mais procurados

Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Dan Harvey
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzDatabricks
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...rhatr
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataJohn Nestor
 
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...HostedbyConfluent
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Toolsbotsplash.com
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using KafkaAkash Vacher
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021StreamNative
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelinprajods
 
DEVNET-1106 Upcoming Services in OpenStack
DEVNET-1106	Upcoming Services in OpenStackDEVNET-1106	Upcoming Services in OpenStack
DEVNET-1106 Upcoming Services in OpenStackCisco DevNet
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesShivji Kumar Jha
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsLightbend
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesJen Aman
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Josh Carlisle
 
Tech Spark Presentation
Tech Spark PresentationTech Spark Presentation
Tech Spark PresentationStephen Borg
 
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesSpark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesAnand Narayanan
 

Mais procurados (20)

Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.
 
Apache HBase Workshop
Apache HBase WorkshopApache HBase Workshop
Apache HBase Workshop
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big Data
 
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 
DEVNET-1106 Upcoming Services in OpenStack
DEVNET-1106	Upcoming Services in OpenStackDEVNET-1106	Upcoming Services in OpenStack
DEVNET-1106 Upcoming Services in OpenStack
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development Teams
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018
 
Tech Spark Presentation
Tech Spark PresentationTech Spark Presentation
Tech Spark Presentation
 
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesSpark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika Technologies
 

Destaque

Redis : Play buzz uses Redis
Redis : Play buzz uses RedisRedis : Play buzz uses Redis
Redis : Play buzz uses RedisRedis Labs
 
Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Michal Bachman
 
An intro to Neo4j and some use cases (JFokus 2011)
An intro to Neo4j and some use cases (JFokus 2011)An intro to Neo4j and some use cases (JFokus 2011)
An intro to Neo4j and some use cases (JFokus 2011)Emil Eifrem
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsAndy Gross
 
Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Michal Bachman
 
Advanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkAdvanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkMichal Bachman
 
Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Michal Bachman
 
Best Buy Web 2.0
Best Buy Web 2.0Best Buy Web 2.0
Best Buy Web 2.0Lee Aase
 
Introduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsIntroduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsScaleGrid.io
 
Redis and its many use cases
Redis and its many use casesRedis and its many use cases
Redis and its many use casesChristian Joudrey
 
Introduction to some top Redis use cases
Introduction to some top Redis use casesIntroduction to some top Redis use cases
Introduction to some top Redis use casesJosiah Carlson
 
The BestBuy.com Cloud Architecture
The BestBuy.com Cloud ArchitectureThe BestBuy.com Cloud Architecture
The BestBuy.com Cloud Architecturejoelcrabb
 
Using Redis at Facebook
Using Redis at FacebookUsing Redis at Facebook
Using Redis at FacebookRedis Labs
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jMax De Marzi
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j
 
Redis Use Patterns (DevconTLV June 2014)
Redis Use Patterns (DevconTLV June 2014)Redis Use Patterns (DevconTLV June 2014)
Redis Use Patterns (DevconTLV June 2014)Itamar Haber
 
Neo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j
 
Neo4j GraphTalks Panama Papers
Neo4j GraphTalks Panama PapersNeo4j GraphTalks Panama Papers
Neo4j GraphTalks Panama PapersNeo4j
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesNeo4j
 

Destaque (20)

Redis : Play buzz uses Redis
Redis : Play buzz uses RedisRedis : Play buzz uses Redis
Redis : Play buzz uses Redis
 
Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)
 
An intro to Neo4j and some use cases (JFokus 2011)
An intro to Neo4j and some use cases (JFokus 2011)An intro to Neo4j and some use cases (JFokus 2011)
An intro to Neo4j and some use cases (JFokus 2011)
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard Problems
 
Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)
 
Advanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkAdvanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware Framework
 
Redis use cases
Redis use casesRedis use cases
Redis use cases
 
Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)
 
Best Buy Web 2.0
Best Buy Web 2.0Best Buy Web 2.0
Best Buy Web 2.0
 
Introduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsIntroduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted Sets
 
Redis and its many use cases
Redis and its many use casesRedis and its many use cases
Redis and its many use cases
 
Introduction to some top Redis use cases
Introduction to some top Redis use casesIntroduction to some top Redis use cases
Introduction to some top Redis use cases
 
The BestBuy.com Cloud Architecture
The BestBuy.com Cloud ArchitectureThe BestBuy.com Cloud Architecture
The BestBuy.com Cloud Architecture
 
Using Redis at Facebook
Using Redis at FacebookUsing Redis at Facebook
Using Redis at Facebook
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4j
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in Graphdatenbanken
 
Redis Use Patterns (DevconTLV June 2014)
Redis Use Patterns (DevconTLV June 2014)Redis Use Patterns (DevconTLV June 2014)
Redis Use Patterns (DevconTLV June 2014)
 
Neo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j the Anti Crime Database
Neo4j the Anti Crime Database
 
Neo4j GraphTalks Panama Papers
Neo4j GraphTalks Panama PapersNeo4j GraphTalks Panama Papers
Neo4j GraphTalks Panama Papers
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 

Semelhante a Scaling with Riak at Showyou

Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014NoSQLmatters
 
JCR - Java Content Repositories
JCR - Java Content RepositoriesJCR - Java Content Repositories
JCR - Java Content RepositoriesCarsten Ziegeler
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
Eurydike: Schemaless Object Relational SQL Mapper
Eurydike: Schemaless Object Relational SQL MapperEurydike: Schemaless Object Relational SQL Mapper
Eurydike: Schemaless Object Relational SQL MapperESUG
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...
Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero...OpenBlend society
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopDmitry Kan
 
DSpace Under the Hood
DSpace Under the HoodDSpace Under the Hood
DSpace Under the HoodDuraSpace
 
SeaJUG May 2012 mybatis
SeaJUG May 2012 mybatisSeaJUG May 2012 mybatis
SeaJUG May 2012 mybatisWill Iverson
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsKorea Sdec
 
Oracle Analytics Security Everything you always wanted to know
Oracle Analytics Security Everything you always wanted to knowOracle Analytics Security Everything you always wanted to know
Oracle Analytics Security Everything you always wanted to knowChristian Berg
 
Inside Wordnik's Architecture
Inside Wordnik's ArchitectureInside Wordnik's Architecture
Inside Wordnik's ArchitectureTony Tam
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklNeo4j
 
Riak seattle-meetup-august
Riak seattle-meetup-augustRiak seattle-meetup-august
Riak seattle-meetup-augustpharkmillups
 
The Why and How of Scala at Twitter
The Why and How of Scala at TwitterThe Why and How of Scala at Twitter
The Why and How of Scala at TwitterAlex Payne
 
OSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache SlingOSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache SlingCarsten Ziegeler
 
Apache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM AlternativeApache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM AlternativeAndrus Adamchik
 
Oct meetup open stack 101 clean
Oct meetup open stack 101   cleanOct meetup open stack 101   clean
Oct meetup open stack 101 cleanbenrodrigue
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning ModelsJosh Patterson
 

Semelhante a Scaling with Riak at Showyou (20)

Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
 
JCR - Java Content Repositories
JCR - Java Content RepositoriesJCR - Java Content Repositories
Scaling with Riak at Showyou

  • 1. Scaling at Showyou John Muellerleile (@jrecursive, john@remixation.com) September 26, 2011 Tuesday, September 27, 2011
  • 2. John Muellerleile • Basho Technologies: Jan. 2008 - Dec. 2010 • Riak • Riak Search • Automated research, NLP, spidering • Consulting: 2003 - 2008 • E-commerce, AdSense, AdWords, ... • Infrastructure: • Messaging • Riak • Solr
  • 3. Agenda • Who am I? • What is Showyou? • The Nature of “Social Data” • Showyou’s Data: Today & Tomorrow • Data Management: Technology Stack • Riak: The Awesome & Sub-Awesome, Integration Patterns & Observations • Not Bob’s Riak: The “Mecha” Backend & Query System
  • 4. What is Showyou?
  • 5. A minute about the nature of “Social Data”
  • 12. Showyou’s Data: Today • Riak with Bitcask • Solr with replication • Often ever-growing blobs of JSON • No useful way to “find” data based on anything other than compound primary keys :( • Primary keys crafted for specific access patterns :( • This will not last forever
  • 13. Showyou’s Data: Tomorrow • Significant data growth per additional user • Find & aggregate data about our users & their videos • Derive useful “signal” from this data • Better search: disambiguation, “more like this”, performance • “Collective Intelligence”: trending, “smart collections” • Spam & De-duplication • & more: #hashtags, auto-complete, statistics, usage, ...
  • 14. Data Management: Technology Stack • Erlang/OTP • Riak, Bitcask • Java • JInterface • Hazelcast • Solr
  • 15. Riak: The Awesome Parts • By far the best operational story in its class • Shared-nothing • No single point of failure • Masterless multi-site replication • Support via EnterpriseDS Startup Program • I helped write it & turned it inside-out to design Riak Search while working at Basho -- this helps
  • 16. Riak: The Sub-Awesome Parts • Riak Search as it exists today does not perform well for us & lacks features • Bitcask keeps all keys in memory • Listing keys for a bucket will take your cluster down • Map/Reduce is virtually useless (for us) other than as “multi-get” • Pre-1.0 cluster membership changes are “at your peril” • Usable/useful built-in monitoring is non-existent • If you are not intimately familiar with Riak, it’s very hard to debug!
  • 17. Riak: Integration Patterns • api_riak_node: “main” Riak cluster node • sidecar: post-commit talks to local Java-based Erlang node using JInterface • fabric: distributed data structures & utilities via Hazelcast - very similar in spirit & implementation as Riak! • indexer: pull from fabric queue & index record in Solr • Identical deployment on every node
  • 18. Riak: Integration Patterns: The Big Picture • backend_node: nginx, redis; logs • spider: finds, extracts, stores & indexes further information on users & videos • log_indexer: aggregate & index interesting parts of our access logs • research_riak_node: a special-purpose Riak cluster node to support Showyou data “Tomorrow”
  • 19. Riak: Integration Patterns: Observations • Riak post-commit hooks • Why not pre-commit hooks? • Riak as a “virtual memory” • Post-commit hooks as “change events” • Wishlist: pre- & post-delete hook (I realize this is tricky - do it anyway) • Wishlist: pre- & post-create hook (Less tricky - do it anyway)
  • 20. Riak: Integration Patterns: Search • Monolithic replicated Solr won’t last forever & sharding is a faceted multi-value “shitshow” [1] • We’re doing fine, for now, with lots of RAM, SSDs, etc. but... • I don’t want to find out “the hard way” where the joyride ends and the hellride starts [2] • Clearly the answer is to kill every bird in nearby airspace by writing a Riak storage backend and integrated query mechanism! [2,3,4] [1] Cliff Moon suggested the use of this word (for emphasis) [2] It’s okay, I have done this several dozen times [3] Yes, really: BDB, BDBJE, Innostore, sqlite, hsql, mysql, postgres, etc. [4] This is not a pride thing: it was a hard, lonely, unforgiving road
  • 21. Not Bob’s Riak: A Moment of Weakness “My way of joking is to tell the truth; it’s the funniest joke in the world” George Bernard Shaw
  • 22. Not Bob’s Riak: Introducing “Mecha” • Bob?
  • 23. Mecha: Goals • Birds to kill: • Tight, purposed Riak integration • Efficient & feature-complete indexing • Fast sequential & range object access • Flexible distributed query mechanism • Query parallelism where appropriate
  • 24. Mecha: What Stays The Same • Works with “stock” (unmodified) Riak 1.0 (pre & release) • Little/no difference from the “Riak side” -- everything works “as it should” • Differences: • All objects you “put” into Riak must be JSON objects (this will change by release to respect content type) • Any fields without a specific “indexed field type” (e.g., _t, _s, _dt, etc.) are simply stored along with the rest of the fields (i.e. “stored field”)
  • 25. Mecha: At a glance • Written in Java, uses many, many third-party libraries: LevelDB (JNI), Solr, JInterface, Jetlang, Netty, Protobufs, Commons, ... • Riak backend module written in Erlang; beaten into submission for reliable interaction with JInterface-driven Java node • LevelDB instance per partition, per bucket; “ulimit -n 90000” :) • One Solr instance per node (covers all partitions, buckets) • Objects stored as Riak Objects in JSON form in LevelDB • Standardized schema covering common data types using name suffixes
  • 26. Mecha: Riak Integration
  • 27. Mecha: Index Field Types • Supported index field types (by suffix): • _t, _tt - full-text (optionally w/ term vectors, ...) • _s, _s_mv - exact string, multi-value exact string • _i, _l, _d, _f - trie-based integer, long, double, float • _dt - trie-based date (YYYY-MM-DDTHH:MM:SS) • _b - boolean • _xy, _ll, _geo - point, lat/lon & geohash
  • 28. Mecha: Example Object { "content_t": "NoSQL is a ghetto", "lol_count_i": 47, "lol_dt": "2009-04-13T07:01:43.000Z", "rating_f": 4.111111164093018, "tags_s_mv": ["funny", "lol", "nosql", "ghetto"] }
  • 29. Mecha: Fast Sequential & Range Object Access • Every bucket gets own instance of LevelDB per partition • No “multiplexing” buckets or partitions per LevelDB instance • Keys are literal, no encoded Erlang terms; simple ranges, smaller values • Values stored as JSON-ized Riak objects (why? you’ll see) • LevelDB JNI binaries shipped with Snappy compression built-in
  • 30. Mecha: Flexible Distributed Querying • Exact, prefix, suffix, & wildcard filtering on multiple index fields • Ultra-fast list_keys, list_bucket, list_buckets replacements • Equally fast bucket count operation • Ridiculous range query performance (any trie-type; sane datetime functions) • Faceting, group counts • Spatial (bounding box/bowl, geofilt, Haversine, distance faceting)
  • 31. Mecha: Query Parallelism
  • 32. Mecha: Coverage • Coverage is the set of nodes and respectively owned partitions you must process to cover all objects in a given bucket. Node 1 Node 2 (for n=2)
  • 33. Mecha: Examples: Count & Faceting • Count the number of records in the "research" bucket modified within the last 10 seconds • Top search terms by count for the last hour
  • 34. Mecha: Examples: Multi-value String Fields • See which field values occur with other values, and how often (using a multi-value string field)
  • 35. Mecha: Next Steps • Code cleanup, test & polish current functionality • Sane build & deployment (yay, Maven) • Simplify configuration • Embed Solr (currently running standalone for debugging) • Extend & improve Solr “standard configuration” • “Out of Band” Map/reduce with “direct bucket-level” LevelDB instance • After that? Join operators, ... :)
  • 36. Mecha: Availability • Right now it is a complex system • It will be worth waiting for tighter integration & polish • This is me not answering your question :) • As soon as responsible, sane, & possible :)
  • 37. Thank you! • Basho Technologies, especially Andy Gross (@argv0) & Kelly McLaughlin (@_klm) specifically for help with Riak 1.0 changes • Of course, without Erlang/OTP, LevelDB, Solr, Java, JInterface, and a host of other open source projects, this would have never even gotten started -- thank you. • Last but not least, thank you to my Showyou teammates for encouragement & support! • Contact: John Muellerleile / http://twitter.com/jrecursive john@remixation.com, jmuellerleile@gmail.com http://github.com/jrecursive
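
The name-suffix convention from slides 24, 27 and 28 can be illustrated with a small sketch. This is not Mecha's code (Mecha is described as Java plus an Erlang backend module); it is a hypothetical Python illustration. The suffixes, and the rule that unsuffixed fields are stored but not indexed, come from the slides. The table name SUFFIX_TYPES, the helpers field_type and split_fields, the longest-suffix-first matching rule, and the extra "note" field are assumptions made for the example.

```python
# Hypothetical sketch of suffix-based field typing (not Mecha's actual code).
# Suffixes are taken from slide 27; the type labels are informal descriptions.
SUFFIX_TYPES = {
    "_t": "full-text",
    "_tt": "full-text",
    "_s": "exact string",
    "_s_mv": "multi-value exact string",
    "_i": "integer",
    "_l": "long",
    "_d": "double",
    "_f": "float",
    "_dt": "date",
    "_b": "boolean",
    "_xy": "point",
    "_ll": "lat/lon",
    "_geo": "geohash",
}


def field_type(name):
    """Return the index type implied by a field name, or None if stored only."""
    # Assumption: check longer suffixes first so "_dt" is not mistaken for "_t".
    for suffix in sorted(SUFFIX_TYPES, key=len, reverse=True):
        if name.endswith(suffix):
            return SUFFIX_TYPES[suffix]
    return None


def split_fields(obj):
    """Split a JSON-like dict into indexed fields (name -> type) and stored-only names."""
    indexed = {}
    stored = []
    for name in obj:
        kind = field_type(name)
        if kind is None:
            stored.append(name)
        else:
            indexed[name] = kind
    return indexed, stored


# The object from slide 28, plus one hypothetical unsuffixed field ("note").
example = {
    "content_t": "NoSQL is a ghetto",
    "lol_count_i": 47,
    "lol_dt": "2009-04-13T07:01:43.000Z",
    "rating_f": 4.111111164093018,
    "tags_s_mv": ["funny", "lol", "nosql", "ghetto"],
    "note": "no suffix, so stored but not indexed",
}

indexed, stored = split_fields(example)
```

Here `indexed` maps each suffixed field name to its implied type and `stored` lists the fields that would only be kept alongside the object. Matching the longest suffix first is one simple way to keep `lol_dt` from being read as a full-text `_t` field; the slides do not say how Mecha resolves this.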