NOSQL. WTW?


               adicu.com
           February 2011
        Alexander Sicular
               @siculars
Who is this blowhard?
Columbia University pays my mortgage

For the better part of a decade in Medical
Informatics

Am not shilling for any of these companies

Am not a computer scientist

Am a computer science enthusiast
particularly in the area of Informatics
NoSQL or NOSQL?
Not Only SQL

Non/post relational

Big tent policy

Umbrella term

Fragmented



                      http://www.flickr.com/photos/morgennebel/2933723145/
Your Usage Patterns
Read vs. Write

Mutable vs. Immutable

Product Considerations:

  In place updates

  Write Only Logs
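
A minimal sketch of the two write styles named on this slide, assuming a toy in-memory store. The class and method names (`InPlaceStore`, `AppendOnlyStore`, `history`) are illustrative only and do not come from any product in this deck.

```python
import json


class InPlaceStore:
    """In-place updates: each write overwrites the previous value."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value  # the old value is gone after this line

    def get(self, key):
        return self.data.get(key)


class AppendOnlyStore:
    """Write-only log: records are appended and never modified."""

    def __init__(self):
        self.log = []    # serialized records, in write order
        self.index = {}  # key -> position of the latest record in the log

    def put(self, key, value):
        self.log.append(json.dumps({"key": key, "value": value}))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        return json.loads(self.log[pos])["value"]

    def history(self, key):
        # Old versions stay in the log until something compacts it.
        records = (json.loads(line) for line in self.log)
        return [r["value"] for r in records if r["key"] == key]
```

The log version trades space (old records pile up) for cheap sequential writes and a free history of changes. Which side of that trade you want depends on the read/write and mutable/immutable mix above.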
This vs. That
Riak wiki comparisons page
http://wiki.basho.com/Riak-Comparisons.html


Popular one page comparison of a number of
NOSQL players by Kristof Kovacs:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
NOSQL concepts are
  Not Brand New
Memcached since 2003                       http://memcached.org



Google papers 2004-2006

Amazon Dynamo 2007

Consistent Hashing 2007
http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients


Using relational systems as a key-value blob store
    2009 FriendFeed (not the first)
    http://bret.appspot.com/entry/how-friendfeed-uses-mysql
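
A rough sketch of the "relational system as a key-value blob store" idea, assuming SQLite (from the Python standard library) as a stand-in for MySQL. The table name `entities` and the helpers `put_entity` and `get_entity` are hypothetical; this is not FriendFeed's actual schema or code.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# One opaque blob per row: the database never looks inside the body.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body BLOB NOT NULL)")


def put_entity(conn, entity):
    """Store a schemaless dict as a JSON blob keyed by its id."""
    entity_id = entity.get("id") or uuid.uuid4().hex
    body = json.dumps(dict(entity, id=entity_id)).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity_id, body),
    )
    return entity_id


def get_entity(conn, entity_id):
    """Fetch and decode one entity, or return None if the id is unknown."""
    row = conn.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else None
```

Adding a field to an entity needs no schema change, because the relational engine only sees an id and a blob.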
Why NOSQL
Support for “Very Large” data sets

Schemaless

Denormalized

Green field

New applications



                      http://www.flickr.com/photos/gailtang/1243984297/
Academia
Google:

  Bigtable        http://labs.google.com/papers/bigtable.html



  GFS     http://labs.google.com/papers/gfs.html



  M/R     http://labs.google.com/papers/mapreduce.html



Amazon:

  Dynamo         http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf




NOSQL Summer                   http://nosqlsummer.org/papers



NOSQL Tapes             http://nosqltapes.com
Under the Hood
      Terminology
Write Only Log           http://en.wikipedia.org/wiki/Log-structured_file_system



Merkle Trees        http://en.wikipedia.org/wiki/Hash_tree



B-trees   http://en.wikipedia.org/wiki/B-tree



Vector clock       http://en.wikipedia.org/wiki/Vector_clock



Bloom filters       http://en.wikipedia.org/wiki/Bloom_filters



Big O Notation         http://en.wikipedia.org/wiki/Big_o_notation



Consistent Hashing              http://en.wikipedia.org/wiki/Consistent_hashing
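
Of the terms above, consistent hashing is probably the easiest to show in a few lines. What follows is a minimal sketch, assuming MD5 for placement and 100 virtual points per node. The class name `HashRing` and its parameters are illustrative, not taken from libketama or any store in this deck.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual points per node."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._keys = []           # sorted ring positions
        self._ring = {}           # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._keys, pos)

    def remove(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            del self._ring[pos]
            self._keys.remove(pos)

    def node_for(self, key):
        """Return the node owning the first ring position after hash(key)."""
        if not self._keys:
            raise ValueError("ring is empty")
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]
```

The point of the ring: when a node leaves, only the keys it owned move to other nodes. Every other key keeps its owner, unlike `hash(key) % node_count`, where most keys move.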
CAP Theorem
           http://en.wikipedia.org/wiki/CAP_theorem




Consistency

Availability

Partition Tolerance

   Pick two?

                                             http://guide.couchdb.org/draft/consistency.html
CouchDB
CouchOne, Cloudant

Erlang

Extreme replication scenarios

Works on phones

Updated indexing (b-tree)

HTTP interface

Offline usage

Sharded scaling
CouchDB Internal
  Architecture




  http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG
MongoDB
10Gen, MongoHQ, MongoLab

C++

huMONGOus

Sharded scaling, replicated master/slave

Located in NYC (go visit them)

Soft landing for those coming from mysql (relational databases)

Native javascript

Secondary indexes
MongoDB Sharding
     Diagram




http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/
MySQL to Mongo Query similarity




       http://nosqlpedia.com/wiki/File:MongoDB.JPG
Riak
Basho, Joyent

Erlang

Distributed

HTTP, protobuf

Native javascript, erlang

Multiple backends

Homogeneous

CAP tunable
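
One common reading of "CAP tunable" in Dynamo-style stores is the choice of N (replicas per key), R (replicas that must answer a read) and W (replicas that must acknowledge a write). The sketch below only illustrates that arithmetic; the function names are hypothetical and this is not Riak's client API.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True when any read quorum must share a replica with any write quorum."""
    return r + w > n


def always_intersect(n, r, w):
    """Brute-force check of the same property over every possible quorum pair."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )
```

With N=3, choosing R=2 and W=2 makes every read touch at least one replica that saw the latest acknowledged write. R=1 and W=1 answers faster and survives more failures, but a read can miss that write.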
Hadoop
Cloudera, Apache Foundation

Java

High latency

Batch oriented

HDFS is GFS based

Open source Google stack via the Google papers

Huge ecosystem

Yahoo, FB, Twitter, Fortune 500

Pig, Hive, Flume
HBase
Java

Low latency store

sits on top of Hadoop

Modeled after Google Bigtable

Column oriented

Thrift, protobuf

Backend for new Facebook Messaging service
Cassandra
Apache

Java

Column oriented

Like Bigtable and Dynamo

Originated at Facebook

At Twitter, Distributed counting
http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
Redis
OpenRedis

C

REmote DIctionary Server

Specific data structures

incredibly fast

memcached on steroids

replicated master/slave
Commonalities
Open Source

Adherence to common or standard:

  data formats

    json, bson, utf8, binary

  data transport mechanisms

    http, thrift, protobuf,
    simple wire protocols
Ok. So Now What?
Analyze your requirements

Mailing lists

IRC, twitter

Project pages, wiki

Github/Google Code/Bitbucket:

  project page

  specific language clients
Variety Pack
Hybrid architectures will become the norm

Twitter - mysql, cassandra, hadoop

Google - mysql, GAE (BT)

Facebook - mysql, cassandra, hbase, memcached

Yahoo - mysql, hadoop

LinkedIn - voldemort

                      http://www.flickr.com/photos/uncleweed/82245324/
Questions?




               adicu.com
           February 2011
        Alexander Sicular
               @siculars
