NoSQL
           What it is and is it for you?

           Iraj Islam
           Rubayeet Islam
           Nurul Ferdous


                     NewsCred

Thursday, February 3, 2011
Agenda                                                   NewsCred



                    •        Part 1. Why NoSQL?

                    •        Part 2. NoSQL Use Cases

                    •        Part 3. Choosing a NoSQL Solution

                    •        Part 4. Understanding MongoDB

                    •        Part 5. Building a MongoDB App

                    •        Part 6. Scaling MongoDB

                    •        Questions



Who We Are                                   NewsCred




                Iraj Islam
                CTO/Co-founder, NewsCred


                Rubayeet Islam
                Senior Software Engineer, NewsCred


                Nurul Ferdous
                Senior Software Engineer, NewsCred




Our Story                                                NewsCred




                Launched 2008
                Founded by two Bangladeshis 2008


                Funded by Investors of Twitter
                Floodgate Ventures (Twitter), Bessemer Cap. (LinkedIn)


                Top-tier Clients
                Yahoo!, Orange Telecom, Harvard U, The Daily Star etc.




What We Do                     NewsCred



                             Domain Expertise
                             •   Big Data

                             •   Information Retrieval

                             •   Machine Learning

                             •   Semantic Web


                             Technologies
                             •   Apache Solr

                             •   MySQL/MongoDB

                             •   Python/Java



Part 1
           Why NoSQL?


                     NewsCred


What’s NoSQL?                                  NewsCred




                                   NoSQL
                             What’s with the weird name?




What’s NoSQL?                                NewsCred




                                NoSQL
                      Non-relational, web-scale database.




Why NoSQL?                                               NewsCred


                                         Web 1.0
                                  The read intensive web

                •        Publishing Model
                •        Textual Content
                •        Browsing
                •        Search
                •        Small Data
                •        Personal Computer




Why NoSQL?                                                          NewsCred


                                    The Age of Big Data
                              Exabytes (10^18) of data stored per year

                              [Chart: yearly values for 2006 to 2010; vertical axis from 0 to 1000]

Why NoSQL?                                                  NewsCred


                                      Web 2.0+
                                The write intensive web

                •        Semi-structured Data
                •        Real-time
                •        Big Data
                •        Semantic Web
                •        Ubiquity: any device, anywhere
                •        User-generated Content




Why NoSQL?                                          NewsCred


                             The MySQL Problem
                                  1. Default

                    [Diagram: Data Source and User on one side of the Application; the Application writes to and reads from a single MySQL server]

                    Bottleneck, too much load!




Why NoSQL?                                                NewsCred


                             The MySQL Problem
                                2. Replication

                    [Diagram: Data Source and User on one side of the Application; the Application writes to the MySQL Master and reads from the MySQL Slaves]

                    Scalable Reads!
                    Bottleneck, writes won’t scale!
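
The read/write split on this slide can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: `makeReplicatedPool` and the server names are hypothetical, and no real database connection is involved. It shows why adding slaves spreads reads while every write still lands on the one master.

```javascript
// Hypothetical server names; a real app would hold database connections here.
function makeReplicatedPool(master, slaves) {
  let next = 0;
  return {
    // Every write must go to the single master.
    forWrite() {
      return master;
    },
    // Reads rotate across the slaves (round-robin).
    forRead() {
      const target = slaves[next % slaves.length];
      next += 1;
      return target;
    },
  };
}

const pool = makeReplicatedPool("master", ["slave1", "slave2", "slave3"]);
const readTargets = [pool.forRead(), pool.forRead(), pool.forRead(), pool.forRead()];
const writeTargets = [pool.forWrite(), pool.forWrite()];
console.log(readTargets, writeTargets);
```

Adding a fourth slave would spread the reads further, but `forWrite` would still return the same master.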

Why NoSQL?                                                NewsCred


                             The MySQL Problem
                                 3. Sharding

                    [Diagram: Data Source and User on one side of the Application; the Application writes to and reads from MySQL through components labelled S]

                    Great, scalable writes!
                    Development and maintenance costs just skyrocketed!

Why NoSQL?                                                   NewsCred


                                    The NoSQL Solution
                                       Design Goals


                             Semi-structure   >> Schema-free

                                  Big Data    >> Scalable reads/writes

                                 Real-time    >> High-performance

                                  Ubiquity    >> High-availability



NoSQL vs RDBMS                                      NewsCred



          NoSQL                               RDBMS
          • Schema-free                       • Relational schema
          • Scalable writes/reads             • Scalable reads
          • Auto high-availability            • Custom high-availability
          • Limited queries                   • Flexible queries
          • Eventual Consistency *            • Consistency
          • BASE                              • ACID
            * Applies to most NoSQL systems


Is NoSQL For You?                                   NewsCred



          NoSQL                               RDBMS
          • Schema-free                       • Relational schema
          • Scalable writes/reads             • Scalable reads
                                   vs
          • Auto high-availability            • Custom high-availability
          • Limited queries                   • Flexible queries
          • Eventual Consistency *            • Consistency
          • BASE                              • ACID
            * Applies to most NoSQL systems


Part 2
           NoSQL Use Cases


                     NewsCred


Who’s Using NoSQL?   NewsCred




NoSQL Use Cases                    NewsCred



                • Consumer Use Cases
                        • Facebook
                        • Twitter
                        • Netflix


                  • Enterprise Use Cases
                        • Rackspace
                        • TrendMicro
                        • NewsCred




NoSQL Use Cases                                               NewsCred



                • Facebook
                        • HBase  - Facebook messages
                        • Scribe - Real-time click logs
                        • Hive   - SQL queries -> MapReduce jobs
                        • Hadoop
                             • Web analytics warehouse
                             • Distributed datastore
                             • MySQL backups




NoSQL Use Cases                                          NewsCred



                • Twitter
                        • Hadoop    - Analytics
                        • HBase     - People search
                        • Scribe    - Log collection framework
                        • FlockDB   - Social graph analysis




NoSQL Use Cases                                                NewsCred



                • Rackspace
                        • Cassandra – stat collection, mail and apps

                  • TrendMicro
                        • HBase & Hadoop - reputation databases

                  • NewsCred
                        • MongoDB
                          • API usage analytics
                          • Pixel tracking analytics
                          • Entity metadata storage


Demo
           NewsCred API Analytics


                     NewsCred


Part 3
           Choosing a NoSQL Solution


                     NewsCred


Choosing a NoSQL Solution                                                                                                    NewsCred

                Availability (A): each client can always read and write
                Consistency (C): all clients have the same view of the data
                Partition tolerance (P): the system works well despite physical network partitions

                • CA: RDBMSs (MySQL, PostgreSQL, Aster Data, Greenplum, Vertica)
                • AP: Cassandra, Voldemort, CouchDB, Dynamo, SimpleDB, Tokyo Cabinet, Riak
                • CP: BigTable, HyperTable, HBase, MongoDB, Scalaris, Berkeley DB, MemcacheDB, Redis


Consistent, Available (CA)                                 NewsCred


                             CA systems have trouble with partitions and
                                 deal with them through replication.

                  • Examples
                        • MySQL (relational)
                        • Aster Data (relational)
                        • Greenplum (relational)
                        • Vertica (column)




Availability, Partition-Tolerant (AP)                    NewsCred


                         AP systems have trouble with consistency and achieve
                            “eventual consistency” through replication.

                  • Examples
                        • Cassandra (column/tabular)
                        • Dynamo (key-value)
                        • Voldemort (key-value)
                        • Tokyo Cabinet (key-value)
                        • CouchDB (document)
                        • SimpleDB (document)
                        • Riak (document)



Consistent, Partition-Tolerant (CP)                          NewsCred


                              CP-systems have trouble with availability while
                             keeping data consistent across partitioned nodes.

                  • Examples
                        • MongoDB (document)
                        • BigTable (column/tabular)
                        • HyperTable (column/tabular)
                        • HBase (column/tabular)
                        • Redis (key-value)
                        • Scalaris (key-value)
                        • MemcacheDB (key-value)



HBase                                                                 NewsCred


             Selling point:
             Billions of rows, millions of columns

             Use when you need:
             Random, real-time access to Big Data

             Written in: Java
             License: Apache
             Type: Column/Tabular
             Protocol: HTTP/REST/Thrift
             Community Support: Good
             Learning Curve: High
             Users: Yahoo!, Facebook, Microsoft, Adobe, StumbleUpon etc.

             [Figure: CAP triangle (A, C, P)]



Cassandra                                                              NewsCred


             Selling point:
             Best of Google BigTable and Amazon Dynamo

             Use when you need:
             To write more than you read (logging)

             Written in: Java
             License: Apache
             Type: Column/Tabular
             Protocol: Custom, binary (Thrift)
             Community Support: Great
             Learning Curve: Medium
             Users: Facebook, Twitter, Digg, Reddit, Rackspace, Cisco, SimpleGeo, Cloudkick etc.

             [Figure: CAP triangle (A, C, P)]



Redis                                                                   NewsCred


             Selling point:
             Blazing fast, in-memory like memcached

             Use when you need:
             To manage rapidly changing data

             Written in: C
             License: BSD
             Type: Key-value
             Protocol: Telnet-like
             Community Support: Good
             Learning Curve: Low
             Users: GitHub, Craigslist, Stack Overflow, Disqus, The Guardian UK etc.

             [Figure: CAP triangle (A, C, P)]



MongoDB                                                               NewsCred


             Selling point:
             Best of NoSQL and RDBMS

             Use when you need:
             Dynamic queries and indexing on a Big DB

             Written in: C++
             License: AGPL
             Type: Document
             Protocol: Custom, binary (BSON)
             Community Support: Great
             Learning Curve: Low
             Users: NewsCred, Foursquare, GitHub, SourceForge, The New York Times, Etsy, Shutterfly etc.

             [Figure: CAP triangle (A, C, P)]



Part 4
           Understanding MongoDB


                     NewsCred


Understanding MongoDB            NewsCred



                • Database == Database
                • Table == Collection
                • Row == Document




Understanding MongoDB   NewsCred



                • Mongo Shell




Understanding MongoDB   NewsCred



                • INSERT




Understanding MongoDB                                                   NewsCred



                • SELECT

                SELECT * FROM users WHERE X = 3 AND Y = 'abc';

                db.users.find({X: 3, Y: "abc"})



                SELECT * FROM users WHERE X = 3 AND Y = 'abc' ORDER BY X ASC;

                db.users.find({X: 3, Y: "abc"}).sort({X: 1})



                SELECT username, email FROM users WHERE X = 3 AND Y = 'abc';

                db.users.find({X: 3, Y: "abc"}, {username: true, email: true})
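
To make the SQL-to-MongoDB mapping concrete, here is a minimal sketch in plain JavaScript of what an equality query, a sort, and a field selection do to a set of documents. It does not talk to MongoDB. `findDocs`, `sortDocs`, `projectDocs`, and the sample data are hypothetical names made up for illustration, and only exact-match criteria are handled.

```javascript
// Sample documents standing in for the "users" collection (made-up data).
const users = [
  { username: "a", email: "a@example.com", X: 3, Y: "abc" },
  { username: "b", email: "b@example.com", X: 3, Y: "xyz" },
  { username: "c", email: "c@example.com", X: 3, Y: "abc" },
];

// Keep documents whose fields equal every key/value pair in the criteria (AND).
function findDocs(docs, criteria) {
  return docs.filter((doc) =>
    Object.keys(criteria).every((key) => doc[key] === criteria[key])
  );
}

// Sort a copy by one field; direction 1 = ascending, -1 = descending.
function sortDocs(docs, field, direction) {
  return [...docs].sort((a, b) => {
    if (a[field] < b[field]) return -direction;
    if (a[field] > b[field]) return direction;
    return 0;
  });
}

// Return only the listed fields of each document.
function projectDocs(docs, fields) {
  return docs.map((doc) => {
    const out = {};
    for (const f of fields) {
      if (f in doc) out[f] = doc[f];
    }
    return out;
  });
}

const matched = findDocs(users, { X: 3, Y: "abc" });
const selected = projectDocs(sortDocs(matched, "username", 1), ["username", "email"]);
console.log(selected);
```

In MongoDB itself, a field selection also returns the `_id` field unless it is explicitly excluded.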




Understanding MongoDB                                                                 NewsCred



                • UPDATE
                db.collection.update(criteria, modifier, upsert, multi)


                criteria : Query which selects the record(s) to update
                modifier : $set, $inc, $unset, $push, $pop...
                upsert : Insert if not exists, update otherwise
                multi : Update multiple docs matching the criteria


                UPDATE users SET X = 4, Y = 'abc' WHERE username = 'joegunchy';

                db.users.update({username: "joegunchy"}, {$set: {X: 4, Y: "abc"}}, false, true)
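
The modifiers listed above can be illustrated without a database. The sketch below applies a small subset of them to a plain JavaScript object. `applyModifier` is a hypothetical helper written for this example, not part of MongoDB, and it covers only `$set`, `$inc`, `$unset`, `$push` and `$pop` on top-level fields.

```javascript
// Apply a small subset of update modifiers to a copy of a document.
function applyModifier(doc, modifier) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy; fine for plain data
  for (const [op, fields] of Object.entries(modifier)) {
    for (const [field, value] of Object.entries(fields)) {
      if (op === "$set") {
        result[field] = value;
      } else if (op === "$inc") {
        result[field] = (result[field] || 0) + value;
      } else if (op === "$unset") {
        delete result[field];
      } else if (op === "$push") {
        if (!Array.isArray(result[field])) result[field] = [];
        result[field].push(value);
      } else if (op === "$pop") {
        // 1 removes the last element, -1 removes the first.
        if (Array.isArray(result[field])) {
          if (value === -1) result[field].shift();
          else result[field].pop();
        }
      } else {
        throw new Error("Unsupported modifier: " + op);
      }
    }
  }
  return result;
}

// Made-up document for illustration.
const before = { username: "joegunchy", X: 3, visits: 1, tags: ["sql"], temp: true };
const after = applyModifier(before, {
  $set: { X: 4, Y: "abc" },
  $inc: { visits: 1 },
  $unset: { temp: 1 },
  $push: { tags: "nosql" },
});
console.log(after);
```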




Understanding MongoDB                                                              NewsCred



                • DELETE
                db.articles.remove({}) /*remove all*/

                db.articles.remove({tag:'sql'}) /*remove all articles with tag = 'sql'*/

                db.articles.remove({tag:'sql', $atomic:true}) /*block other ops while removing*/




Understanding MongoDB                                          NewsCred



                • AGGREGATION
                > db.users.count()
                42

                > db.addresses.distinct('zipcode', {'city':'Dhaka'})
                [1000, 1100, 1204, 1205, ...]




Understanding MongoDB                                                  NewsCred



                • Map/Reduce
                       • Algorithm introduced by Google for processing large
                             datasets on clusters



                • MongoDB uses it for:
                       • Aggregation (Group By, Avg, Sum etc.)
                       • Batch processing jobs
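
As a rough illustration of the idea, the sketch below runs a map step and a reduce step over a few in-memory documents in plain JavaScript. It is not the presenters' example and does not call MongoDB. The `articles` data and the `mapArticle`, `reduceCounts` and `mapReduce` names are assumptions made up here to show a "count per tag" aggregation, similar in spirit to a GROUP BY with a count.

```javascript
// Made-up input documents; each article carries a list of tags.
const articles = [
  { title: "Intro to MongoDB", tags: ["nosql", "mongodb"] },
  { title: "Sharding basics", tags: ["mongodb", "scaling"] },
  { title: "Why NoSQL?", tags: ["nosql"] },
];

// Map step: emit one (key, value) pair per tag.
function mapArticle(doc, emit) {
  for (const tag of doc.tags) emit(tag, 1);
}

// Reduce step: fold all values emitted for one key into a single value.
function reduceCounts(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Tiny single-process driver: run map, group by key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const results = {};
  for (const [key, values] of groups) results[key] = reduceFn(key, values);
  return results;
}

const tagCounts = mapReduce(articles, mapArticle, reduceCounts);
console.log(tagCounts);
```

In the mongo shell the map function reads the current document from `this` and calls a global `emit`; the explicit `doc` and `emit` parameters here only keep the sketch self-contained.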




Understanding MongoDB                       NewsCred



                • Map/Reduce Example

                  [Figures: Document; “We want to do something like...”; Map; Reduce; Execute; Result]




Part 5
           Building a MongoDB App


                     NewsCred


Part 6
           Scaling with MongoDB


                     NewsCred


Scaling with MongoDB               NewsCred



                • Scaling is a challenge

                • No silver bullet

                • Strategies
                       • Replication
                       • Replica Sets
                       • Auto-sharding


Scaling with MongoDB                               NewsCred


                                     Replication


                                         Master




                             Slave       Slave     Slave




Scaling with MongoDB                            NewsCred


                                    Replica Sets

                             [Diagram: User, Primary, Secondary, Passive]




Scaling with MongoDB                                           NewsCred


                                 Replica Sets: Election

                             [Diagram: nodes A, B, C, D and E. C: priority 1, synced 3 ms ago. E: priority 1, synced 1 ms ago. D: priority 0.]




Scaling with MongoDB                                                           NewsCred



                • Replica Sets: Network Partition
                             • Election Process initiated
                                 • When a node can’t reach primary
                                 • When primary can’t reach majority of nodes in set

                             • New primary is elected by majority of nodes in set

                             • Node with the most recent data gets priority

                             • Arbiter node used to break ties
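
The election rules above can be sketched as a toy model in plain JavaScript. This is a deliberate simplification and not MongoDB's actual election protocol. The `members` data, the `lastSyncMsAgo` and `reachable` fields, and `electPrimary` are hypothetical, and arbiters and tie-breaking are left out. It only shows two points from the list: a new primary needs a majority of the set, and among eligible nodes the one with the most recent data wins.

```javascript
// Hypothetical replica set; lastSyncMsAgo says how stale a node's data is.
const members = [
  { name: "A", priority: 1, lastSyncMsAgo: 0, reachable: false }, // old primary, cut off
  { name: "B", priority: 1, lastSyncMsAgo: 5, reachable: false },
  { name: "C", priority: 1, lastSyncMsAgo: 3, reachable: true },
  { name: "D", priority: 0, lastSyncMsAgo: 0, reachable: true }, // counts toward majority, never primary
  { name: "E", priority: 1, lastSyncMsAgo: 1, reachable: true },
];

function electPrimary(allMembers) {
  const visible = allMembers.filter((m) => m.reachable);
  // A primary needs a strict majority of the whole set, not just of the visible nodes.
  if (visible.length * 2 <= allMembers.length) return null;
  // Priority 0 nodes are never candidates.
  const candidates = visible.filter((m) => m.priority > 0);
  if (candidates.length === 0) return null;
  // Prefer the candidate with the most recent data.
  return candidates.reduce((best, m) =>
    m.lastSyncMsAgo < best.lastSyncMsAgo ? m : best
  );
}

const winner = electPrimary(members);
console.log(winner ? winner.name : "no primary");

// The other side of the partition sees only A and B: 2 of 5 is not a majority.
const minoritySide = members.map((m) => ({
  ...m,
  reachable: m.name === "A" || m.name === "B",
}));
const minorityWinner = electPrimary(minoritySide);
console.log(minorityWinner ? minorityWinner.name : "no primary");
```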




Scaling with MongoDB                                                     NewsCred



                • Auto-sharding
                             • Cluster handles sharding data and rebalancing
                               automatically

                             • No administrative headaches of manual sharding

                             • Application is oblivious to existence of shards




Scaling with MongoDB                                  NewsCred


                                              Auto-sharding




                             Big Collection




Scaling with MongoDB                 NewsCred


                             Auto-sharding

                                   User




                                 Router




Scaling with MongoDB                                      NewsCred


                                             Auto-sharding

               • Connect to a single server
                        • db = connect('localhost:27017')

               • Connect to a router
                        • db = connect('localhost:27017')



                                      User

                                                             MongoDB




Scaling with MongoDB                                                NewsCred



                • When to shard?
                             • Running out of disk space
                             • Write-intensive workload
                             • Need to keep a large chunk of data in memory


                • Don’t start out with a sharded collection!

                • Shard “if and when” you need to



Scaling with MongoDB                                                      NewsCred



                • Choosing a Shard Key
                             • Incremental
                                • Example: timestamps, e.g. ‘created_at’
                                • Queries on the shard key are highly efficient

                             • Random
                                • Example: ‘username’
                                • Writes are distributed across multiple shards
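
The trade-off between the two kinds of key can be seen in a small simulation. It assumes range-based placement, where each shard owns a contiguous range of key values; `shard_for`, `write_distribution` and the split points are hypothetical and chosen only for illustration.

```python
import bisect
import random
from collections import Counter
from typing import Iterable, List


def shard_for(key, split_points: List) -> int:
    """Range-based placement: shard i holds the keys below split_points[i]."""
    return bisect.bisect_right(split_points, key)


def write_distribution(keys: Iterable, split_points: List) -> Counter:
    """Count how many writes land on each shard."""
    return Counter(shard_for(k, split_points) for k in keys)


# Incremental key: new 'created_at' values are always larger than existing
# ones, so every insert falls past the last split point onto the last shard.
timestamp_splits = [1000, 2000, 3000]      # 4 shards
new_timestamps = range(5000, 5100)         # 100 fresh inserts
incremental = write_distribution(new_timestamps, timestamp_splits)

# Random-looking key: usernames are spread across the alphabet.
rng = random.Random(42)
usernames = ["".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(6))
             for _ in range(100)]
name_splits = ["g", "n", "t"]              # 4 shards
spread = write_distribution(usernames, name_splits)

print(dict(incremental))   # {3: 100}
print(dict(spread))        # writes spread over several shards
```

With the incremental key a range query such as "the last hour" touches a single shard, but so does every new write; with the random key the writes spread out, at the cost of range queries fanning out to many shards.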




Scaling with MongoDB                                  NewsCred


                             Sharding + Replica Sets

                             [Diagram: User → Router → two shards, each a replica set with one primary (P) and two secondaries (S)]




Questions?                                 NewsCred




                Iraj Islam
                iraj@newscred.com, @irajislam


                Rubayeet Islam
                rubayeet@newscred.com, @rubayeet


                Nurul Ferdous
                nurul@newscred.com, @ferdous



