SlideShare uma empresa Scribd logo
1 de 66
Baixar para ler offline
What Exactly is NoSQL?
Document databases, Column-family stores, Key-value pairs, more


Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com
NoSQL?
NoSQL : Various Shapes and Sizes

• Document Databases


• Column-family Oriented Stores


• Key/value Data stores


• XML Databases


• Object Databases


• Graph Databases
Document Databases




• mostly MongoDB, little CouchDB
What is a document db?

• One that stores documents


• Popular options:


  • MongoDB -- C++


  • CouchDB -- Erlang


  • Also Amazon’s SimpleDB


• ...what exactly is a document?
In the real world




• (Source: http://guide.couchdb.org/draft/why.html)
In terms of JSON

• {name: “John Doe”,


• zip: 10001}
What about db schema?

• Schema-less


• Different documents could be stored in a single collection
Data types: MongoDB

• Essential JSON types:


• string


• integer


• boolean


• double
Data types: MongoDB (...cont)

• Additional JSON types


• null, array and object


• BSON types -- binary encoded serialization of JSON like documents


   • date, binary data, object id, regular expression and code


   • (Reference: bsonspec.org)
A BSON example: object id
Data types: CouchDB

• Everything JSON


• Large objects: attachments
CRUD operations for documents

• Create


• Read


• Update


• Delete
MongoDB: Create Document

• use mydb


• w = {name: “John Doe”, zip: 10001};


• db.location.save(w);
Create db and collection

• Lazily created


• Implicitly created


• use mydb


• db.collection.save(w)
MongoDB: Read Document

• db.location.find({zip: 10001});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Read Document (...cont)

• db.location.find({name: "John Doe"});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Update Document

• Atomic operations on single documents


• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
Indexes
(explain)

• db.ratings.find().explain();
Indexes
(explain output)

•{


•    "cursor" : "BasicCursor",


•    "nscanned" : 1000209,


•    "nscannedObjects" : 1000209,


•    "n" : 1000209,


•    "millis" : 1549,


•    "indexBounds" : {
Indexes
(ensure index)

• db.ratings.ensureIndex({ movie_id:1 });


• db.ratings.ensureIndex({ movie_id:-1 });
Indexes
(explain when index used)

•{


•    "cursor" : "BtreeCursor movie_id_1",


•    "nscanned" : 2077,


•    "nscannedObjects" : 2077,


•    "n" : 2077,


•    "millis" : 2,


•    "indexBounds" : {
Indexes
(get indexes)

• db.ratings.getIndexes();
Sorted Ordered Column-family Datastores

• Sorted


• Ordered


• Distributed


• Map
Essential schema
Multi-dimensional View
A Map/Hash View

•{


• "row_key_1" : { "name" : {


•     "first_name" : "Jolly", "last_name" : "Goodfellow"


•     } } },


•    "location" : { "zip": "94301" },
Architectural View (HBase)
The Persistence Mechanism
The underlying file format
Model Wrappers (The GAE Way)

• Python


  • Model, Expando, PolyModel


• Java


  • JDO, JPA
HBase Data Access

• Thrift + Avro


• Java API -- HTable, HBaseAdmin


• Hive (SQL like)


• MapReduce -- sink and/or source
Transactions

• Atomic row level


• GAE Entity Groups
Indexes

• Row ordered


• Secondary indexes


• GAE style multiple indexes


  • thinking from output to query
Use cases

• Many Google’s Products


• Facebook Messaging


• StumbleUpon


  • Open TSDB


• Mahalo, Ning, Meetup, Twitter, Yahoo!


• Lily -- open source CMS built on HBase & Solr
Brewer’s CAP Theorem




• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf


• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
Distributed Systems & Consistency (case: success)
Distributed Systems & Consistency (case: failure)
Binding by Transactions
Consistency Spectrum
Inconsistency Window
RWN Math

• R – Number of nodes that are read from.


• W – Number of nodes that are written to.


• N – Total number of nodes in the cluster.




• In general: R < N and W < N for higher availability
R+W>N

• Easy to determine consistent state


• R + W = 2N


  • absolutely consistent, can provide ACID gaurantee


• In all cases when R + W > N there is some overlap between read and write
  nodes.
R = 1, W = N

• more reads than writes


•W=N


  • 1 node failure = entire system unavailable
R = N, W =1

•W=N


 • Chance of data inconsistency quite high


•R=N


 • Read only possible when all nodes in the cluster are available
R = W = ceiling ((N + 1)/2)
Effective quorum for eventual consistency
Eventual consistency variants

• Causal consistency -- A writes and informs B then B always sees updated
  value


• Read-your-writes-consistency -- A writes a new value and never see the old
  one


• Session consistency -- read-your-writes-consistency within a client session


• Monotonic read consistency -- once seen a new value, never return previous
  value


• Monotonic write consistency -- serialize writes by the same process
Dynamo Techniques

• Consistent Hashing (Incremental scalability)


• Vector clocks (high availability for writes)


• Sloppy quorum and hinted handoff (recover from temporary failure)


• Gossip based membership protocol (periodic, pair wise, inter-process
  interactions, low reliability, random peer selection)


• Anti-entropy using Merkle trees


• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-
  dynamo-sosp2007.pdf)
Consistent Hashing
Vector clocks (a trivial example)

• 4 hackers: Joe, Hillary, Eric and Ajay decide to meetup


• Joe -- suggests Palo Alto (t0)


• Hillary and Eric -- decide to meet in Mountain View (t1)


• Eric and Ajay -- decide to meet in Los Altos (t2)


• Joe mails: PA, Hillary responds: Mtn View, Ajay responds: Los Altos (t3)


   • both Hillary and Ajay say: Eric knows
Vector clocks (how it works)

• Venue : Palo Alto


• Vector Clock: Joe (ver 1)


• Venue: Mountain View


• Vector Clock: Joe (ver 1), Hillary (ver 1), Eric (ver 1)


• Venue: Los Altos


• Vector Clock: Joe (ver 1), Ajay (ver 1), Eric (ver 1)
Vector clock (resolution)

• Venue : Palo Alto


• Vector Clock: Joe (ver 1)


• Venue: Mountain View


• Vector Clock: Joe (ver 1), Hillary (ver 1), Ajay (ver 0), Eric (ver 2)


• Venue: Los Altos


• Vector Clock: Joe (ver 1), Hillary (ver 0), Ajay (ver 1), Eric (ver 1)
CouchDB MVCC Style




• (Source: http://guide.couchdb.org/draft/consistency.html)
Key/value Stores

• Memcached


• Membase


• Redis


• Tokyo Cabinet


• Kyoto Cabinet


• Berkeley DB
Redis -- a key-value data structure server

• open source key-value store


• a data structure server


   • values in key-value pairs can be strings, hashes, lists, sets, sorted sets
Where to find it?

• redis.io


• download a copy from http://redis.io/download
Who is building it?

• Core developers


  • Salvatore Sanfilippo, twitter: @antirez


  • Pieter Noordhuis, twitter: @pnoordhuis


• Main sponsor


  • VMware
Written in

• ANSI C


  • runs on POSIX compliant systems with no external dependencies
How can it be used?

• as an in memory data store


  • with option to persist to disk


• in standalone mode or as a master-slave replicated set


  • Redis cluster -- coming soon! (June 2011)


• as cache
Redis Architecture
Download and install

• curl -O http://redis.googlecode.com/files/redis-2.2.0-rc4.tar.gz


      • (just a 436kb download)


• tar zxvf redis-2.2.0-rc4.tar.gz


• cd redis-2.2.0-rc4


• make & make install (installs in /usr/local/bin)


• make test (to be sure you install it correctly)
Start the redis-server

• /usr/local/bin/redis-server




• ...Server started, Redis version 2.1.12


• ...The server is now ready to accept connections on port 6379
Connect with redis-cli

• /usr/local/bin/redis-cli




• redis> set key1 val1


• OK


• redis> get key1


• "val1"
String key-value pairs

• like memcached


   • with persistence


• key and value -- binary-safe strings
Binary-safe?

• redis> set "a key _" "another value"


• OK


• redis> get "a key _"


• "another value"
Questions?




• blog: shanky.org | twitter: @tshanky


• st@treasuryofideas.com

Mais conteúdo relacionado

Mais procurados

Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)Federico Panini
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch BasicsShifa Khan
 
The ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch pluginsThe ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch pluginsItamar
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchRuslan Zavacky
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Lucidworks
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchclintongormley
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Philips Kokoh Prasetyo
 
Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesItamar
 
Webinar: Building Your First Application with MongoDB
Webinar: Building Your First Application with MongoDBWebinar: Building Your First Application with MongoDB
Webinar: Building Your First Application with MongoDBMongoDB
 
Real time fulltext search with sphinx
Real time fulltext search with sphinxReal time fulltext search with sphinx
Real time fulltext search with sphinxAdrian Nuta
 
Elastic Search
Elastic SearchElastic Search
Elastic SearchNavule Rao
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Lucidworks
 
SQL for Elasticsearch
SQL for ElasticsearchSQL for Elasticsearch
SQL for ElasticsearchJodok Batlogg
 
Efficient cluster resource management by using Cook and Mesos / Li Jin (Two S...
Efficient cluster resource management by using Cook and Mesos / Li Jin (Two S...Efficient cluster resource management by using Cook and Mesos / Li Jin (Two S...
Efficient cluster resource management by using Cook and Mesos / Li Jin (Two S...Ontico
 
Using Sphinx for Search in PHP
Using Sphinx for Search in PHPUsing Sphinx for Search in PHP
Using Sphinx for Search in PHPMike Lively
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture ForumChristopher Spring
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in actionCodemotion
 

Mais procurados (20)

Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
The ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch pluginsThe ultimate guide for Elasticsearch plugins
The ultimate guide for Elasticsearch plugins
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Redis basics
Redis basicsRedis basics
Redis basics
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearch
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
 
Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use cases
 
Redis introduction
Redis introductionRedis introduction
Redis introduction
 
Webinar: Building Your First Application with MongoDB
Webinar: Building Your First Application with MongoDBWebinar: Building Your First Application with MongoDB
Webinar: Building Your First Application with MongoDB
 
Real time fulltext search with sphinx
Real time fulltext search with sphinxReal time fulltext search with sphinx
Real time fulltext search with sphinx
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
SQL for Elasticsearch
SQL for ElasticsearchSQL for Elasticsearch
SQL for Elasticsearch
 
Efficient cluster resource management by using Cook and Mesos / Li Jin (Two S...
Efficient cluster resource management by using Cook and Mesos / Li Jin (Two S...Efficient cluster resource management by using Cook and Mesos / Li Jin (Two S...
Efficient cluster resource management by using Cook and Mesos / Li Jin (Two S...
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Using Sphinx for Search in PHP
Using Sphinx for Search in PHPUsing Sphinx for Search in PHP
Using Sphinx for Search in PHP
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 

Semelhante a SDEC2011 NoSQL concepts and models

SDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingSDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingKorea Sdec
 
No SQL : Which way to go? Presented at DDDMelbourne 2015
No SQL : Which way to go?  Presented at DDDMelbourne 2015No SQL : Which way to go?  Presented at DDDMelbourne 2015
No SQL : Which way to go? Presented at DDDMelbourne 2015Himanshu Desai
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBSean Laurent
 
MongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewMongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewAntonio Pintus
 
DSpace Under the Hood
DSpace Under the HoodDSpace Under the Hood
DSpace Under the HoodDuraSpace
 
OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialSteven Francia
 
An introduction to Pincaster
An introduction to PincasterAn introduction to Pincaster
An introduction to PincasterFrank Denis
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLYan Cui
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLRichard Schneeman
 
Object Relational Database Management System
Object Relational Database Management SystemObject Relational Database Management System
Object Relational Database Management SystemAmar Myana
 

Semelhante a SDEC2011 NoSQL concepts and models (20)

SDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingSDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modelling
 
No SQL : Which way to go? Presented at DDDMelbourne 2015
No SQL : Which way to go?  Presented at DDDMelbourne 2015No SQL : Which way to go?  Presented at DDDMelbourne 2015
No SQL : Which way to go? Presented at DDDMelbourne 2015
 
NoSQL, which way to go?
NoSQL, which way to go?NoSQL, which way to go?
NoSQL, which way to go?
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
MongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewMongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overview
 
Mongodb lab
Mongodb labMongodb lab
Mongodb lab
 
DSpace Under the Hood
DSpace Under the HoodDSpace Under the Hood
DSpace Under the Hood
 
OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB Tutorial
 
MongoDB
MongoDBMongoDB
MongoDB
 
KeyValue Stores
KeyValue StoresKeyValue Stores
KeyValue Stores
 
An introduction to Pincaster
An introduction to PincasterAn introduction to Pincaster
An introduction to Pincaster
 
Drop acid
Drop acidDrop acid
Drop acid
 
Mongodb my
Mongodb myMongodb my
Mongodb my
 
MongoDB
MongoDBMongoDB
MongoDB
 
MongoDB
MongoDBMongoDB
MongoDB
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
MongoDB for Genealogy
MongoDB for GenealogyMongoDB for Genealogy
MongoDB for Genealogy
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQL
 
Object Relational Database Management System
Object Relational Database Management SystemObject Relational Database Management System
Object Relational Database Management System
 

Mais de Korea Sdec

SDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerSDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerKorea Sdec
 
SDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionSDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionKorea Sdec
 
SDEC2011 Introducing Hadoop
SDEC2011 Introducing HadoopSDEC2011 Introducing Hadoop
SDEC2011 Introducing HadoopKorea Sdec
 
Sdec2011 shashank-introducing hadoop
Sdec2011 shashank-introducing hadoopSdec2011 shashank-introducing hadoop
Sdec2011 shashank-introducing hadoopKorea Sdec
 
SDEC2011 Essentials of Pig
SDEC2011 Essentials of PigSDEC2011 Essentials of Pig
SDEC2011 Essentials of PigKorea Sdec
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutKorea Sdec
 
SDEC2011 Essentials of Hive
SDEC2011 Essentials of HiveSDEC2011 Essentials of Hive
SDEC2011 Essentials of HiveKorea Sdec
 
Sdec2011 Introducing Hadoop
Sdec2011 Introducing HadoopSdec2011 Introducing Hadoop
Sdec2011 Introducing HadoopKorea Sdec
 
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveSDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveKorea Sdec
 
SDEC2011 Rapidant
SDEC2011 RapidantSDEC2011 Rapidant
SDEC2011 RapidantKorea Sdec
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
SDEC2011 Going by TACC
SDEC2011 Going by TACCSDEC2011 Going by TACC
SDEC2011 Going by TACCKorea Sdec
 
SDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesSDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesKorea Sdec
 
SDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedSDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedKorea Sdec
 
SDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudSDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudKorea Sdec
 

Mais de Korea Sdec (15)

SDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerSDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuer
 
SDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionSDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestion
 
SDEC2011 Introducing Hadoop
SDEC2011 Introducing HadoopSDEC2011 Introducing Hadoop
SDEC2011 Introducing Hadoop
 
Sdec2011 shashank-introducing hadoop
Sdec2011 shashank-introducing hadoopSdec2011 shashank-introducing hadoop
Sdec2011 shashank-introducing hadoop
 
SDEC2011 Essentials of Pig
SDEC2011 Essentials of PigSDEC2011 Essentials of Pig
SDEC2011 Essentials of Pig
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of Mahout
 
SDEC2011 Essentials of Hive
SDEC2011 Essentials of HiveSDEC2011 Essentials of Hive
SDEC2011 Essentials of Hive
 
Sdec2011 Introducing Hadoop
Sdec2011 Introducing HadoopSdec2011 Introducing Hadoop
Sdec2011 Introducing Hadoop
 
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveSDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
 
SDEC2011 Rapidant
SDEC2011 RapidantSDEC2011 Rapidant
SDEC2011 Rapidant
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
SDEC2011 Going by TACC
SDEC2011 Going by TACCSDEC2011 Going by TACC
SDEC2011 Going by TACC
 
SDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesSDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & Experiences
 
SDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedSDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speed
 
SDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudSDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloud
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 

Último (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

SDEC2011 NoSQL concepts and models

  • 1. What Exactly is NoSQL? Document databases, Column-family stores, Key-value pairs, more Shashank Tiwari blog: shanky.org | twitter: @tshanky st@treasuryofideas.com
  • 3. NoSQL : Various Shapes and Sizes • Document Databases • Column-family Oriented Stores • Key/value Data stores • XML Databases • Object Databases • Graph Databases
  • 4. Document Databases • mostly MongoDB, little CouchDB
  • 5. What is a document db? • One that stores documents • Popular options: • MongoDB -- C++ • CouchDB -- Erlang • Also Amazon’s SimpleDB • ...what exactly is a document?
  • 6. In the real world • (Source: http://guide.couchdb.org/draft/why.html)
  • 7. In terms of JSON • {name: “John Doe”, • zip: 10001}
  • 8. What about db schema? • Schema-less • Different documents could be stored in a single collection
  • 9. Data types: MongoDB • Essential JSON types: • string • integer • boolean • double
  • 10. Data types: MongoDB (...cont) • Additional JSON types • null, array and object • BSON types -- binary encoded serialization of JSON like documents • date, binary data, object id, regular expression and code • (Reference: bsonspec.org)
  • 11. A BSON example: object id
  • 12. Data types: CouchDB • Everything JSON • Large objects: attachments
  • 13. CRUD operations for documents • Create • Read • Update • Delete
  • 14. MongoDB: Create Document • use mydb • w = {name: “John Doe”, zip: 10001}; • db.location.save(w);
  • 15. Create db and collection • Lazily created • Implicitly created • use mydb • db.collection.save(w)
  • 16. MongoDB: Read Document • db.location.find({zip: 10001}); • { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 17. MongoDB: Read Document (...cont) • db.location.find({name: "John Doe"}); • { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 18. MongoDB: Update Document • Atomic operations on single documents • db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
  • 20. Indexes (explain output) •{ • "cursor" : "BasicCursor", • "nscanned" : 1000209, • "nscannedObjects" : 1000209, • "n" : 1000209, • "millis" : 1549, • "indexBounds" : {
  • 21. Indexes (ensure index) • db.ratings.ensureIndex({ movie_id:1 }); • db.ratings.ensureIndex({ movie_id:-1 });
  • 22. Indexes (explain when index used) •{ • "cursor" : "BtreeCursor movie_id_1", • "nscanned" : 2077, • "nscannedObjects" : 2077, • "n" : 2077, • "millis" : 2, • "indexBounds" : {
  • 24. Sorted Ordered Column-family Datastores • Sorted • Ordered • Distributed • Map
  • 27. A Map/Hash View •{ • "row_key_1" : { "name" : { • "first_name" : "Jolly", "last_name" : "Goodfellow" • } } }, • "location" : { "zip": "94301" },
  • 31. Model Wrappers (The GAE Way) • Python • Model, Expando, PolyModel • Java • JDO, JPA
  • 32. HBase Data Access • Thrift + Avro • Java API -- HTable, HBaseAdmin • Hive (SQL like) • MapReduce -- sink and/or source
  • 33. Transactions • Atomic row level • GAE Entity Groups
  • 34. Indexes • Row ordered • Secondary indexes • GAE style multiple indexes • thinking from output to query
  • 35. Use cases • Many Google’s Products • Facebook Messaging • StumbleUpon • Open TSDB • Mahalo, Ning, Meetup, Twitter, Yahoo! • Lily -- open source CMS built on HBase & Solr
  • 36. Brewer’s CAP Theorem • http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf • http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
  • 37. Distributed Systems & Consistency (case: success)
  • 38. Distributed Systems & Consistency (case: failure)
  • 42. RWN Math • R – Number of nodes that are read from. • W – Number of nodes that are written to. • N – Total number of nodes in the cluster. • In general: R < N and W < N for higher availability
  • 43. R+W>N • Easy to determine consistent state • R + W = 2N • absolutely consistent, can provide ACID gaurantee • In all cases when R + W > N there is some overlap between read and write nodes.
  • 44. R = 1, W = N • more reads than writes •W=N • 1 node failure = entire system unavailable
  • 45. R = N, W =1 •W=N • Chance of data inconsistency quite high •R=N • Read only possible when all nodes in the cluster are available
  • 46. R = W = ceiling ((N + 1)/2) Effective quorum for eventual consistency
  • 47. Eventual consistency variants • Causal consistency -- A writes and informs B then B always sees updated value • Read-your-writes-consistency -- A writes a new value and never see the old one • Session consistency -- read-your-writes-consistency within a client session • Monotonic read consistency -- once seen a new value, never return previous value • Monotonic write consistency -- serialize writes by the same process
  • 48. Dynamo Techniques • Consistent Hashing (Incremental scalability) • Vector clocks (high availability for writes) • Sloppy quorum and hinted handoff (recover from temporary failure) • Gossip based membership protocol (periodic, pair wise, inter-process interactions, low reliability, random peer selection) • Anti-entropy using Merkle trees • (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon- dynamo-sosp2007.pdf)
  • 50. Vector clocks (a trivial example) • 4 hackers: Joe, Hillary, Eric and Ajay decide to meetup • Joe -- suggests Palo Alto (t0) • Hillary and Eric -- decide to meet in Mountain View (t1) • Eric and Ajay -- decide to meet in Los Altos (t2) • Joe mails: PA, Hillary responds: Mtn View, Ajay responds: Los Altos (t3) • both Hillary and Ajay say: Eric knows
  • 51. Vector clocks (how it works) • Venue : Palo Alto • Vector Clock: Joe (ver 1) • Venue: Mountain View • Vector Clock: Joe (ver 1), Hillary (ver 1), Eric (ver 1) • Venue: Los Altos • Vector Clock: Joe (ver 1), Ajay (ver 1), Eric (ver 1)
  • 52. Vector clock (resolution) • Venue : Palo Alto • Vector Clock: Joe (ver 1) • Venue: Mountain View • Vector Clock: Joe (ver 1), Hillary (ver 1), Ajay (ver 0), Eric (ver 2) • Venue: Los Altos • Vector Clock: Joe (ver 1), Hillary (ver 0), Ajay (ver 1), Eric (ver 1)
  • 53. CouchDB MVCC Style • (Source: http://guide.couchdb.org/draft/consistency.html)
  • 54. Key/value Stores • Memcached • Membase • Redis • Tokyo Cabinet • Kyoto Cabinet • Berkeley DB
  • 55. Redis -- a key-value data structure server • open source key-value store • a data structure server • values in key-value pairs can be strings, hashes, lists, sets, sorted sets
  • 56. Where to find it? • redis.io • download a copy from http://redis.io/download
  • 57. Who is building it? • Core developers • Salvatore Sanfilippo, twitter: @antirez • Pieter Noordhuis, twitter: @pnoordhuis • Main sponsor • VMware
  • 58. Written in • ANSI C • runs on POSIX compliant systems with no external dependencies
  • 59. How can it be used? • as an in memory data store • with option to persist to disk • in standalone mode or as a master-slave replicated set • Redis cluster -- coming soon! (June 2011) • as cache
  • 61. Download and install • curl -O http://redis.googlecode.com/files/redis-2.2.0-rc4.tar.gz • (just a 436kb download) • tar zxvf redis-2.2.0-rc4.tar.gz • cd redis-2.2.0-rc4 • make & make install (installs in /usr/local/bin) • make test (to be sure you install it correctly)
  • 62. Start the redis-server • /usr/local/bin/redis-server • ...Server started, Redis version 2.1.12 • ...The server is now ready to accept connections on port 6379
  • 63. Connect with redis-cli • /usr/local/bin/redis-cli • redis> set key1 val1 • OK • redis> get key1 • "val1"
  • 64. String key-value pairs • like memcached • with persistence • key and value -- binary-safe strings
  • 65. Binary-safe? • redis> set "a key _" "another value" • OK • redis> get "a key _" • "another value"
  • 66. Questions? • blog: shanky.org | twitter: @tshanky • st@treasuryofideas.com