CouchDB

King Chung Huang
Information Technologies
University of Calgary
Relax
Today’s Talk
• Document-oriented Databases
• CouchDB Overview
• Demonstrations
Document-oriented Databases
Databases
Flat
Hierarchical
Network
Relational Databases
Post-Relational Databases
• Dimensional
• Object
• Document-oriented
Document-oriented Databases
• Comparable to documents in the real world
• Records are stored as schema-less documents
    ■ Each document is uniquely named
    ■ Documents are the primary unit of storage
• Structures are not explicitly defined
    ■ No tables with uniform, pre-defined fields
    ■ Every document can have varying fields of different types
• Documents are self-contained
    ■ Data is not decomposed into tables with relations
    ■ Documents contain the context needed to understand them
Document-oriented Databases
• Examples
    ■ Lotus Notes
    ■ Amazon SimpleDB
    ■ CouchDB
• Key-Value Stores
    ■ Amazon S3
        ■ Dynamo: Amazon’s Highly Available Key-value Store, DeCandia, et al., 2007
    ■ Facebook Cassandra
        ■ Recently accepted as an Apache incubation project
    ■ Google BigTable
        ■ Bigtable: A Distributed Storage System for Structured Data, Chang, et al., 2006
CouchDB Overview
What is CouchDB?
• Document database server
• REST API
• JSON documents
• Views with MapReduce
• Highly Scalable
Document Database Server
• Implemented in Erlang
    ■ Developed at Ericsson (the name is also read as “Ericsson Language”)
    ■ Highly concurrent, functional programming language
• Designed with modern web applications in mind
• Atomic, Consistent, Isolated, Durable (ACID)
• “Crash-only” design
• Supports external handlers
    ■ Change notification
    ■ Custom processing
REST HTTP API
• Representational State Transfer
    ■ A set of principles about how resources are defined and addressed
• World Wide Web (HTTP) is RESTful
    ■ Uniform interface for accessing resources
    ■ Resources identified by URI
    ■ Actions transmitted in HTTP methods
    ■ Status communicated in status codes
REST HTTP API
CRUD
• Create, Read, Update, and Delete
• In HTTP
    ■ Create: POST /some/resource/id
    ■ Read: GET /some/resource/id
    ■ Update: PUT /some/resource/id
    ■ Delete: DELETE /some/resource/id
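As a rough illustration of the CRUD mapping above, the following Python sketch builds (but does not send) one HTTP request per operation with the standard library. The base URL `http://localhost:5984/blog`, the document bodies, and the helper `build_request` are assumptions made for illustration; only the operation-to-method mapping comes from the slide.

```python
import json
import urllib.request

# Assumed for illustration: a local server and a database named "blog".
BASE_URL = "http://localhost:5984/blog"

# CRUD operation -> HTTP method, in the order listed on the slide.
CRUD_METHODS = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, doc_id, body=None):
    """Build an unsent HTTP request for one CRUD operation (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        f"{BASE_URL}/{doc_id}",
        data=data,
        method=CRUD_METHODS[operation],
    )
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Nothing is sent over the network; the requests are only constructed.
sketched = [
    build_request("create", "post1", {"title": "A Blog Post"}),
    build_request("read", "post1"),
    build_request("update", "post1", {"title": "A Blog Post, revised"}),
    build_request("delete", "post1"),
]
for request in sketched:
    print(request.get_method(), request.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would perform the call against a running server; the status code of the response then reports the outcome, as the previous slide describes.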
JSON Documents
• JavaScript Object Notation
    ■ Considered language-independent
• CouchDB stored XML documents before version 0.8
    ■ Suitable if content is already in XML
    ■ Human readable, but can be onerous to type
    ■ Markup language, requires transformation from/to data structures
• Represents primitive data types and structures
    ■ Strings, numbers, booleans
    ■ Arrays, dictionaries
    ■ Null
• Documents can have attachments
JSON Documents
Example
{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}
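To make the type mapping concrete, here is a small Python sketch that parses the example document, written as strict JSON with quoted keys and straight quotes, and prints the native type each field becomes. It uses only the standard `json` module and makes no call to CouchDB.

```python
import json

# The example document from the slide, written as strict JSON.
raw = """
{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}
"""

doc = json.loads(raw)

# Each JSON value maps onto a native Python type:
# string -> str, number -> int, array -> list, boolean -> bool.
for key, value in doc.items():
    print(f"{key}: {type(value).__name__}")

# Serializing and parsing again yields an equal structure.
round_tripped = json.loads(json.dumps(doc))
print(round_tripped == doc)
```

No conversion layer is needed between the stored document and the program's data structures, which is the contrast the slide draws with XML.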
JSON Documents
Example
{
  "_id": "post1",
  "_rev": "123456",
  …
  "_attachments": {
    "picture.png": {
      "stub": true,
      "content_type": "image/png",
      "length": 384
    }
  }
}
Views
• Used to sort and filter through data
• Lazily evaluated, highly efficient
    ■ Similar to indexing in relational databases
• Defined in design documents
    ■ Documents named _design/…
• Consist of map and reduce functions
    ■ Language independent
    ■ JavaScript supported by default
        ■ Mozilla SpiderMonkey included
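A design document is itself a JSON document whose view functions are stored as source strings. The Python sketch below assembles one such document; the names `_design/posts` and `by_date` are hypothetical, and the map function is an illustrative guess at what a "posts by date" view could look like rather than code from the talk.

```python
import json

# JavaScript source of the map function, stored as a string.
# It emits one row per document that has a post_date field.
MAP_BY_DATE = (
    "function (doc) {"
    " if (doc.post_date) { emit(doc.post_date, doc.title); }"
    " }"
)

design_doc = {
    "_id": "_design/posts",      # hypothetical name; the _design/ prefix is required
    "language": "javascript",    # the default view language
    "views": {
        "by_date": {             # hypothetical view name
            "map": MAP_BY_DATE,
        },
    },
}

print(json.dumps(design_doc, indent=2))
```

Saving this document in a database is what defines the view; the index behind it is built when the view is first queried, which is the lazy evaluation the slide mentions.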
Data Processing with MapReduce
• Programming model for processing and generating large data sets
• Related, but not equivalent, to map and reduce operations in functional languages
• Take and produce key/value pairs with map and reduce functions
• Map functions
    ■ Take input key/value pairs and produce an intermediate set of key/value pairs
• Reduce functions
    ■ Take an intermediate key and the set of values for that key, and merge them into a possibly smaller set of values
• MapReduce: Simplified Data Processing on Large Clusters, Jeff Dean, Sanjay Ghemawat, Google Inc.
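The model described above can be sketched in a few lines of single-process Python. This is a toy illustration of the map, group, and reduce steps, not CouchDB's or Google's implementation; the function names and the two sample inputs are made up for the example.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map over every input pair, group by intermediate key, then reduce."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    return {
        key: reduce_fn(key, values)
        for key, values in sorted(intermediate.items())
    }


def count_words_map(doc_id, text):
    # Emit one (word, 1) pair per word in the input value.
    for word in text.lower().split():
        yield word, 1


def count_words_reduce(word, counts):
    # Merge all values for one key into a single, smaller value.
    return sum(counts)


# Made-up sample inputs: (key, value) pairs.
inputs = [("a", "once upon a time"), ("b", "a time ago")]
result = map_reduce(inputs, count_words_map, count_words_reduce)
print(result)
```

The map step never sees more than one input pair at a time and the reduce step never sees more than one key, which is what lets real systems spread both steps across many machines.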
Data Processing with MapReduce
Example
"post1" = {
  _id: "post1",
  _rev: "123456",
  title: "A Blog Post",
  tags: ["blue", "glue"],
  post_date: 1239910768,
  body: "Once upon a time…",
  is_published: true
}
Data Processing with MapReduce
Example
"post1" = {
  title: "A Blog Post",
  tags: ["blue", "glue"],
  post_date: 1239910768,
  body: "Once upon a time…"
}
Data Processing with MapReduce
Emit Posts by post_date
"post1" = {
  title: "A Blog Post",
  tags: ["blue", "glue"],
  post_date: 1239910768,
  body: "Once upon a time…"
}
1239910768 = {
  title: "A Blog Post",
  tags: ["blue", "glue"],
  post_date: 1239910768,
  body: "Once upon a time…"
}
Data Processing with MapReduce
Emit Posts by post_date

1208456184  {title: "A bloody long time ago", …}
1215421546  {title: "A blue moon ago", …}
1222654641  {title: "Just Yesterday", …}
1239910768  {title: "A Blog Post", …}
1246816518  {title: "That was Then", …}
1251687980  {title: "This is Now", …}
1264836981  {title: "When Will Then Be Now?", …}
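The effect of emitting posts by `post_date` can be imitated in plain Python: a map function yields a (key, value) row per document, and the rows are kept sorted by key. This is a simulation under assumptions, not CouchDB's view engine; the titles and dates are taken from the slide, while the document ids other than `post1` are invented.

```python
posts = [
    {"_id": "post1", "title": "A Blog Post", "post_date": 1239910768},
    {"_id": "post2", "title": "This is Now", "post_date": 1251687980},      # id invented
    {"_id": "post3", "title": "A blue moon ago", "post_date": 1215421546},  # id invented
]


def map_by_date(doc):
    """Emit one row keyed by post_date for each document that has one."""
    if "post_date" in doc:
        yield doc["post_date"], doc["title"]


# A view keeps its rows ordered by key, so sorting stands in for the index.
rows = sorted(row for doc in posts for row in map_by_date(doc))
for key, title in rows:
    print(key, title)

# Because rows are ordered by key, a date range is a contiguous slice.
in_range = [title for key, title in rows if 1230000000 <= key <= 1260000000]
print(in_range)
```

Keying the rows by date is what turns "posts between two dates" into a cheap range lookup instead of a scan over every document.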
Data Processing with MapReduce
Emit Posts by tag
"post1" = {
  title: "A Blog Post",
  tags: ["blue", "glue"],
  post_date: 1239910768,
  body: "Once upon a time…"
}

"blue" = { title: "A Blog Post", … }
"glue" = { title: "A Blog Post", … }
Data Processing with MapReduce
Emit Posts by tag

blue   {title: "Just Yesterday", …}
blue   {title: "A Blog Post", …}
clue   {title: "Just Yesterday", …}
flue   {title: "When Will Then Be Now?", …}
flue   {title: "This is Now", …}
glue   {title: "A Blog Post", …}
wazoo  {title: "That was Then", …}
Data Processing with MapReduce
Emit Posts by tag, Reduced
blue   {title: "Just Yesterday", …},
       {title: "A Blog Post", …}
clue   {title: "Just Yesterday", …}
flue   {title: "When Will Then Be Now?", …},
       {title: "This is Now", …}
glue   {title: "A Blog Post", …}
wazoo  {title: "That was Then", …}
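The by-tag rows and their reduced form can be reproduced with a short Python simulation. The map function emits one row per tag, and grouping the sorted rows by key gives the picture above. The tag lists are inferred from the rows shown on the slides, and the grouping step mirrors the slide's diagram rather than CouchDB's actual reduce API.

```python
from itertools import groupby
from operator import itemgetter

# Titles and tags as implied by the rows on the slides.
posts = [
    {"title": "Just Yesterday", "tags": ["blue", "clue"]},
    {"title": "A Blog Post", "tags": ["blue", "glue"]},
    {"title": "When Will Then Be Now?", "tags": ["flue"]},
    {"title": "This is Now", "tags": ["flue"]},
    {"title": "That was Then", "tags": ["wazoo"]},
]


def map_by_tag(doc):
    """Emit one row per tag, so a post with two tags appears twice."""
    for tag in doc.get("tags", []):
        yield tag, doc["title"]


# Sort rows by key only; the stable sort keeps the input order within a tag.
rows = sorted(
    (row for doc in posts for row in map_by_tag(doc)),
    key=itemgetter(0),
)

# "Reduce": merge all values that share a key into one entry per key.
reduced = {
    tag: [title for _, title in group]
    for tag, group in groupby(rows, key=itemgetter(0))
}
counts = {tag: len(titles) for tag, titles in reduced.items()}

print(reduced)
print(counts)
```

A reduce that returns a count per tag, as in `counts`, is the more typical use, since it shrinks the data instead of merely regrouping it.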
Scalability
• Incremental MapReduce
• Multiversion Concurrency Control (MVCC)
    ■ Achieves serializability through multiversioning instead of locking
    ■ Eliminates waits to access objects
    ■ Updates create new document revisions
    ■ Tradeoff point: no waits, increased data storage
• Incremental Distributed Replication
• Eventual Consistency
    ■ Changes eventually propagate through distributed systems
    ■ Tradeoff point: increased availability and tolerance, decreased freshness
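The MVCC tradeoff can be seen in a toy in-memory store, sketched below in Python. Every write appends a new revision instead of overwriting, readers keep whatever revision they fetched, and a write based on a stale revision is rejected rather than made to wait. The class `ToyStore`, its methods, and the integer revision numbers are all simplifications invented for this example; they are not CouchDB's storage engine or revision format.

```python
class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class ToyStore:
    """Append-only document store; a simplified stand-in for MVCC."""

    def __init__(self):
        self._revisions = {}  # doc_id -> list of (rev, body), oldest first

    def get(self, doc_id):
        rev, body = self._revisions[doc_id][-1]
        return {"_id": doc_id, "_rev": rev, **body}

    def put(self, doc_id, body, rev=None):
        history = self._revisions.setdefault(doc_id, [])
        current_rev = history[-1][0] if history else None
        if rev != current_rev:
            # No lock is taken: the stale writer is refused and may retry.
            raise ConflictError(f"expected rev {current_rev!r}, got {rev!r}")
        new_rev = len(history) + 1
        history.append((new_rev, dict(body)))  # old revisions are kept
        return new_rev


store = ToyStore()
rev1 = store.put("post1", {"title": "A Blog Post"})
reader_view = store.get("post1")  # a reader holding revision 1
rev2 = store.put("post1", {"title": "A Blog Post, revised"}, rev=rev1)

try:
    store.put("post1", {"title": "Stale write"}, rev=rev1)
    stale_write_rejected = False
except ConflictError as error:
    stale_write_rejected = True
    print("conflict:", error)

print(reader_view["title"], "|", store.get("post1")["title"])
```

The reader's copy is unaffected by the later update, and both revisions remain in storage, which is the "no waits, increased data storage" tradeoff named on the slide.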
Demonstrations
