SlideShare a Scribd company logo
1 of 43
Computational Research Division Lawrence Berkeley National Laboratory Dan Gunter
Introduction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Terminology: NOSQL and “Schemaless” ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
NOSQL past and present Pre-RDBMS RDBMS era NOSQL
Pre-relational structured storage systems ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Computer Systems News , 11/28/83
The relational model ,[object Object],[object Object],[object Object],A 1 ... A n Value 1 ... Value n R Relation (Table) Relation variable (Table name) Attribute (Column) {unordered} Heading Tuple (Row) {unordered}
Recent NOSQL database products Columnar  or  Extensible record  Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak
Why NOSQL? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
CAP Theorem ,[object Object],[object Object],[object Object],[object Object],[object Object],All robust distributed systems live here Forfeit partition-tolerance Forfeit availability Forfeit consistency Single-site databases, cluster databases, LDAP  Distributed databases w/pessimistic locking, majority protocols Coda, web caching, DNS,  Dynamo
CAP, ACID, and BASE ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ACID BASE
Pioneers ,[object Object],[object Object],These implementations are  not  publicly available, but the distributed-system techniques that they integrated to build huge databases have been imitated, to a greater or lesser extent, by every implementation that followed.
Google BigTable ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BigTable’s Data Model Google’s Bigtable is essentially a massive, distributed 3-D spreadsheet. It doesn’t do SQL, there is limited support for atomic transactions, nor does it support the full relational database model. In short, in these and other areas, the Google team made design trade-offs to enable the scalability and fault-tolerance Google apps require. - Robin Harris, StorageMojo (blog), 2006-09-08 t 6 t 5 t 3 name contents: anchor:cnnsi.com ... anchor:my.look.ca ... “ com.cnn.www” “ CNN” ... “ CNN.com” ... “ <html>...” “ <html>...” “ <html>...”
Tablets and SSTables ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Use of Bloom Filters to optimize lookups ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],w  is not in {  x, y, z  } because it hashes to one position with a 0 1 1 1 0 0 1 0 1 0 1 0 0 1 0 { x,  w y, z }
Chubby and Paxos ,[object Object],Each “DB” is a replica Each server runs on its own host Google tends to run 5 servers, with only one being the “master” at any one time Chubby server  DB Chubby server  DB Chubby server  DB Chubby server  DB Chubby server  DB Master
What about CAP? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Amazon’s Dynamo ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Dynamo data partitioning and replication Virtual node Host “node” Host “node” Virtual node Virtual node Virtual node Virtual node Virtual node Virtual node . . Hash ring using consistent hashing Host “node” Virtual node Virtual node Virtual node Virtual node 4 4 3 Item Hashes to this spot coordinator node replicas
Eventual consistency and sloppy quorum ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Replica synchronization with Merkle trees ,[object Object],[object Object],[object Object],For Dynamo, the “data” are the keys stored in a given virtual node Each node is a hash of its children If two top hashes match, then the trees are the same
Infrastructure (at scale) is fractal ,[object Object],[object Object],[object Object]
The Gold Rush Columnar  or  Extensible record  Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak Hibari
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Key/Value Store Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak Hibari
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Project Voldemort Type Key/Value Store License Apache 2.0 Language Java Company Linked-In Web project-voldemort.com
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],result = self.client.add(bucket.get_name()).map(&quot;Riak.mapValuesJson” .reduce(&quot;Riak.reduceSum”.run() Riak Example: Map/reduce with the Python API Type Key/Value Store License Open-Source Language Erlang Company Basho Web wiki.basho.com/display/RIAK/Riak/
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Hibari Type Key/Value Store  License Open-Source Language Erlang Company Gemini Mobile Web sourceforge.net/projects/hibari/
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Columnar  or  Extensible record  Google BigTable HBase Cassandra HyperTable
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Cassandra ,[object Object],Type Extensible column store License Apache 2.0 Language Java Company Apache Software Foundation Web cassandra.apache.org
[object Object],[object Object],[object Object],[object Object],[object Object],SimpleDB Document Store CouchDB MongoDB Lotus Domino Mnesia
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],CouchDB Type Document store License Apache 2.0 Language Erlang Company Apache Software Foundation Web couchdb.org
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],MongoDB ,[object Object],http://www.slideshare.net/mongodb/mongodb-replica-sets Type Document store License GPL Language C++ Company 10gen Web mongodb.org
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Mnesia * Mozilla Public License modified to conform with laws of Sweden (more herring) Type Document store License EPL* Language Erlang Company Ericsson Web www.erlang.org Papers http://www.erlang.se/publications/mnesia_overview.pdf
Why do we care about Mnesia / OTP? ,[object Object],[object Object],females() -> F = fun() -> Q = query [E.name || E <- table(employee),     E.sex = female] end, mnemosyne:eval(Q) end, mnesia:transaction(F).  Erlang query for “all females” in company* *I know, but it’s not  my  example. This is right out of the manual.
Comparison of MongoDB and CouchDB ,[object Object],[object Object],[object Object],[object Object],[object Object],Database Inserts/sec MongoDB 16,000 CouchDB 70 CouchDB, batch 1,800
Schemaless data modeling http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/
Example from distributed monitoring ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],All of these are data modeling “anti-patterns” for relational DBs
What’s wrong with EAV? ,[object Object],[object Object]
What about queries?
SQL vs. M/R and other models ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Conclusions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Selected references ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
Fabio Fumarola
 

What's hot (20)

Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
 
Consistency in NoSQL
Consistency in NoSQLConsistency in NoSQL
Consistency in NoSQL
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Google BigTable
Google BigTableGoogle BigTable
Google BigTable
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Temporal databases
Temporal databasesTemporal databases
Temporal databases
 
Big data visualization
Big data visualizationBig data visualization
Big data visualization
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 

Similar to Schemaless Databases

No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
sankarapu posibabu
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
Bhupesh Bansal
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
Igor Moochnick
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilled
rICh morrow
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
J Singh
 

Similar to Schemaless Databases (20)

05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt
 
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
 
No SQL Databases.ppt
No SQL Databases.pptNo SQL Databases.ppt
No SQL Databases.ppt
 
NoSQL Basics - A Quick Tour
NoSQL Basics - A Quick TourNoSQL Basics - A Quick Tour
NoSQL Basics - A Quick Tour
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
No sql
No sqlNo sql
No sql
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?
 
Datastores
DatastoresDatastores
Datastores
 
No sql
No sqlNo sql
No sql
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilled
 
NOSQL
NOSQLNOSQL
NOSQL
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Schemaless Databases

  • 1. Computational Research Division Lawrence Berkeley National Laboratory Dan Gunter
  • 2.
  • 3.
  • 4. NOSQL past and present Pre-RDBMS RDBMS era NOSQL
  • 5.
  • 6.
  • 7. Recent NOSQL database products Columnar or Extensible record Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. BigTable’s Data Model Google’s Bigtable is essentially a massive, distributed 3-D spreadsheet. It doesn’t do SQL, there is limited support for atomic transactions, nor does it support the full relational database model. In short, in these and other areas, the Google team made design trade-offs to enable the scalability and fault-tolerance Google apps require. - Robin Harris, StorageMojo (blog), 2006-09-08 t 6 t 5 t 3 name contents: anchor:cnnsi.com ... anchor:my.look.ca ... “ com.cnn.www” “ CNN” ... “ CNN.com” ... “ <html>...” “ <html>...” “ <html>...”
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19. Dynamo data partitioning and replication Virtual node Host “node” Host “node” Virtual node Virtual node Virtual node Virtual node Virtual node Virtual node . . Hash ring using consistent hashing Host “node” Virtual node Virtual node Virtual node Virtual node 4 4 3 Item Hashes to this spot coordinator node replicas
  • 20.
  • 21.
  • 22.
  • 23. The Gold Rush Columnar or Extensible record Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak Hibari
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. Schemaless data modeling http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/
  • 37.
  • 38.
  • 40.
  • 41.
  • 42.
  • 43.

Editor's Notes

  1. This talk really comes out of my attempt to orient myself in this space. Background is in monitoring distributed systems, concerned with scalable collection and data analysis. But also want to know what I can use for semi-structured data “in the small”.
  2. Where it applies, the distinction between relatively fixed schemas and dynamic ones is more technically significant than what query syntax is used to access the data, as has been shown by a number of products that provided a dialect of SQL as an alternative query language either alongside or on top of their native syntax.
  3. PICK -- MultiValue (aka PICK) databases are developed at TRW in 1965. M[umps] -- According to comment from Scott Jones M[umps] is developed at Mass General Hospital in 1966. It is a programming language that incorporates a hierarchical database with B+ tree storage. IBM IMS -- IBM IMS, a hierarchical database, is developed with Rockwell and Caterpillar for the Apollo space program in 1966. ISM -- InterSystems develops the ISM product family succeeded by the Open M product, all M[umps] implementations. See comment from Scott Jones below. ANSI M -- M[umps] is approved as a ANSI standard language in 1977. AT&amp;T DBM -- in 1979 Ken Thompson creates DBM which is released by AT&amp;T. At it&apos;s core it is a file-based hash. TDBM -- TDBM supporting atomic transactions NDBM -- NDBM was the Berkeley version of DBM supporting having multiple databases open at the same time. SDBM -- SDBM - another clone of DBM mainly for licensing reasons. GT.M -- GT.M is the first version of a key-value store with focus on high performance transaction processing. It is open sourced in 2000. BerkeleyDB -- BerkeleyDB is created at Berkeley in the transition from 4.3BSD to 4.4BSD. Sleepycat software is started as a company in 1996 when Netscape needed new features for BerkeleyDB. Later acquired by Oracle which still sell and maintain BerkeleyDB. Lotus Domino -- Lotus Notes or rather the server part, Lotus Domino, which really is a document database has it&apos;s initial release in 1989, now sold by IBM. It has evolved a lot from the early versions and is now a full office and collaboration suite. GDBM -- GDBM is the Gnu project clone of DBM Mnesia -- Mnesia is developed by Ericsson as a soft real-time database to be used in telecom. It is relational in nature but does not use SQL as query language but rather Erlang itself. Cache -- InterSystems CachÈ launched in 1997 and is a hybrid so-called post-relational database. It has object interfaces, SQL, PICK/MultiValue and direct manipulation of data structures. It is a M[umps] implementation. See Scott Jones comment below for more on the history of InterSystems Metakit -- Metakit is started in 1997 and is probably the first document oriented database. Supports smaller datasets than the ones in vogue nowadays. Neo4j -- Graph database Neo4j is started in 2000. db4o -- db4o an object database for java and .net is started in 2000 QDBM -- QDBM is a re-implementation of DBM with better performance by Mikio Hirabayashi. Memcached -- Memcached is started in 2003 by Danga to power Livejournal. Memcached isn&apos;t really a database since it&apos;s memory-only but there is soon a version with file storage called memcachedb. Infogrid graph DB -- Infogrid graph database is started as closed source in 2005, open sourced in 2008 CouchDB -- CouchDB is started in 2005 and provides a document database inspired by Lotus Notes. The project moves to the Apache Foundation in 2008. Google BigTable -- Google BigTable is started in 2004 and the research paper is released in 2006. JackRabbit -- JackRabbit is started in 2006 as an implementation of JSR 170 and 283. Tokyo Cabinet -- Tokyo Cabinet is a successor to QDBM by (Mikio Hirabayashi) started in 2006 Dynamo -- The research paper on Amazon Dynamo is released in 2007. MongoDB -- The document database MongoDB is started in 2007 as a part of a open source cloud computing stack and first standalone release in 2009. Cassandra -- Facebooks open sources the Cassandra project in 2008 Voldemort -- Project Voldemort is a replicated database with no single point-of-failure. Started in 2008. Dynomite -- Dynomite is a Dynamo clone written in Erlang. Terrastore -- Terrastore is a scalable elastic document store started in 2009 Redis -- Redis is persistent key-value store started in 2009 Riak -- Riak Another dynamo-inspired database started in 2009. HBase -- HBase is a BigTable clone for the Hadoop project while Hypertable is another BigTable type database also from 2009. Vertexdb -- Vertexdb another graph database is started in 2009 Term: NOSQL -- Eric Evans of Rackspace, a committer on the Cassandra project, introduces the term NoSQL often used in the sense of Not only SQL to describe the surge of new projects and products.
  4. Both of these systems are still used. An open-source version of M, called GT.M, is available (since 2000). M is still used by the US Dept of Veterans Affairs, and also by Ameritrade (Cache’: 12B transactions a day), ING Direct, and others in the financial industry. The IBM IMS system is still very actively used today, in particular for the US Federal Reserve. According to Wikipedia, odds are good your ATM transaction hits an IMS database. Chinese banks have purchased IMS technology. IMS includes a separate “transaction management” (TM) system.
  5. E. F. Codd’s seminal 1970 paper, “ A Relational Model of Data for Large Shared Data Banks” laid out a solid mathematical basis for databases in contrast to the hierarchical and network models of the time, relational algebra, an offshoot of first-order logic, provided a declarative means of reasoning about the data that did not depend on the implementation SQL is “loosely based” on relational algebra
  6. This taxonomy will be explored in more detail later, the point for now is that there are several different types of datastores and a number of examples of each and, referring back to the timeline, most of these implementations have occurred in the past few years..
  7. Corporations (once again) found themselves at the forefront of systems research. But what was that research? (Read on..)
  8. If nothing else, being able to refer to the “CAP theorem” the next time your networked demo breaks..
  9. In his talk, Brewer said “there is almost no work in this area”. I think that the existence of scalable (schemaless) database systems is proof that this has changed.
  10. Pictured is Parliament, pioneers of funk!
  11. Trivia: what major movie was about producing a script called “Chubby Rain”?
  12. Example of a BIgTable that stores web pages (directly out of the paper). The row names are reversed URLs (so sorted rows tend to group things by the same domain) There are two column families, “contents” and “anchor” In this example, each anchor cell has one version, and the contents column has 3
  13. Paxos is an old and well-known algorithm. The Chubby “Database” is really a set of directories with small “lockfiles”. Each tablet server gets one Chubby directory, and each of its tablets is a lockfile.
  14. These core services included the Amazon e-commerce shopping cart.
  15. Each virtual node is responsible for keys between itself and its predecessor on the ring. The mapping of a single node to a variable number of virtual nodes on the hash ring accounts for heterogeneity (host “power”) in the system.
  16. The quorum is “sloppy” because R and W refer to the number of healthy nodes, which may change between the write and subsequent read of the key.
  17. (Who knows what this is?) The picture is a close-up of a vegetable: the “ Chou Romanesco&amp;quot; cauliflower
  18. Particularly appropriate analogy because of the industry’s tendency to rush towards shiny new technologies! Following sections will examine each of these categories and walk through one publicly available product (or more) for each. With the exception of graph databases, which I simply haven’t taken the time to grok yet.
  19. Both Voldemort and the next database, Riak, claim they were “inspired” by the early Dynamo paper
  20. In the diagram, the green nodes are head; orange middle; red are tails. The white arrows are write requests, grey read requests, and red are (all) replies.
  21. Developed by former engineers from BigTable and Dynamo projects, in heavy use at Facebook. For consistency level, zero = totally async.; Any= 1 node, including hinted handoff; Quorum = R/2+1 where R = #replicas Reads of 0 or Any don’t make sense. 0=no data, Any=wrong node; can’t do read-repairs, just the handed-off version
  22. Has a nice Web UI called “Futon”. Yes, everything is a reclining furniture pun.
  23. Obviously, this is at best a micro-benchmark. YCSB stands for Yahoo! Cloud Serving Benchmark
  24. I won’t attempt to actually cover Map/Reduce, and don’t know Erlang. Instead: what impact do these databases have on data modeling efforts?