CouchDB: No SQL? No driver? No problem
1. *
CouchDB
No SQL? No Driver?
No problem.
Angel Pizarro
angel@upenn.edu
* www.bauwel-movement.co.uk/sculpture.php
2. About Me
Me: CBIL alumnus! Work in mass spec
proteomics
Lots of data in lots of formats in
bioinformatics
Ruby for programming and Ruby on Rails for
Web apps
But that doesn’t matter for CouchDB!
Interested in CouchDB for AWS deployment
3. Overview
Talk about Key-Value stores
Introduce some general theory and
concepts
CouchDB specifics
Example problem
More CouchDB specifics
Questions?
4. Key-Value Databases
Datastore of values indexed
by keys (duh!)
Hash or B-Tree index for
keys
Cassandra
Hash is FAST, but only allows
single-value lookups
B-Tree is slower, but allows
range queries
Horizontally scalable - via key
partitioning
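A minimal sketch of the hash vs. B-tree trade-off. A JavaScript Map plays the hash index and a sorted array stands in for the B-tree (an assumption for brevity; a real B-tree is a balanced, disk-friendly tree). The keys and values are made up.

```javascript
// Hash index: fast lookup of one exact key, but keys have no order.
const hashIndex = new Map([
  ["2009-07-09", "doc-a"],
  ["2009-07-11", "doc-b"],
  ["2009-08-01", "doc-c"],
]);

// Sorted index (stand-in for a B-tree): the same entries, kept in key order.
const sortedIndex = [...hashIndex.entries()].sort(([a], [b]) =>
  a < b ? -1 : a > b ? 1 : 0
);

// Binary search for the first position whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query: all values with startKey <= key <= endKey.
function rangeQuery(index, startKey, endKey) {
  const values = [];
  for (let i = lowerBound(index, startKey); i < index.length && index[i][0] <= endKey; i++) {
    values.push(index[i][1]);
  }
  return values;
}

const single = hashIndex.get("2009-07-11");
const july = rangeQuery(sortedIndex, "2009-07-01", "2009-07-31");
console.log(single, july);
```

The hash index answers only "give me this key"; the ordered index can also answer "give me everything in July", which is why date-like keys favor a B-tree.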
5. The CAP theorem: applies when business
logic is separate from storage
Consistency vs. Availability
vs. Partition tolerance
RDBMS = enforced
consistency
PAXOS = quorum
consistency
CouchDB (and others) =
eventual consistency
and horizontally
scalable
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
11. CouchDB
Document Oriented Database
JSON documents
HTTP protocol using REST operations
No direct native language drivers *
JavaScript is the lingua franca
ACID & MVCC guarantees on a per-document basis
Map-Reduce indexing and views
Back-ups and replication are easy-peasy
* Hovercraft: http://github.com/jchris/hovercraft/
22. REST
Representational State Transfer
Client-server separation with a uniform interface (HTTP)
Load-balancing, caching, authorization & authentication, proxies
Stateless - client is responsible for creating a self-sufficient request
Resources are cacheable - servers must mark non-cacheable resources as such
Only 5 HTTP verbs
GET, PUT, POST, DELETE, HEAD
23. CouchDB
REST/CRUD
GET read
PUT create or update
DELETE delete something
POST bulk operations
24. CouchDB passes the
ACID test
Each document is completely self-sufficient
Each document has a version number
An update operation writes a complete
new copy of the record and assigns it
a new version number
Append-only file structure allows the write
to occur while still serving read requests
25. MVCC: RDBMS vs. CouchDB
Multi-Version Concurrency Control
RDBMS enforces consistency using read/write locks
Instead of locks, CouchDB just serves up old data
Multi-document (multi-row) transactional semantics must be handled by the application
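A toy in-memory model of the ideas on the last two slides: an append-only log, one revision per write, readers keep seeing the version they fetched, and a writer that does not name the current revision gets a conflict. This is a sketch, not CouchDB's storage engine; `put`, `get`, `log`, and `current` are names invented here, and real revisions look like "N-<hash>" rather than a bare counter.

```javascript
// Toy model of per-document revisions (not CouchDB's real storage engine).
const log = [];            // append-only list of every document version written
const current = new Map(); // id -> index in `log` of the latest version

function put(id, doc) {
  const latest = current.has(id) ? log[current.get(id)] : null;
  // A writer must name the revision it read, otherwise the update is rejected.
  if (latest && doc._rev !== latest._rev) {
    return { error: "conflict", reason: "Document update conflict." };
  }
  const generation = latest ? parseInt(latest._rev, 10) + 1 : 1;
  // Real CouchDB revisions are "N-<hash>"; this toy keeps only the counter N.
  const stored = { ...doc, _id: id, _rev: String(generation) };
  log.push(stored);                // older versions are never overwritten
  current.set(id, log.length - 1);
  return { ok: true, id, rev: stored._rev };
}

function get(id) {
  return current.has(id) ? log[current.get(id)] : null;
}

const created = put("j_doe", { name: "J. Doe", friends: 0 });
const snapshot = get("j_doe"); // a reader holding the old version
const stale = put("j_doe", { name: "J. Doe", friends: 1 }); // no _rev given
const updated = put("j_doe", { _rev: created.rev, name: "J. Doe", friends: 1 });
console.log(stale.error, updated.rev, snapshot.friends);
```

No lock is ever taken: the reader's `snapshot` still shows the old data after the update, and the writer without a `_rev` is simply turned away.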
31. Database API
Create a DB:
$ curl -X PUT http://127.0.0.1:5984/friendbook
{"ok":true}
URL parts: protocol (http), CouchDB server (127.0.0.1:5984), DB name (friendbook)
Try it again: {"error":"db_exists"}
Delete a DB (not recoverable!):
$ curl -X DELETE http://localhost:5984/friendbook
{"ok":true}
34. Inserting a document
All inserts require that you give a unique ID. You can
request one from CouchDB:
$ curl -X GET http://localhost:5984/_uuids
{"uuids":["d1dde0996a4db7c1ebc78fb89c01b9e6"]}
We’ll just give one (-d @j_doe.json reads the body from a JSON file):
$ curl -X PUT http://localhost:5984/friendbook/j_doe \
    -d @j_doe.json
{"ok":true,
 "id":"j_doe",
 "rev":"1-062af1c4ac73287b7e07396c86243432"}
35. Full JSON document
Before
{ "name": "J. Doe",
"friends": 0 }
After
{ "_id": "j_doe",
"_rev": "1-062af1c4ac73287b7e07396c86243432",
"name": "J. Doe",
"friends": 0 }
37. Updating a document
$ curl -X PUT http://localhost:5984/friendbook/j_doe \
    -d '{"name": "J. Doe", "friends": 1 }'
{"error":"conflict","reason":"Document update conflict."}
Must give _rev (revision number) for updates!
revised.json
{ "_rev":"1-062af1c4ac73287b7e07396c86243432",
  "name":"J. Doe", "friends": 1 }
$ curl -X PUT http://localhost:5984/friendbook/j_doe -d @revised.json
{"ok":true,"id":"j_doe","rev":"2-0629239b53a8d146a3a3c4c63e2dbfd0"}
38. Deleting a document
$ curl -X DELETE http://localhost:5984/friendbook/j_doe
{"error":"conflict","reason":"Document update conflict."}
Must give revision number for deletes!
$ curl -X DELETE 'http://localhost:5984/friendbook/j_doe?rev=2-0629239b53a8d146a3a3c4c63e2dbfd0'
{"ok":true,"id":"j_doe",
"rev":"3-57673a4b7b662bb916cc374a92318c6b"}
Returns a revision number for the delete
$ curl -X GET http://localhost:5984/friendbook/j_doe
{"error":"not_found","reason":"deleted"}
39. Bulk operation
POST /database/_bulk_docs with a
JSON document containing all of the new
or updated documents.
// documents to bulk upload
{
  "docs": [
    {"_id": "0", "integer": 0, "string": "0"},
    {"_id": "1", "integer": 1, "string": "1"},
    {"_id": "2", "integer": 2, "string": "2"}
  ]
}
// reply from CouchDB
[
  {"id":"0","rev":"1-62657917"},
  {"id":"1","rev":"1-2089673485"},
  {"id":"2","rev":"1-2063452834"}
]
40. Gotchas!
Version storage is not guaranteed!
Do not use this as a VCS!
POST to /db/_compact deletes all older versions
To “roll back a transaction” you must:
Retrieve all related records, cache these
Insert any updates to records.
On failure, use the returned revision numbers to
re-insert the older record as a new one
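The rollback recipe above, sketched against a fake in-memory client. Everything here is hypothetical: `makeFakeDb`, `updateAll`, the integer revisions, and the made-up rule that rejects `friends < 0` (it exists only to force a failure halfway through). A real client would issue HTTP requests instead.

```javascript
// Hypothetical synchronous client; a real one would talk HTTP to CouchDB.
function makeFakeDb() {
  const docs = new Map();
  return {
    get(id) {
      return { ...docs.get(id) };
    },
    put(doc) {
      if (doc.friends < 0) throw new Error("forbidden"); // made-up rule to force a failure
      const existing = docs.get(doc._id);
      if (existing && existing._rev !== doc._rev) throw new Error("conflict");
      const rev = (existing ? existing._rev : 0) + 1; // toy integer revisions
      docs.set(doc._id, { ...doc, _rev: rev });
      return rev;
    },
  };
}

// Apply all updates; on failure, re-insert the cached originals as new revisions.
function updateAll(db, updates) {
  const cached = updates.map((u) => db.get(u._id)); // step 1: retrieve and cache
  const written = [];                               // [{ original, rev }]
  try {
    updates.forEach((u, i) => {
      const rev = db.put({ ...u, _rev: cached[i]._rev }); // step 2: insert updates
      written.push({ original: cached[i], rev });
    });
    return true;
  } catch (err) {
    // step 3: put the old content back on top of the revision just written
    for (const { original, rev } of written) db.put({ ...original, _rev: rev });
    return false;
  }
}

const db = makeFakeDb();
db.put({ _id: "a", friends: 0 });
db.put({ _id: "b", friends: 0 });
const ok = updateAll(db, [
  { _id: "a", friends: 1 },
  { _id: "b", friends: -1 }, // rejected, so "a" gets rolled back
]);
console.log(ok, db.get("a"), db.get("b"));
```

Note that "a" ends up at revision 3 with its old content: the rollback is itself a new write, and other readers could have seen the intermediate state, so this is not an atomic transaction.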
46. Our Example Problem
Hello world? Blog? Twitter clone?
Let’s store all human proteins instead
LOCUS       YP_003024029    227 aa    linear    PRI 09-JUL-2009
DEFINITION  cytochrome c oxidase subunit II [Homo sapiens].
ACCESSION   YP_003024029
VERSION     YP_003024029.1  GI:251831110
DBLINK      Project:30353
DBSOURCE    REFSEQ: accession NC_012920.1
KEYWORDS    .
SOURCE      mitochondrion Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
FEATURES             Location/Qualifiers
     source          1..227
                     /organism="Homo sapiens"
                     /organelle="mitochondrion"
                     /isolation_source="caucasian"
                     /db_xref="taxon:9606"
                     /tissue_type="placenta"
                     /country="United Kingdom: Great Britain"
                     /note="this is the rCRS"
     Protein         1..227
                     /product="cytochrome c oxidase subunit II"
                     /calculated_mol_wt=25434
http://www.ncbi.nlm.nih.gov/
54. Design Documents
The key to using CouchDB as more than a
key-value store
Just another JSON document, but contains
JavaScript functions that CouchDB treats
as application code
Functions are executed within CouchDB
Contain sections for map-reduce views,
data validation, alternate formatting, ...
Also library code & data structures specific to
the design document
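A rough sketch of what such a design document could look like. The `_id` (`_design/gb`) and the view name (`dbXref`) match the query shown later in the deck; `views`, `validate_doc_update`, and `shows` are CouchDB's section names. The function bodies and the `db_xref` and `definition` fields are assumptions made up for illustration.

```javascript
// Functions are stored as strings inside the design document's JSON.
const designDoc = {
  _id: "_design/gb",
  views: {
    dbXref: {
      // `doc.db_xref` is a hypothetical field name.
      map: function (doc) {
        (doc.db_xref || []).forEach(function (xref) {
          emit(xref, doc._id);
        });
      }.toString(),
    },
  },
  // Hypothetical rule: every stored record needs a definition.
  validate_doc_update: function (newDoc, oldDoc, userCtx) {
    if (!newDoc._deleted && !newDoc.definition) {
      throw { forbidden: "Document must have a definition" };
    }
  }.toString(),
  shows: {
    summary: function (doc, req) {
      return "<h1>" + doc._id + "</h1>";
    }.toString(),
  },
};

// This JSON text is what would be PUT to /refseq_human/_design/gb.
const body = JSON.stringify(designDoc, null, 2);
console.log(body);
```

Writing the functions as real JavaScript and calling `toString()` keeps them syntax-checked by the editor before they are shipped to the server as strings.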
57. Soy Map!
Views use a Map-Reduce model for
indexing and defining “virtual” documents
Fits well with assumptions of self-sufficient
documents and eventual consistency
Map function is applied to all documents in
the database
Emits (parts of) documents that pass muster
Indexing is incremental after an initial definition
You can choose to defer an index update for
insert speed
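A small simulation of what the view engine does with a map function, run outside CouchDB. The documents and the `db_xref` field are made up; the loop, the `emit` collector, and the plain string sort are stand-ins (CouchDB uses its own key collation and a B-tree, and updates the index incrementally).

```javascript
// Hypothetical documents; `db_xref` is an assumed field name.
const docs = [
  { _id: "NP_000006", db_xref: ["GeneID:10", "taxon:9606"] },
  { _id: "NP_000005", db_xref: ["GeneID:2"] },
  { _id: "note-1" }, // no db_xref: emits nothing
];

// The map function, written the way it would appear in a view.
function map(doc) {
  if (doc.db_xref) {
    doc.db_xref.forEach(function (xref) {
      emit(xref, doc._id);
    });
  }
}

// Stand-in for the view engine: run map over every doc and collect the rows.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}
for (const doc of docs) {
  currentId = doc._id;
  map(doc);
}
// The index is ordered by key (simplified here to a plain string comparison).
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

const hits = rows.filter((row) => row.key === "GeneID:10");
console.log(hits);
```

One document can emit several rows or none at all, which is how a view becomes an index over "virtual" documents rather than a copy of the database.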
65. GET by the indexed key
GET /refseq_human/_design/gb/_view/dbXref?key="GeneID:10"
{"total_rows":7,"offset":2,"rows":[
{"id":"NP_000006",
"key":"GeneID:10",
"value":"NP_000006"}
]}
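One practical detail of the query above: view keys are JSON values, so a string key keeps its double quotes, and the whole thing should be percent-encoded before it goes on the wire. A minimal sketch; the helper name `viewUrl` is made up, and the server address is assumed to be a local CouchDB.

```javascript
// Build a view query URL with a JSON-encoded, percent-encoded key.
function viewUrl(base, db, design, view, key) {
  const encodedKey = encodeURIComponent(JSON.stringify(key));
  return `${base}/${db}/_design/${design}/_view/${view}?key=${encodedKey}`;
}

const url = viewUrl("http://localhost:5984", "refseq_human", "gb", "dbXref", "GeneID:10");
console.log(url);
```

The same helper works for non-string keys such as numbers or arrays, because `JSON.stringify` produces the JSON form CouchDB expects.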
66. Reduce functions
Optional and used in concert with a
specific map function
Great for summarizing or collating
numerical data points
E.g. counts, number of X over time, average
load, probability of conversion
Not really applicable to our example, so
we’ll not cover it today
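For reference only, a minimal counting sketch. The reduce function uses CouchDB's (keys, values, rereduce) shape; the rows, the grouping helper `groupReduce`, and the local `sum` are stand-ins so that it runs outside CouchDB (CouchDB supplies `sum` to view code, and passes real key/id pairs where this sketch passes null).

```javascript
// Rows as a map function might emit them: one row per document, value 1.
// The keys are made up for illustration.
const rows = [
  { key: "mitochondrion", value: 1 },
  { key: "mitochondrion", value: 1 },
  { key: "nucleus", value: 1 },
];

// CouchDB provides sum() to view code; defined here so the sketch runs standalone.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Summing works for both passes: first over emitted 1s, then over partial sums.
function reduce(keys, values, rereduce) {
  return sum(values);
}

// Stand-in for a grouped query: reduce the rows that share a key.
function groupReduce(rows) {
  const grouped = new Map();
  for (const { key, value } of rows) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  return [...grouped].map(([key, values]) => ({
    key,
    value: reduce(null, values, false),
  }));
}

const counts = groupReduce(rows);
const total = reduce(null, counts.map((row) => row.value), true); // rereduce pass
console.log(counts, total);
```

A reduce function has to give the same answer whether it sees raw values or its own partial results, which is why a plain sum is the canonical example.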
67. Show me the ... HTML?
JSON is great, but what about, ya know,
something useful?
You can make a separate app to reformat
the JSON
OR you can use the “shows” section of a
_design document.
Rich formatting possible with functions,
templates, and special include macros
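A minimal sketch of a show function, called directly here so it runs without a server. The (doc, req) signature is CouchDB's; the `definition` field, the sample document, and the `escapeHtml` helper are assumptions for illustration.

```javascript
// A show function receives the document and the request, and returns the response body.
function show(doc, req) {
  // Small helper so document text cannot inject markup.
  function escapeHtml(text) {
    return String(text)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");
  }
  return "<h1>" + escapeHtml(doc._id) + "</h1><p>" + escapeHtml(doc.definition) + "</p>";
}

// Calling it by hand with a sample document and an empty request object.
const html = show(
  { _id: "YP_003024029", definition: "cytochrome c oxidase subunit II [Homo sapiens]." },
  {}
);
console.log(html);
```

Stored under the "shows" section of a design document, the same function lets CouchDB answer with HTML instead of JSON, with no separate app in between.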
75. Backups & Replication
Backup: simply copy the database file
Replicate: send a POST request with a source and
target database
Source and target DBs can either be local (just
the db name) or remote (full URL)
“continuous”: true option will register the
target to the source’s _changes notification API
$ curl -X POST http://localhost:5984/_replicate \
    -d '{"source":"db", "target":"db-replica", "continuous":true}'
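The same request, assembled in code without being sent. `replicationRequest` is a made-up helper and `backup.example.com` is a placeholder host; the Content-Type header is an assumption that some CouchDB versions expect for JSON bodies.

```javascript
// Build the pieces of a replication request without sending it.
function replicationRequest(server, source, target, continuous) {
  const body = { source, target };
  if (continuous) body.continuous = true;
  return {
    method: "POST",
    url: `${server}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Local to local: just the db names.
const local = replicationRequest("http://localhost:5984", "db", "db-replica", true);
// Local to remote: the target is a full URL (hypothetical host name).
const remote = replicationRequest(
  "http://localhost:5984",
  "db",
  "http://backup.example.com:5984/db",
  false
);
console.log(local, remote);
```

The returned object can be handed to any HTTP client; only the source and target strings change between a local copy and a push to another server.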
80. Data normalization? Schema?
Foreign Keys? Column
Constraints?
forgetaboutit
Italian for “forget about it”
… “or die”
Denormalize “until it hurts”
But validations are available
Validates a record on update with a JS function
81. Required Fields
function(newDoc, oldDoc, userCtx) {
  function require(field, message) {
    message = message || "Document must have a " + field;
    if (!newDoc[field]) throw({forbidden : message});
  }
  if (newDoc.type == "blogPost") {
    require("title");
    require("created_at");
    require("body");
    // Convention alert!
    require("author");
  } ...
}
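To see the validation in action without a server, the function can be given a name and called by hand. A sketch under assumptions: only the blogPost branch is reproduced (the elided part of the slide is left out), `tryUpdate` is a made-up helper, and the userCtx object is a hand-built stand-in for what CouchDB passes.

```javascript
// The slide's validation function, trimmed to the blogPost branch.
function validate(newDoc, oldDoc, userCtx) {
  function require(field, message) {
    message = message || "Document must have a " + field;
    if (!newDoc[field]) throw { forbidden: message };
  }
  if (newDoc.type == "blogPost") {
    require("title");
    require("created_at");
    require("body");
    require("author");
  }
}

// Helper for this sketch: report what the client would be told.
function tryUpdate(newDoc) {
  try {
    validate(newDoc, null, { name: "angel", roles: [] });
    return "ok";
  } catch (err) {
    return err.forbidden;
  }
}

const good = tryUpdate({
  type: "blogPost",
  title: "Hi",
  created_at: "2009-07-09",
  body: "...",
  author: "angel",
});
const bad = tryUpdate({ type: "blogPost", title: "Hi" });
const other = tryUpdate({ type: "protein" }); // other types are not checked
console.log(good, bad, other);
```

Throwing an object with a `forbidden` key is how the function rejects a write; documents of other types pass untouched, so the "schema" applies only where the application opts in.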
- If the key is a DateTime, then B-tree is a much better choice
Brewer’s CAP Theorem http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
Partition tolerance encompasses both business logic and data partitioning.
PAXOS will override more recent updates to a disconnected resource if it did not vote on a previous transaction.
Highlighted words covered later in order that they appear
Other stuff, but this is the most relevant for the discussion
Older browsers only support green verbs
CRUD = Create Read Update Delete
Next is the API discussions
You can give a “count” parameter to UUID function:
$ curl -X GET http://localhost:5984/_uuids?count=10
Can give it as a URL parameter or in the If-Match HTTP header (CouchDB returns the current revision in the ETag response header).
You cannot delete a specific revision! The revision number is only there so that the server can definitively say you are talking about the most recent record.
You need delete rev for replication of delete operations on other servers that are being synced to this one.
Might also be able to delete a particular version. Will have to check that.
Note: I could’ve made GI a number, but did not in this case
Zipcodes would be a bad thing to turn into numbers, b/c of possible leading zeros
Best practice = One design document per application or set of requirements
Next: Map-Reduce Views
We are just going to take a look at a simple plain-text example of a FASTA file
Append-only file structure ensures that your DB is always valid, even during mid-write server failures.