A presentation given at Refresh Savannah on the wonderful world of NoSQL, what it includes, what it means and where using new-style databases makes sense. There's also a demo that contrasts doing tags with MongoDB vs. doing it with a traditional RDBMS.
2. What are we running from?
• Relational databases are the de facto standard for storing data in a web application.
• A lot of the time, that data isn't really relational at all.
• RDBMSs have lots of rules that can impact performance.
3. Rules? What Rules?
• Classic relational databases follow the ACID rules:
  • Atomicity
  • Consistency
  • Isolation
  • Durability
4. Atomicity
• If any part of the update fails, it all fails.
• Databases have to be able to lock tables and rows for operations, which can block or delay other incoming requests.
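The "it all fails" rule can be sketched in plain Ruby. This is only an illustration under assumptions: `atomic_update` and the `accounts` hash are hypothetical names, and the copy-then-swap trick stands in for the locks and logs a real database would use.

```ruby
# Hypothetical in-memory "table"; not how a real RDBMS implements atomicity.
def atomic_update(store)
  working_copy = Marshal.load(Marshal.dump(store)) # deep copy of the data
  yield working_copy          # any exception raised here aborts the update
  store.replace(working_copy) # commit: all changes become visible at once
  true
rescue StandardError
  false                       # rollback: the original store is untouched
end

accounts = { "alice" => 100, "bob" => 50 }

ok = atomic_update(accounts) do |t|
  t["alice"] -= 150
  t["bob"]   += 150
  raise ArgumentError, "insufficient funds" if t["alice"] < 0
end
```

Here `ok` is `false` and `accounts` is unchanged, because the second half of the update never gets to stand on its own.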
5. Consistency
• A transaction must leave the database in a valid state, and (my interpretation) all copies of the data must be consistent with each other afterwards.
• Replication across lots of shards is expensive, especially if there's locking involved.
6. Isolation
• Data involved in a transaction must be inaccessible to other operations until the transaction completes.
• Remember the thing about locked rows and tables?
• It's a bummer.
7. Durability
• Once a user is notified that a transaction has completed, the data must be accessible and all integrity constraints must have been met.
8. I come not to bury MySQL...
• Relational databases are great for a lot of uses.
• If you have data that's actually relational, you need transactions and joins, and you have a limited number of data types, then an RDBMS will work for you.
9. But...
• RDBMSs have been treated like hammers and used for things they're not good at and weren't designed for.
• Like the web...
16. Key-Value Benefits
• Simple
• High performance (usually): with no transactions or relations, it's a simple bucket and lookup.
• Extremely flexible
• Commonly used as caches in front of slower resources (like MySQL - bazinga!)
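The "cache in front of a slower resource" idea, as a minimal sketch. Everything here is an assumption for illustration: `ReadThroughCache` is a made-up class, and plain Hashes stand in for both the key-value store and the slow database.

```ruby
# A Hash plays the key-value store; the block plays the slow resource.
class ReadThroughCache
  attr_reader :misses

  def initialize(&loader)
    @cache  = {}
    @loader = loader # called only on a cache miss
    @misses = 0
  end

  # Return the cached value, loading and storing it on the first request.
  def fetch(key)
    return @cache[key] if @cache.key?(key)
    @misses += 1
    @cache[key] = @loader.call(key)
  end

  # Drop a key so the next fetch goes back to the slow resource.
  def invalidate(key)
    @cache.delete(key)
  end
end

slow_db = { 1 => "first post", 2 => "second post" } # hypothetical data
cache = ReadThroughCache.new { |id| slow_db[id] }

cache.fetch(1) # miss: goes to slow_db
cache.fetch(1) # hit: served from the cache
```

After the two calls, `cache.misses` is 1: only the first lookup touched the slow store.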
17. Popular Players
• memcached - in memory only; an extremely efficient hashing algorithm lets you scale easily to hundreds of nodes.
• Redis - persistent, slightly more complex than memcached (it supports richer types such as lists and sets) but still highly performant.
• Riak - The Rails Machine guys love it. Jesse?
18. My Uses
• memcached: read-through cache for Rails with cache-money.
• Redis: persistent cache for results from our algorithm, partitioned by version and instance.
19. Wide Column
• Family of databases modeled on either Google's BigTable or Amazon's Dynamo.
• Pick two out of three from the CAP theorem in order to get horizontal scalability.
• Data stored by column instead of by row.
20. CAP?
• Consistency: All clients always have the same view of the data.
• Availability: Each client can always read and write.
• Partition Tolerance: The system works well despite physical network partitions.
21. Use cases
• Making sense out of large amounts of data where you know your query scenario ahead of time.
• Large = hundreds of millions of records.
• Data-mining log files and other sources of similar data.
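Why storing by column suits these known-ahead-of-time queries: an aggregate over one field only has to touch that field's values. A toy sketch in plain Ruby, with made-up log records (nothing here is a real wide-column API):

```ruby
# Row-oriented: each record keeps all of its fields together.
rows = [
  { path: "/home",  status: 200, bytes: 512 },
  { path: "/login", status: 302, bytes: 128 },
  { path: "/home",  status: 200, bytes: 640 },
]

# Column-oriented: one array per field, kept in the same order as the rows.
columns = rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |row, cols|
  row.each { |name, value| cols[name] << value }
end

# Summing bytes reads a single column and ignores path and status entirely.
total_bytes = columns[:bytes].sum
```

The trade-off runs the other way for "give me everything about record 2", which now has to be stitched together from every column.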
22. Big Players
• HBase
• Cassandra
• Hypertable
• Amazon's SimpleDB
• Google's BigTable (the granddaddy of all of them)
23. Graph Databases
• Store nodes, edges and properties
• Think of them as Things, Connections and Properties
• Good for storing properties and relationships.
• Honestly, I don't fully understand them... anyone?
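One way to picture Things, Connections and Properties is a tiny in-memory model. This is a sketch under assumptions, not how any particular graph database works: `TinyGraph`, `Node`, `Edge` and the sample data are all invented for illustration.

```ruby
Node = Struct.new(:id, :properties)               # a Thing
Edge = Struct.new(:from, :to, :type, :properties) # a Connection

class TinyGraph
  def initialize
    @nodes = {}
    @edges = []
  end

  def add_node(id, properties = {})
    @nodes[id] = Node.new(id, properties)
  end

  def add_edge(from, to, type, properties = {})
    @edges << Edge.new(from, to, type, properties)
  end

  # Ids of the nodes reachable from `id` over edges of the given type.
  def neighbors(id, type)
    @edges.select { |e| e.from == id && e.type == type }.map(&:to)
  end
end

g = TinyGraph.new
g.add_node(:ann, name: "Ann")
g.add_node(:bob, name: "Bob")
g.add_node(:savannah, name: "Savannah")
g.add_edge(:ann, :bob, :knows, since: 2008)
g.add_edge(:ann, :savannah, :lives_in)

g.neighbors(:ann, :knows) # follows only the :knows connections
```

The point is that relationships are first-class data you walk directly, rather than something reconstructed with joins.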
25. Document Stores
• Short on relationships, tall on rich data types.
• Big on eventual consistency and flexible schemas.
• Hybrid of traditional RDBMS and Key-Value stores.
26. Use Cases
• Content Management Systems
• Applications with rapid partial updates
• Anything you would normally use an RDBMS for but that doesn't need joins or transactions.
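A "partial update" changes one field of a document without rewriting the whole thing. A minimal sketch in plain Ruby, with no database involved: the `add_to_set` and `pull` helpers below are hypothetical stand-ins, similar in spirit to MongoDB's `$addToSet` and `$pull` operators, and the post is made-up data.

```ruby
# In-memory stand-ins for set-style array updates on a document (a Hash).
def add_to_set(doc, field, value)
  doc[field] ||= []
  doc[field] << value unless doc[field].include?(value) # no duplicates
  doc
end

def pull(doc, field, value)
  doc[field]&.delete(value) # removes every matching element, if the field exists
  doc
end

post = { "title" => "NoSQL at Refresh Savannah", "tags" => ["nosql"] }
add_to_set(post, "tags", "mongodb")
add_to_set(post, "tags", "nosql") # already present, so nothing changes
pull(post, "tags", "nosql")
```

At the end `post["tags"]` holds only `"mongodb"`, and the title was never touched.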
28. MongoDB
• Support for rich data types: arrays, hashes, embedded documents, etc.
• Support for adding and removing things from arrays and embedded documents (addToSet, for example).
• Map/Reduce support and strong indexes
• Regular expression support in queries
29. Design Considerations
• Embedded Documents - Use only if the embedded document will always be selected with the parent.
• Indexes - MongoDB punishes you much earlier for missing indexes than MySQL.
• Document size - Currently, documents are limited to 4MB, which should be large enough, but if it's not...
30. Real-World MongoDB
• We use MongoDB heavily at MIS.
• Statistics application and reporting
• Top-secret new application
• Web crawler and indexer
• CMS
34. And to get a "thing's" tags?
SELECT `tags`.* FROM `tags`
INNER JOIN `taggings` ON `tags`.id = `taggings`.tag_id
WHERE ((`taggings`.taggable_id = 237)
AND (`taggings`.taggable_type = 'Song'))
35. Yuck!
That's a lot of pain for something so simple.
And I didn't even show you finding things with tag "x".
Or how to set and unset tags on a "thing".
Ouch.
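To see the two shapes side by side without a database, here is a sketch in plain Ruby. The rows and the song document are invented sample data; arrays of hashes stand in for tables, and a single hash stands in for a document.

```ruby
# Relational shape: three "tables" as arrays of hashes (hypothetical rows).
songs    = [{ id: 237, title: "Example Song" }]
tags     = [{ id: 1, name: "rock" }, { id: 2, name: "live" }]
taggings = [
  { tag_id: 1, taggable_id: 237, taggable_type: "Song" },
  { tag_id: 2, taggable_id: 237, taggable_type: "Song" },
]

# Equivalent of the INNER JOIN: filter the taggings, then look up each tag.
tag_ids = taggings
  .select { |t| t[:taggable_id] == 237 && t[:taggable_type] == "Song" }
  .map { |t| t[:tag_id] }
joined_names = tags.select { |t| tag_ids.include?(t[:id]) }.map { |t| t[:name] }

# Document shape: the tags live inside the song itself.
song_doc = { id: 237, title: "Example Song", tags: ["rock", "live"] }
embedded_names = song_doc[:tags]
```

Both approaches end up with the same tag names; the document version just reads a field.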
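38. Let's Make This Easy...
# Add a tag to the in-memory array and, for saved records, to the database.
def add_tag(tag)
  tag = Post.clean_tag(tag)
  # add_to_set skips duplicates in the database, so skip them locally too.
  self.tags << tag unless self.tags.include?(tag)
  self.add_to_set(:tags => tag) unless self.new_record?
end

# Remove a tag from the in-memory array and, for saved records, from the database.
def remove_tag(tag)
  tag = Post.clean_tag(tag)
  self.tags.delete(tag)
  self.pull(:tags => tag) unless self.new_record?
end

# Normalize one tag: trim, lowercase, hyphenate spaces, drop everything else.
def self.clean_tag(str)
  str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
end

# Split a comma-separated string into an array of cleaned tags.
def self.clean_tags(str)
  str.split(",").map { |t| self.clean_tag(t) }
end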
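For anyone reading this without the console demo, here is what the tag cleaning does to some sample input. This is a standalone copy of the two cleaning helpers wrapped in a hypothetical `TagCleaner` module so it runs without a `Post` model or MongoDB; the input strings are made up.

```ruby
# Standalone copy of the cleaning logic; TagCleaner is a hypothetical wrapper.
module TagCleaner
  # Trim, lowercase, turn spaces into hyphens, drop any other characters.
  def self.clean_tag(str)
    str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
  end

  # Split on commas and clean each piece.
  def self.clean_tags(str)
    str.split(",").map { |t| clean_tag(t) }
  end
end

TagCleaner.clean_tag("  Hip Hop! ")            # => "hip-hop"
TagCleaner.clean_tags("Ruby, NoSQL, Mongo DB") # => ["ruby", "nosql", "mongo-db"]
```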
39. Demo Time
Sorry if you're looking at this later, but it's console time!
40. Why I Love MongoDB
• Document model fits how I build web apps.
• For most apps, I don't need transactions.
• Eventual consistency is actually OK.
• Partial updates and arrays make things that are a pain in SQL-land absolutely painless.
• It's just smart enough without getting in the way.
41. What's NoSQL, really?
• The right tool for the job.
• We've got lots of options for storing application data.
• The key is picking the one that solves our real problem.
• And if an RDBMS is the right tool, that's OK too.