Agenda
• What is NoSQL?
• NoSQL Technology Breakdown
• Where is NoSQL a Killer App?
• What Good is Relational?
• NoSQL and CouchDB
• NoSQL, Relational, or Both?
What is NoSQL ?
•Common traits
• Non-relational, Scalable
• Non-schematized/schema-free
• Eventual consistency
• Open source
• Distributed
• “Web scale”
• Developed at big Internet companies
Consistency
• CAP Theorem Databases may only excel at two of the following three
attributes: consistency, availability and partition tolerance
• NoSQL does not offer “ACID” guarantees Atomicity, consistency,
isolation and durability
• Instead offers “eventual consistency” Similar to DNS propagation
• Indexing
• Most NoSQL databases are indexed by key
• Some allow so-called “secondary” indexes
• Often the primary key indexes are clustered
• Hbase uses Hadoop Distributed File System, which is append-only
• Writes are logged
• Logged writes are batched
• File is re-created and sorted
• You get control back quikly
• Queries
• Typically no query language
• Instead, create procedural program
• Sometimes SQL is supported
• Sometimes MapReduce code is used…
MapReduce
• Map step: split the query up
• Reduce Most typical of Hadoop andstep: merge the results
• used with Wide Column Stores, esp. Hbase
• Amazon Web Services’ Elastic MapReduce (EMR) can read/write
DynamoDB, S3, Relational Database Service (RDS)
• “Hive”add-on to Hadoop offers a HiveQL (SQL-like) abstraction over MR
• Use with Hive tables
• Use with Hbase
NoSQL Technology Breakdown
• Key-Value Stores
• The most common; not necessarily the most popular
• Has rows, each with something like a big dictionary/associative array
• Schema may differ from row to row
• Common on Cloud platforms
• e.g. Amazon SimpleDB, Azure Table Storage
• MemcacheDB, Voldemort
• DynamoDB (AWS), Dynomite, Redis and Riak
Wide Column Stores
• Has tables with declared column families
• Each column family has “columns” which are KV pair that can vary from row
to row
• These are the most foundational for large sites
• Big Table (Google)
• Hbase (Originally part of Yahoo-dominated Hadoop project)
• Cassandra (Facebook)
• Calls column families “super columns” and tables “super column families”
• They are the most “Big Data”-ready
• Especially Hbase + Hadoop
Document Stores
• Have “databases,” which are akin to tables
• Have “documents,” akin to rows
• Documents are typically JSON objects
• Each document has properties and values
• Values can be scalars, arrays, links to documents in other databases or sub-documents (i.e.
contained JSON objects - Allows for hierarchical storage)
• Can have attachments as well
• Old versions are retained
• So Doc Stores work well for content management
• Some view doc stores as specialized KV stores
• Most popular with developers, startups, VCs
• The biggies:
• CouchDB
• MongoDB
Document Store Application Orientation
• Documents can each be addressed by URIs
• CouchDB supports full REST interface
• Very geared towards JavaScript and JSON
• Documents are JSON objects
• CouchDB/MongoDB use JavaScript as native language
• In CouchDB, “view functions” also have unique URIs and they return HTML
• So you can build entire applications in the database
Graph Databases
• Great for social network applications and others where relationships are
important
• Nodes and edges
• Edge like a join
• Nodes like rows in a table
• Nodes can also have properties and values
• Neo4j is a popular graph db
Where is NoSQL a Killer App?
• Content Management
• Document databases work really well here
• Regular KV pairs can store meta data
• Can also store text-based content
• Attachments can store file-based or binary content
• Versioning and URI addressability help as well
• CouchDB gets called a “Web database”
• Product Catalogs
• Products in a catalog tend to have many attributes in common and then
various others that are class-specific
• Common
• Product ID
• Name
• Description
• Price
• Key Value Stores and Wide Column Stores work well here
• Social
• Graph databases work best here
• Great for tracking:
• Networks
• Followers
• Group membership
• Threaded interactions (comments, likes/favorites)
• Great for Membership, Ownership
• Avoids the self-joins and many-to-many tables necessary in relational DBs
• Big Data
• Wide Column and Key-Value Stores work best here
• MapReduce is designed for this scenario
• Hadoop and Hbase come up a lot
• Sharding and append-only help here
• Premise of analytics is reading data, not maintaining it.
• Miscellaneous
• Event-driven data (i.e. logs)
• User profiles, preferences
• Mail, status message streams
• Other Web data
• Automobile directions
• Info for sites on maps (category, name, description, photo)
• User reviews
• Etc.
What Good is Relational ?
Transactional
• support transactions
• Business systems require atomic transactions
• You can’t process an order without decrementing inventory
• You can’t register a credit without its corresponding debit
• No exceptions, no excuses
Formal Schema
• Regular processes have regular data
• Stocks, trades
• PO line items
• Personnel records
• Insurance policies
• These need relational databases with declared schema
• These don’t need MapReduce, document or graph representation
• Banded Reporting
• Operational reporting is based on detail and group sections with predictable,
consistent layout, based on known schema
• Very hard to design pixel-perfect reports against indeterminate schema
• This highlights how operational business processes almost always require
relational databases
• Data Size
• A well-defined query language
• Mature development and administration tools
• Denormalize the database
NoSQL, Relational, or Both?
• Type of App
• Really a question of consistency versus massive scale
• Is this an internal system or a public one?
• Is it an application for the data or data for a system?
• Below a certain threshold of concurrent usage, NoSQL may be slower than relational
• Skill Sets and Investment
• Does your staff have RDBMS skills already?
• Do you have significant investment in relational database hw/sw?
• Lots of apps that use an RDBMS?
• Do you want to support both?
• Are you a startup?
• Employ developers who possess NoSQL skills and prefer NoSQL?
• Relational databases use Structured Querying Language (SQL), making them a
good choice for applications that involve the management of several
transactions.
• NoSQL Limitations
• In non-relational databases like Mongo, there are no joins like there would be in relational
databases.
• It also doesn’t automatically treat operations as transactions the way a relational database
does, you must manually choose to create a transaction and then manually verify it, manually
commit it or roll it back.