This presentation covers some common terminology used to describe NoSQL databases, goes into depth on some popular scalable database architectures, and includes an overview of Hypertable.
3. www.hypertable.org
Structured, Semi-Structured, and Unstructured Data
Structured data is what an RDBMS stores
Data is broken into discrete components
Types associated with each component: integer, floating point, date, string
Unstructured data is free-form text
Semi-structured data is a combination of structured and unstructured
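A minimal Python illustration of the three categories. The order record, its field names, and its values are hypothetical and only meant to show the difference in shape.

```python
import json
from dataclasses import dataclass
from datetime import date

# Structured: discrete components, each with a fixed type (like an RDBMS row).
@dataclass
class Order:
    order_id: int
    amount: float
    placed_on: date
    customer: str

structured = Order(1001, 49.95, date(2010, 6, 1), "alice")

# Semi-structured: self-describing fields with no fixed schema; this record
# mixes typed fields, a nested list, and a free-form "notes" string.
semi_structured = json.loads(
    '{"order_id": 1001, "customer": "alice", '
    '"items": ["book", "pen"], "notes": "please gift wrap"}'
)

# Unstructured: free-form text with no fields to query by.
unstructured = "Alice ordered a book and a pen on June 1st; please gift wrap."
```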
Document-Oriented
Semi-structured documents
Accepts documents in a format such as JSON, XML, YAML
Often schema-less
Auto-index fields
Examples: CouchDB, MongoDB
Best fit: XML or Web documents
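The "schema-less" and "auto-index fields" points can be sketched as a toy in-memory store. This is a hypothetical illustration (the class and method names are invented here), not how CouchDB or MongoDB actually index data: no schema is declared, and every scalar top-level field of each incoming JSON document is indexed automatically.

```python
import json
from collections import defaultdict

class DocumentStore:
    """Hypothetical schema-less store that indexes every scalar top-level field."""

    def __init__(self):
        self.docs = {}                 # doc_id -> parsed document
        self.index = defaultdict(set)  # (field, value) -> set of doc_ids
        self.next_id = 0

    def insert(self, json_text):
        doc = json.loads(json_text)
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = doc
        # Auto-index: nested values such as lists are stored but not indexed.
        for field, value in doc.items():
            if isinstance(value, (str, int, float)):
                self.index[(field, value)].add(doc_id)
        return doc_id

    def find(self, field, value):
        """Return matching documents in insertion order."""
        return [self.docs[i] for i in sorted(self.index.get((field, value), ()))]
```

Two documents with different fields can live side by side; for example, after inserting `{"title": "Intro", "author": "bob"}` and `{"title": "NoSQL", "author": "bob", "tags": ["db"]}`, `find("author", "bob")` returns both.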
Graph Databases
Database designed to represent graphs
APIs for performing graph operations
Traversal (depth-first, breadth-first)
Shortest/cheapest path
Partitioning
Some allow hypergraphs
Examples:
Neo4j, HyperGraphDB, InfoGrid, AllegroGraph, Sones, DEX, FlockDB, OrientDB, VertexDB, InfiniteGraph, Filament
More info: sones graphdb landscape
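Two of the graph operations listed above, breadth-first traversal and cheapest path, are sketched below over a plain adjacency list. The sample graph is hypothetical, and the cheapest-path search uses Dijkstra's algorithm as one standard choice; graph databases expose these operations through their own APIs, which are not shown here.

```python
import heapq
from collections import deque

# Hypothetical graph: node -> list of (neighbor, edge cost).
GRAPH = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}

def bfs(graph, start):
    """Breadth-first traversal; returns nodes in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr, _ in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

def cheapest_path(graph, src, dst):
    """Dijkstra's algorithm; returns (total cost, path), or (inf, []) if unreachable."""
    heap = [(0, src, [src])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nbr, weight in graph[node]:
            if nbr not in settled:
                heapq.heappush(heap, (cost + weight, nbr, path + [nbr]))
    return float("inf"), []
```

On this graph, `bfs(GRAPH, "A")` visits A, B, C, D, and the cheapest route from A to D costs 4 via B and C rather than the direct B to D edge.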
Column-Oriented
Data physically stored by column
RDBMS typically row-oriented
Improved performance for column operations
Better data compression
Examples:
Hypertable, HBase, Cassandra, Vertica
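The row-versus-column distinction, and why columns compress well, can be shown with a few hypothetical click-log rows. Run-length encoding is used here as one simple compression scheme; real column stores choose among several encodings.

```python
from itertools import groupby

# Hypothetical records: (date, country, clicks).
rows = [
    ("2010-06-01", "US", 3),
    ("2010-06-01", "US", 5),
    ("2010-06-01", "DE", 2),
    ("2010-06-02", "DE", 7),
]

# Row-oriented layout: each record's values are stored next to each other.
row_store = [value for row in rows for value in row]

# Column-oriented layout: each column's values are stored next to each other.
column_store = {name: [row[i] for row in rows]
                for i, name in enumerate(("date", "country", "clicks"))}

def run_length_encode(values):
    """Compress consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# A column operation reads one contiguous list instead of every full record.
total_clicks = sum(column_store["clicks"])

# Adjacent values within a column are often equal, so they compress well.
encoded_dates = run_length_encode(column_store["date"])
```

Here `total_clicks` is 17, and the four date values collapse to two (value, count) pairs.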
In-Memory
Data set stored in RAM
Extremely fast access
Limited capacity
Examples:
Memcached, Redis, MonetDB, VoltDB
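Because RAM capacity is limited, in-memory key/value stores commonly evict data when full. The sketch below uses least-recently-used (LRU) eviction, one common policy; it is a simplified single-process illustration with a hypothetical class name, not the implementation of any product listed above.

```python
from collections import OrderedDict

class LRUCache:
    """In-memory key/value store with a fixed capacity and LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
```

With a capacity of 2, setting "a" and "b", reading "a", and then setting "c" evicts "b", since it was the least recently used.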
Horizontal Scalability
Scale out
Increase capacity by adding machines
Opposite of vertical scalability (scale up)
Commodity hardware
Amazon AWS
S3
Online storage web service
Designed for larger amounts of data
Cost $0.15/GB per month
SimpleDB
Designed for smaller amounts of data
Provides indexing and richer query capability
Cost $0.27/GB per month + machine utilization fee
RDS
Managed MySQL instances
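As a worked example of the per-GB prices quoted above, the sketch below computes the monthly storage cost for a hypothetical 100 GB data set. It covers storage only; SimpleDB's machine utilization fee depends on usage and is not modeled, and the prices are the ones on this slide rather than current AWS pricing.

```python
# Per-GB monthly storage prices as quoted on the slide (historical figures).
S3_PER_GB_MONTH = 0.15
SIMPLEDB_PER_GB_MONTH = 0.27

def monthly_storage_cost(gigabytes, price_per_gb):
    """Storage-only cost in dollars, rounded to cents."""
    return round(gigabytes * price_per_gb, 2)

# Hypothetical 100 GB data set.
s3_cost = monthly_storage_cost(100, S3_PER_GB_MONTH)              # 15.0
simpledb_cost = monthly_storage_cost(100, SIMPLEDB_PER_GB_MONTH)  # 27.0, plus usage fees
```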
Auto-Sharding
Splits table data into horizontal “shards”
Shards managed by traditional RDBMS (e.g. MySQL, Postgres)
Automated “glue” code to handle sharding and request routing
Examples:
MongoDB, AsterData, Greenplum
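The "glue" code can be sketched as a small router that hashes each row key to pick a shard. This is a hypothetical illustration: each dict stands in for a separate RDBMS instance, and hash-modulo placement is one simple strategy, not necessarily what the listed products use.

```python
import hashlib

class ShardRouter:
    """Hypothetical glue layer that maps row keys to shards and routes requests."""

    def __init__(self, num_shards):
        # Each dict stands in for one database server (e.g. a MySQL instance).
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash, so the same key always routes to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, row):
        self.shards[self.shard_for(key)][key] = row

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)
```

A production system must also rebalance data when shards are added. With hash-modulo placement, changing the shard count moves most keys, which is one reason some systems use range partitioning or consistent hashing instead.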
Dynamo
Developed by Amazon.com for its shopping cart
Designed for high write availability
Eventually consistent DHT
Implementations:
Cassandra
Project Voldemort
Riak
Dynomite
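The distributed hash table (DHT) at Dynamo's core places nodes and keys on a consistent-hashing ring. The sketch below is a simplified illustration with hypothetical names. Each node owns several "virtual node" positions, and a key is stored on the first `n` distinct nodes found by walking clockwise from the key's hash. Failure handling, hinted handoff, and versioning are omitted.

```python
import bisect
import hashlib

def ring_hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Consistent-hashing ring with virtual nodes (simplified Dynamo-style DHT)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node appears at several positions to spread load evenly.
        self.ring = sorted((ring_hash(f"{node}#{i}"), node)
                           for node in nodes for i in range(vnodes))
        self.positions = [pos for pos, _ in self.ring]

    def preference_list(self, key, n=3):
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        start = bisect.bisect(self.positions, ring_hash(key)) % len(self.ring)
        result = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result
```

Unlike hash-modulo placement, adding or removing a node only moves the keys adjacent to that node's ring positions. Keeping replicas on the next `n` distinct nodes is what allows writes to succeed even when some replicas are unreachable.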
Eventual Consistency
Database update semantics in a distributed system with data replication
Strong consistency - after an update completes, all processes see the updated value
Eventual consistency - eventually all processes will see the updated value
The best-known eventually consistent system is DNS
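The difference between the two semantics shows up when a read hits a replica that has not yet received an update. Below is a single-process simulation with hypothetical names. A write is applied to one replica immediately and queued for the others, and a per-write counter with last-writer-wins resolves ordering. Dynamo itself tracks versions with vector clocks, so this single counter is a simplification.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        # Last-writer-wins: ignore updates older than what we already hold.
        if version > self.version:
            self.value, self.version = value, version

class EventuallyConsistentStore:
    """Writes reach replica 0 at once and the other replicas asynchronously."""

    def __init__(self, num_replicas=3):
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.pending = []  # (replica_index, value, version) not yet delivered
        self.clock = 0

    def write(self, value):
        self.clock += 1
        self.replicas[0].apply(value, self.clock)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, value, self.clock))

    def read(self, replica_index):
        return self.replicas[replica_index].value

    def propagate(self):
        """Deliver all pending updates; afterwards every replica agrees."""
        for i, value, version in self.pending:
            self.replicas[i].apply(value, version)
        self.pending.clear()
```

Right after `write("v1")`, reading replica 0 returns "v1" while replica 2 still returns the old value (`None`). Only after `propagate()` do all replicas converge, which is the "eventually" in eventual consistency.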
Bigtable: the infrastructure that Google is built on
Bigtable underpins 100+ Google services, including:
YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database…
Implementations
Hypertable
HBase
Google Stack
GFS - Replicates data across machines
MapReduce - Efficiently processes data stored in GFS
Bigtable - Indexed table structure
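MapReduce's programming model can be illustrated with word count, the usual introductory example. The sketch below runs the map, shuffle, and reduce phases in a single process with hypothetical function names. The real framework distributes these phases across many machines and reads its input from GFS.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all counts emitted for one word."""
    return key, sum(values)

def word_count(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

For the inputs "GFS stores data" and "MapReduce processes data", `word_count` reports a count of 2 for "data" and 1 for each other word.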
Hypertable Overview
Massively scalable database
Modeled after Google’s Bigtable
High-performance implementation (C++)
Thrift interface for all popular high-level languages: Java, Ruby, Python, PHP, etc.
Open source (GPL license)
Project started March 2007 @ Zvents
Data Model
Sparse, two-dimensional table with cell versions
Cells are identified by a 4-part key
Row (string)
Column Family (byte)
Column Qualifier (string)
Timestamp (long integer)
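The 4-part key can be modeled as a sorted map, as in the sketch below. It is a hypothetical illustration, not Hypertable's API. Only cells that are written occupy space (the table is sparse). The column family is limited to one byte, as the slide describes. The timestamp is stored negated so that, within a cell, newer versions sort first, the order in which Bigtable-style systems typically return versions.

```python
import bisect

class SparseTable:
    """Hypothetical sparse table keyed by (row, family, qualifier, timestamp)."""

    def __init__(self):
        self.keys = []    # sorted list of (row, family, qualifier, -timestamp)
        self.values = {}  # key -> cell value

    def put(self, row, family, qualifier, timestamp, value):
        if not 0 <= family <= 255:
            raise ValueError("column family must fit in one byte")
        key = (row, family, qualifier, -timestamp)
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def get(self, row, family, qualifier, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        lo = bisect.bisect_left(self.keys, (row, family, qualifier, float("-inf")))
        out = []
        for key in self.keys[lo:]:
            if key[:3] != (row, family, qualifier) or len(out) == max_versions:
                break
            out.append((-key[3], self.values[key]))
        return out
```

Writing two versions of the same cell at timestamps 100 and 200 and then calling `get` returns only the timestamp-200 version by default, or both (newest first) with `max_versions=2`.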
Speaker Notes
Describe the 360-degree panoramic view feature of Google Maps