Code camp2012

Big Data and NoSQL
Landscape
Sanjeev Mishra
Silicon Valley Code Camp 2012


Timeline
• 1970s – Genesis of modern db
o Modeling the world based on relational calculus: best for managing uniform data
• 1980s
o RDBMS takes over the world
• 1990s – 2000+
o Invention of HTML
o Spread of Web based technologies


Need for Modern Data Storage
• Amazon
o Managing: Shopping carts, Seller Lists, Customer Preferences, Sales Rank, Recommendations

• Google
o Storing and managing web scale data

• Facebook
o Managing social graphs

• LinkedIn, Twitter and others


Data Explosion Current
• Every two days now we create as much information as we did from the dawn of civilization up until 2003 - about 5 exabytes (1 exabyte = 1K PB) of data: Eric Schmidt *


Data Explosion Future

• A telescope planned to be finished in 2024 will generate more data in a single day than the entire Internet.*


What is Big Data?

• Terabytes (TB) are not big data; petabytes (PB, 1000 TB) may be.

• Current definition of big data: zettabytes (1 ZB = 1M PB or 1G TB)


Nature of Big Data
Web 2.0 kind of data

• Different from traditional RDBMS/Warehouse data, which has more reads and fewer updates
• User Generated Content: Tweets, Reviews, Comments etc.
• Lots of updates and lots of reads
• Scale to millions of users
• Not necessarily Transactional
• Compromised consistency


Data Explosion, So What?
• Structural issues
o The dynamic nature of data
• Performance issues
o Insertion
o Search
• Scaling Horizontally
o Dozens or hundreds of machines operating as a single server


What is NoSQL?
Not Only SQL or Not Relational

• Carlo Strozzi used the term in 1998, and then Eric Evans in 2009

• Simple call level interface (SQL not supported)

• Flexible schema

• Efficient use of distributed indexes

• Horizontal scaling of operations over many servers
• No ACID but BASE (Basically Available, Soft state*, Eventually consistent**)

CAP Theorem (Brewer’s Theorem)*

A distributed system can satisfy at most two of the
following three guarantees at the same time

o Consistency (all nodes see the same data at the same
time)

o Availability (a guarantee that every request receives a
response about whether it was successful or failed)

o Partition tolerance (the system continues to operate
despite arbitrary message loss or failure of part of the
system)

Eventual Consistency Flavors
• Causal consistency
o when a session is notified of a change, its subsequent reads always see the updated value.
• Read your own writes
o a session that updates the db will immediately see its own changes.
• Monotonic read consistency*
o once a session reads a value, it will never see an earlier value.
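The session guarantees above can be illustrated with a small sketch. It is not taken from any particular database: the `Replica` and `Session` classes, and the idea of tagging each value with an integer version, are assumptions made for illustration. The session remembers the highest version it has written or read and skips replicas that are older than that.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """One copy of a single value, tagged with an increasing version number."""
    version: int = 0
    value: Optional[str] = None


class Session:
    """Hypothetical client session enforcing read-your-writes and monotonic reads."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.min_version = 0  # highest version this session has written or read

    def write(self, value: str, replica_index: int) -> None:
        # Only one replica is updated synchronously; the others lag behind.
        new_version = max(r.version for r in self.replicas) + 1
        self.replicas[replica_index] = Replica(new_version, value)
        self.min_version = new_version

    def read(self) -> Optional[str]:
        # Skip replicas older than anything this session has already seen.
        for replica in self.replicas:
            if replica.version >= self.min_version:
                self.min_version = replica.version
                return replica.value
        raise RuntimeError("no replica is fresh enough for this session")


replicas = [Replica(), Replica(), Replica()]
session = Session(replicas)
session.write("v1", replica_index=2)
print(session.read())            # the writer skips the two stale replicas
print(Session(replicas).read())  # a new session may still see the old state
```

A fresh session has no such memory, so it can legitimately read a stale replica; that is the difference between these session-level guarantees and strong consistency.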


Consistency Tradeoffs

Where,
o N is the number of copies of each data item that the db maintains
o R is the number of copies read for each read
o W is the number of copies that must be written for each write

• Most NoSQL stores use N>W>1: more than one write must
complete, but not all nodes need to update immediately.
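A short sketch of how N, R and W interact, using the standard quorum rules: a read is guaranteed to overlap the latest successful write when R + W > N, and two writes cannot both succeed on disjoint replica sets when W > N/2. The function name `quorum_properties` is made up for this example.

```python
def quorum_properties(n: int, r: int, w: int) -> dict:
    """Report which guarantees a given (N, R, W) configuration provides."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return {
        # Read and write sets share at least one replica.
        "read_sees_latest_write": r + w > n,
        # Any two write sets share at least one replica.
        "write_sets_overlap": 2 * w > n,
    }


for n, r, w in [(3, 2, 2), (3, 1, 2), (3, 1, 1)]:
    print(f"N={n} R={r} W={w}", quorum_properties(n, r, w))
```

With N=3 and W=2 (the N>W>1 case), reads need R=2 to be sure of seeing the latest write; R=1 gives faster reads at the price of possibly stale data.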

Column Vs Row Storage


Row vs. Column Oriented DB
Id  First name  Last name  SSN           DOB
1   John        Doe        111-222-3333  8/12/1968
2   Jane        Doe        111-332-3408  4/3/1972

Row oriented    Column oriented
1               1
John            2
Doe             John
111-222-3333    Jane
8/12/1968       Doe
2               Doe
Jane            111-222-3333
Doe             111-332-3408
111-332-3408    8/12/1968
4/3/1972        4/3/1972

Contrasting Operations on Row vs Col DB
Insert a new tuple

Row oriented    Column oriented
1               1
John            2
Doe             3
111-22-3333     John
8/12/1968       Jane
2               Foo
Jane            Doe
Doe             Doe
111-32-3408     Bar
4/3/1972        111-22-3333
3               111-32-3408
Foo             237-23-3924
Bar             8/12/1968
237-23-3924     4/3/1972
2/3/1978        2/3/1978

Create a new attribute

Row oriented    Column oriented
1               1
John            2
Doe             John
111-22-3333     Jane
8/12/1968       Doe
408-555-1212    Doe
2               111-22-3333
Jane            111-32-3408
Doe             8/12/1968
111-32-3408     4/3/1972
4/3/1972        408-555-1212
650-555-2323    650-555-2323


Get all who were born in a given year

• Row oriented: easy, just pick all rows where the year of DOB matches the given year
• Column oriented: not so simple, scan the years, remember the indexes of all occurrences that match the given year, and extract based on these indexes

Get sum of all years

• Row oriented: a little difficult, the data does not live consecutively, so scanning through the entire dataset is needed
• Column oriented: easy, the data is found consecutively
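The contrast between the two layouts can be sketched in a few lines. This is an in-memory illustration only: a list of tuples stands in for row storage and a dict of lists for column storage, and the helper names (`year_of`, `born_in_rows`, `born_in_columns`) are invented for the example.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, str, str, str]

# Row storage: each record's fields sit next to each other.
row_store: List[Row] = [
    (1, "John", "Doe", "111-22-3333", "8/12/1968"),
    (2, "Jane", "Doe", "111-32-3408", "4/3/1972"),
]

# Column storage: each attribute's values sit next to each other.
column_store: Dict[str, list] = {
    "id": [1, 2],
    "first_name": ["John", "Jane"],
    "last_name": ["Doe", "Doe"],
    "ssn": ["111-22-3333", "111-32-3408"],
    "dob": ["8/12/1968", "4/3/1972"],
}


def year_of(dob: str) -> int:
    """Extract the year from an m/d/yyyy date string."""
    return int(dob.rsplit("/", 1)[1])


# Insert a new tuple: one append for rows, one append per column otherwise.
new_row: Row = (3, "Foo", "Bar", "237-23-3924", "2/3/1978")
row_store.append(new_row)
for column, value in zip(column_store.values(), new_row):
    column.append(value)


def born_in_rows(year: int) -> List[Row]:
    # Each matching row is already a complete record.
    return [row for row in row_store if year_of(row[4]) == year]


def born_in_columns(year: int) -> List[tuple]:
    # Find matching positions first, then reassemble records from every column.
    hits = [i for i, dob in enumerate(column_store["dob"]) if year_of(dob) == year]
    return [tuple(col[i] for col in column_store.values()) for i in hits]


# Aggregating one attribute touches every row, but only one column.
sum_rows = sum(year_of(row[4]) for row in row_store)
sum_columns = sum(year_of(dob) for dob in column_store["dob"])

print(born_in_rows(1972), born_in_columns(1972))
print(sum_rows, sum_columns)
```

Both layouts return the same answers; what differs is how much unrelated data has to be read to produce them.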


Glossary
• Consistent Hashing (Cassandra, Dynamo)
o the output range of a hash function is treated as a fixed circular space or “ring” (i.e. the largest hash value wraps around to the smallest hash value)
• Vector Clock (Cassandra, Riak, Dynamo)
o an algorithm for generating a partial ordering of events in a distributed system and detecting causality violations
• Quorum (Cassandra, Dynamo (sloppy))
• Merkle Tree (Cassandra, Riak, Dynamo)
o a hash tree where leaves are hashes of the values of individual keys. Parent nodes higher in the tree are hashes of their respective children. The principal advantage of Merkle tree is that each branch of the tree can be checked independently without requiring nodes to download the entire data set
• Anti-Entropy Gossip Protocol (Cassandra, Dynamo)
o comparing all the replicas of each piece of data that exist and updating each replica to the newest version
• Order preserving partitioning (Cassandra, MongoDB)
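As an illustration of the consistent hashing entry, here is a minimal ring without virtual nodes. The `HashRing` class and the node names are hypothetical; MD5 is used only because it gives a convenient fixed output range, and production systems add refinements this sketch leaves out.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(key: str) -> int:
    """Map a string onto the ring: an integer in [0, 2**128)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Minimal consistent-hash ring: a key belongs to the next node clockwise."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("the ring needs at least one node")
        self._ring: List[Tuple[int, str]] = sorted(
            (ring_position(node), node) for node in nodes
        )
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key: str) -> str:
        index = bisect.bisect_right(self._positions, ring_position(key))
        # Past the largest position, wrap around to the smallest one.
        return self._ring[index % len(self._ring)][1]


keys = [f"user-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Every key that moves ends up on the newly added node; keys owned by the other nodes stay put, which is the property that makes adding and removing storage nodes cheap.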


Glossary
• MVCC
o multi version concurrency control
• Atomicity
o all or nothing
• Consistency
o each transaction leaves the db in a valid state
• Isolation
o concurrent execution of transactions results in the state that would be obtained if the transactions were executed serially
• Durability
o committed transactions remain so even in the event of power loss, crashes or errors

• WAL
o Write ahead logging – changes are written to a log before they are applied (Durability)

• Eventually consistent
o after a sufficiently long quiet period, all updates can be expected to propagate through
the system and all replicas will be consistent
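Write ahead logging can be sketched with a toy key-value store that appends every change to a log file before touching its in-memory state, and rebuilds that state from the log on startup. `TinyStore` and the JSON-lines log format are assumptions made for this example, not how any specific product stores its log.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs each change before applying it."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self) -> None:
        # Recovery: re-apply every logged change in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value) -> None:
        # 1. Append to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply the change in memory.
        self.data[key] = value


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "store.log")
    store = TinyStore(path)
    store.put("cart", ["book"])
    store.put("cart", ["book", "pen"])
    # Simulate a crash: drop the in-memory state and rebuild it from the log.
    recovered_data = dict(TinyStore(path).data)

print(recovered_data)
```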


Glossary
• Sharding
o horizontal partitioning of data, storing records on different servers according to some key
• Tuple
o row in RDBMS, predefined schema.
• Document
o contains nested documents or lists as well as scalar values. No predefined schema.
• Extensible Record
o hybrid between Tuple and Document, families of attributes defined in a schema but attributes
can be added on a per record basis.
• Key-value Stores
o stores values indexed by a user defined key.
• Document Stores
o indexed document store
• Extensible Record Stores aka Wide Column Stores
o Stores extensible records partitioned vertically and horizontally across nodes.


NoSQL Categories
• Key-value Stores
o Stores values indexed by a user defined key.

• Document Stores
o Indexed document store

• Extensible Record Stores (Column Stores)
o Stores extensible records partitioned vertically and
horizontally across nodes.

• Graph Databases

Key-Value Stores


Key-Value Stores
• A distributed cache/Hashtable
o Inspired by Amazon Dynamo
o like memcached with persistence, replication, versioning, locking, transactions, sorting etc.
o get/put and lookups
o No secondary indices or keys
o Values are BLOBs or in some cases JSON document
o Scalability through key distribution over nodes


Key-Value Stores
• Riak (Erlang/Basho/Apache)
• Membase (C+Erlang/Couchbase/Apache)
• Project Voldemort (Java/LinkedIn/Apache)
• Redis (C/VMWare/BSD)
• Scalaris (Erlang/Zuse+onScale/Apache)
• Tokyo Cabinet (C/Fal Labs/LGPL)
• Dynamo (Java/For Amazon internal use)

There are others
Key Value / Tuple Store at http://nosql-database.org/


Amazon Dynamo
• KV Store Developed by Amazon to support
o Best Seller Lists
o Shopping carts
o Customer Preferences
o Session Management
o Sales Rank
o Product Catalog etc...
• Variation of Consistent Hashing based Data
Partitioning and Replication
• Dynamic add/delete of Storage Nodes
• Each service uses distinct instance of
Dynamo

Amazon Dynamo Cont...
• Keys and values are opaque byte[]. ID = 128-bit MD5 hash of the key
• “always writeable”: no updates are rejected due to failures or concurrent writes
• Simple read/write (get/put) operations on data uniquely identified by a key; the value is a
binary object (BLOB)
o get(key): returns a single object, or a list of conflicting versions, along with a context
o put(key, context, object)

• Eventual consistency with no isolation
guarantees
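The get/put interface with a context can be mimicked by a single-process toy. This is a loose sketch, not Dynamo's implementation: in the `ToyDynamo` class below the context is simply the set of version ids the caller has read (an assumption made here; Dynamo's context carries versioning metadata such as vector clocks), and a put only supersedes the versions named in its context.

```python
import hashlib
from typing import Dict, FrozenSet, List, Tuple

Context = FrozenSet[int]


class ToyDynamo:
    """Single-process imitation of an always-writeable get/put store."""

    def __init__(self) -> None:
        self._versions: Dict[bytes, Dict[int, bytes]] = {}
        self._next_id = 0

    @staticmethod
    def key_id(key: bytes) -> bytes:
        # 128-bit MD5 hash of the key (16 bytes).
        return hashlib.md5(key).digest()

    def get(self, key: bytes) -> Tuple[List[bytes], Context]:
        siblings = self._versions.get(self.key_id(key), {})
        return list(siblings.values()), frozenset(siblings)

    def put(self, key: bytes, context: Context, obj: bytes) -> None:
        siblings = self._versions.setdefault(self.key_id(key), {})
        # Supersede only the versions the writer had seen; never reject the write.
        for version_id in context:
            siblings.pop(version_id, None)
        self._next_id += 1
        siblings[self._next_id] = obj


store = ToyDynamo()
_, empty_context = store.get(b"cart")
store.put(b"cart", empty_context, b"book")  # client 1
store.put(b"cart", empty_context, b"pen")   # client 2, unaware of client 1
conflicting, context = store.get(b"cart")
print(conflicting)                          # both versions are returned

# The reader reconciles the versions and writes back with the context.
store.put(b"cart", context, b"book,pen")
resolved, _ = store.get(b"cart")
print(resolved)
```

Pushing conflict resolution to the reader is what lets the store accept every write, as in the shopping cart example.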

RIAK
• Developed in Erlang by Basho
• Clients: Python, JavaScript, Java, PHP, Erlang
• Dynamo inspired Open-Source
o Advanced K/V and
o Document Store (not a full featured document store)
• Replication and sharding by primary key hash
o Consistent Hashing
o De-Centralized (No-Master node)
• Eventually consistent
o Tunable number of replicas for read and write
o Tunable per-read and per-write
o Different parts of application can choose different trade-offs

Project Voldemort
• Java based advanced Key/Value store
• Developed at LinkedIn
• Open source, Apache license
• Supports MVCC for updates
• Replicas are updated asynchronously; an up-to-date view is guaranteed if a majority of replicas is read
• Uses optimistic locking for consistent multi-record updates
• Versions are ordered based on Vector clocks
• More info: http://www.project-voldemort.com/voldemort/
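Ordering versions by vector clock can be sketched as follows. This is a generic illustration of the algorithm rather than Voldemort's actual classes; the function names and the node names are made up.

```python
from typing import Dict

Clock = Dict[str, int]


def increment(clock: Clock, node: str) -> Clock:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: Clock, b: Clock) -> bool:
    """True if version a has seen every event that version b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: Clock, b: Clock) -> str:
    a_covers_b, b_covers_a = descends(a, b), descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"
    if b_covers_a:
        return "before"
    return "concurrent"  # neither saw the other: a conflict to reconcile


v1 = increment({}, "node_a")    # first write, handled by node_a
v2 = increment(v1, "node_b")    # update based on v1, handled by node_b
v3 = increment(v1, "node_c")    # another update based on v1, handled by node_c

print(compare(v2, v1))
print(compare(v2, v3))
```

Because the ordering is only partial, `v2` and `v3` cannot be ranked against each other; the store has to keep both until a client or a merge rule resolves them.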


Document Stores


Document Stores
• Data more complex than that in K/V stores
• Data encapsulated and encoded in
o JSON, XML, YAML, BSON or some other standard format
• Multiple types of documents per database
o Documents of similar type grouped together
o Optional metadata/schema for the document
o Less rigid schema than that of RDBMS
• Nested documents or collection
• Secondary indexes
• Complex query/update support
o Multiple attributes, collections etc

Document Example
{
  "when": "2011-09-19T02:10:11.3Z",
  "author": "alex",
  "title": "No Free Lunch",
  "text": "This is the text of the post. It could be very long.",
  "tags": ["business", "ramblings"],
  "votes": 5,
  "voters": ["jane", "joe", "spencer", "phyllis", "li"],
  "comments": [
    {
      "who": "jane",
      "when": "2011-09-19T04:00:10.112Z",
      "comment": "I agree."
    },
    {
      "who": "meghan",
      "when": "2011-09-20T14:36:06.958Z",
      "comment": "You must be joking. etc etc ..."
    }
  ]
}
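A rough sketch of what a document store does with such documents: it keeps documents of different shapes side by side and maintains secondary indexes over their fields. Plain dicts stand in for the store, and the ids `post-1` and `post-2`, the second document, and `build_tag_index` are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List

# Two documents of the same kind with different fields: no fixed schema.
documents: Dict[str, dict] = {
    "post-1": {
        "author": "alex",
        "title": "No Free Lunch",
        "tags": ["business", "ramblings"],
        "votes": 5,
        "comments": [
            {"who": "jane", "comment": "I agree."},
            {"who": "meghan", "comment": "You must be joking."},
        ],
    },
    "post-2": {"author": "jane", "title": "Hello", "tags": ["ramblings"]},
}


def build_tag_index(docs: Dict[str, dict]) -> Dict[str, List[str]]:
    """Secondary index: tag -> ids of the documents carrying that tag."""
    index: Dict[str, List[str]] = defaultdict(list)
    for doc_id, doc in docs.items():
        for tag in doc.get("tags", []):
            index[tag].append(doc_id)
    return index


tag_index = build_tag_index(documents)
print(tag_index["ramblings"])

# Nested collections are reached by walking the document itself.
commenters = [c["who"] for c in documents["post-1"].get("comments", [])]
print(commenters)
```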


Document Stores
• MongoDB (C++/10gen/AGPL)
• Apache CouchDB (Erlang/Apache)
• Amazon SimpleDB (Erlang/Amazon)
• Terrastore (Java/Terracotta/Apache)
• RavenDB (C#/Hibernating Rhinos/AGPL)

There are others
Document Store at http://nosql-database.org/


MongoDB


MongoDB
huMongous

• Document format: BSON (Binary JSON)
• Supports nested documents
• Documents are grouped in Collections
• Supports secondary indexes
• Scalability – auto sharding
• Consistency – Tunable based on request
(WriteConcerns)
• Replication – replica set – master – slave
• Atomicity – document level

MongoDB
Data Types
String, Integer, Boolean, Double, Null, Array, Object, ObjectId, Binary, Regex, Code

SQL           MongoDB
Database      Database
Table         Collection
Index         Index
Row           Document
Column        Field
Join          Embedding or Linking
Primary Key   _id

SQL                                        MongoDB
create table users                         db.createCollection("users")
  (name varchar(128), age number)
insert into users values ('bob', 32)       db.users.insert({name:"bob", age:32})
select * from users                        db.users.find()
select name, age from users                db.users.find({}, {name:1, age:1, _id:0})
select name, age from users where age=32   db.users.find({age:32}, {name:1, age:1, _id:0})
select * from users order by name asc      db.users.find().sort({name:1})
select * from users limit 10 offset 20     db.users.find().skip(20).limit(10)
select distinct name from users            db.users.distinct("name")
select count(*) from users                 db.users.count()
update users set age = 33                  db.users.update({name:"bob"},
  where name = 'bob'                         {$set:{age:33}}, false, true)
delete from users where name = 'bob'       db.users.remove({name:"bob"})


Extensible Record Stores aka Column Stores


Extensible Record Stores
Column Stores

• Motivated by Google BigTable
• Basic Data Model – Rows and Columns
• Scale by splitting rows and columns over
multiple nodes
o Rows split by sharding on primary key – split
by range rather than hash function
o Columns split by column groups
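Range-based row sharding can be sketched with a sorted list of split points. The `RangePartitioner` class, the split points and the node names are hypothetical; the point is only that keys in the same range land on the same node, so a range scan touches few nodes, which hash partitioning cannot offer.

```python
import bisect
from typing import List


class RangePartitioner:
    """Assign row keys to nodes by key range instead of by hash."""

    def __init__(self, split_points: List[str], nodes: List[str]) -> None:
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        if len(nodes) != len(split_points) + 1:
            raise ValueError("need exactly one more node than split points")
        self.split_points = split_points
        self.nodes = nodes

    def node_for(self, row_key: str) -> str:
        # Keys below the first split point go to nodes[0], and so on.
        return self.nodes[bisect.bisect_right(self.split_points, row_key)]

    def nodes_for_range(self, start: str, end: str) -> List[str]:
        # Only the nodes whose ranges intersect [start, end] are contacted.
        low = bisect.bisect_right(self.split_points, start)
        high = bisect.bisect_right(self.split_points, end)
        return self.nodes[low:high + 1]


partitioner = RangePartitioner(["h", "p"], ["node-1", "node-2", "node-3"])
for key in ["bdole", "jdoe", "ladams"]:
    print(key, partitioner.node_for(key))
print(partitioner.nodes_for_range("a", "k"))
```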


Extensible Record Stores
• Cassandra (Java/Facebook/Apache)
o Marriage of Dynamo and BigTable

• HBase (Java/Yahoo/Apache)
o Inspired by BigTable, uses HDFS for storage

• Hypertable (C++/Zvents/GPL)
o Similar to HBase/BigTable

• Accumulo (Java/NSA/Apache)
o Uses Hadoop, ZooKeeper, and Thrift; cell level access control

• Google BigTable (Internal to Google)

There are others
Wide Column Store at http://nosql-database.org/

Cassandra


Cassandra Features
• Decentralized
o Data is distributed across cluster of nodes
o No master, any node can address any request
o No single point of failure
• Fault-tolerant (Configurable replication strategies)
o Simple Strategy (first replica placed on the node determined by the partitioner, the rest
on the next nodes clockwise around the ring)
o Network Topology Strategy: multi datacenter strategy


Cassandra Features Cont…
• Failure detection and recovery
o Based on Gossip protocol
o Node state updated based on gossip message version
o Per-node heartbeat threshold
• Tunable consistency
o Can be configured per read/write


Cassandra
Data Types
ascii, int, float, decimal, boolean, bigint, double, varchar, counter, timestamp, uuid, text, blob, varint

SQL           Cassandra
Database      Keyspace
Table         Column Family
Index         Index
Row           Row
Column        Column
Join          (none)
Primary Key   Primary Key

SQL                                        Cassandra QL
create database codecamp                   CREATE KEYSPACE codecamp WITH
                                             strategy_class = 'NetworkTopologyStrategy'
                                             AND strategy_options:DC1=3
create table users                         CREATE COLUMNFAMILY users
  (key varchar(128),                         (key varchar PRIMARY KEY,
  name varchar(128), age number)             name varchar, age int)
create index idx_name ON users(name)       CREATE INDEX idx_name ON users(name)
insert into users                          INSERT INTO users (KEY, name, age)
  values ('bob', 'Bob', 32)                  VALUES ('jdoe', 'Jane Doe', 39)
select name, age from users                SELECT name, age FROM users
  where age>30                               WHERE age>30
update users set age = 35                  UPDATE users SET age=35
  where name = 'bob'                         WHERE name='bob'
delete from users where key='bob'          DELETE FROM users where KEY = 'bob'
                                           DELETE age FROM users where KEY='alice'
drop table users                           DROP COLUMNFAMILY users
drop database codecamp                     DROP KEYSPACE codecamp


Cassandra
Column and Column Family
Column
name: byte[]
value: byte[]
timestamp

Example Column
name: "userid"
value: "jdoe"
timestamp: …

Super Column
name: byte[]
value: collection of Columns

Example Super Column
name: homeaddress
value:
  Column  name: "street"  value: "555 Homestead Rd"  timestamp: …
  Column  name: "city"    value: "Sunnyvale"         timestamp: …
  Column  name: "zip"     value: "95051"             timestamp: …

Column Family (each Row Key identifies a Row of Columns)
Row Key   Columns (name / value / timestamp)
jdoe      "userid" / "jdoe" / …      "name" / "Jane Doe" / …     "age" / 33 / …
ladams    "userid" / "ladams" / …    "name" / "Larry Adam" / …   "age" / 47 / …
bdole     "userid" / "bdole" / …     "name" / "Bob Dole" / …     "age" / 67 / …

Cassandra Keyspace
Analogous to database in RDBMS

• Contains one or more Column Families
analogous to tables in RDBMS
• Column Family contains columns
• A Row Key identifies a set of related columns
• A Row is not required to have same set of
columns
• No join between two column families:
o Each column family is self contained to serve a query
o A rule of thumb - one column family per query for
better performance
• Replication is controlled on per-keyspace basis

Cassandra In Enterprise
• Netflix, Twitter, Urban Airship, Constant
Contact, Reddit, Cisco, OpenX, Rackspace,
Ooyala, and many more
• The largest Cassandra cluster has over 300
TB of data in over 400 machines


HBase
• Design influenced by Google BigTable
• A type of NoSQL, more a data store than a database; lacks many RDBMS features such as
o typed columns, secondary indexes, triggers, advanced query language etc.
• Built on top of HDFS: data is stored in HDFS as indexed “StoreFiles”
• Strongly consistent reads/writes, not “eventually consistent”; suitable for
counter aggregation
• Auto Sharding
• Auto Region Server Failover
• Out of the box support for Hadoop/HDFS
• Can be used as Source and/or Sink for MapReduce
• Java, Thrift/REST clients
• Supports Block Cache and Bloom Filters for high volume query optimization
• Web management tool and JMX support
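A Bloom filter answers "might this key be present?" without reading the data, which lets a store skip files that cannot contain a requested row. The sketch below is a generic Bloom filter, not HBase's implementation; the class name, the default sizes and the use of SHA-256 to derive the bit positions are choices made for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set membership with no false negatives and a small false-positive rate."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for position in self._positions(key):
            self.bits[position // 8] |= 1 << (position % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means the file must be checked.
        return all(
            self.bits[position // 8] & (1 << (position % 8))
            for position in self._positions(key)
        )


row_keys_in_file = BloomFilter()
for row_key in ["jdoe", "ladams", "bdole"]:
    row_keys_in_file.add(row_key)

print(row_keys_in_file.might_contain("jdoe"))
print(row_keys_in_file.might_contain("nobody"))
```

A key that was added always reports True; a key that was never added usually reports False, and the rare false positive only costs an unnecessary file read.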

NoSQL Growth Trends

