3. Two big drivers for NoSQL adoption

• Lack of flexibility/rigid schemas – 49%
• Inability to scale out data – 35%
• Performance challenges – 29%
• Cost – 16%
• All of these – 12%
• Other – 11%

Source: Couchbase Survey, December 2011, n = 1351.
6. Document Databases

• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All databases require a unique key
• Documents are stored using JSON or XML or their derivatives
• Content can be indexed and queried
• Offer auto-sharding for scaling and replication for high availability

Example document:

{
  "UUID": "21f7f8de-8051-5b89-86…",
  "Time": "2011-04-01T13:01:02.42…",
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "dsallings@spy.net",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": [ "SERVER", "US-West", "API" ]
  }
}
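The store-by-unique-key idea above can be sketched in a few lines of Python. A plain dict stands in for the document store; the variable names are illustrative, not from any particular client library.

```python
import json
import uuid

# A dict stands in for a document store: every document lives under a
# unique key, and the document body is self-describing JSON.
store = {}

doc = {
    "Server": "A2223E",
    "Type": "E100",
    "Details": {"IP": "10.1.1.22", "Tags": ["SERVER", "US-West", "API"]},
}

key = str(uuid.uuid4())        # unique document ID
store[key] = json.dumps(doc)   # documents serialized as JSON

fetched = json.loads(store[key])
print(fetched["Details"]["Tags"])  # ['SERVER', 'US-West', 'API']
```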
9. Relational vs Document data model

• Relational data model: highly-structured table organization with rigidly-defined data formats and record structure.
• Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format.
10. Example: User Profile

To get information about a specific user, you perform a join across two tables:

User Info
KEY | First | Last   | ZIP_id
1   | Dipti | Borkar | 2
2   | Joe   | Smith  | 2
3   | Ali   | Dodson | 2
4   | John  | Doe    | 3

Address Info
ZIP_id | CITY | STATE | ZIP
1      | DEN  | CO    | 30303
2      | MV   | CA    | 94040
3      | CHI  | IL    | 60609
4      | NY   | NY    | 10010
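The join described above can be sketched with an in-memory SQLite database; the table and column names are taken from the slide, lower-cased for SQL.

```python
import sqlite3

# In-memory sketch of the two tables from the slide and the join needed
# to assemble one user's profile.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_info (key INTEGER, first TEXT, last TEXT, zip_id INTEGER);
    CREATE TABLE address_info (zip_id INTEGER, city TEXT, state TEXT, zip TEXT);
    INSERT INTO user_info VALUES (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2);
    INSERT INTO address_info VALUES (2, 'MV', 'CA', '94040'), (3, 'CHI', 'IL', '60609');
""")

row = conn.execute("""
    SELECT u.first, u.last, a.city, a.state, a.zip
    FROM user_info u JOIN address_info a ON u.zip_id = a.zip_id
    WHERE u.key = 1
""").fetchone()
print(row)  # ('Dipti', 'Borkar', 'MV', 'CA', '94040')
```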
11. Document Example: User Profile

{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}

All data in a single document.
12. Making a Change Using RDBMS

Adding the new status and country information ripples across the whole schema. The slide shows five interrelated tables that must be created or altered and kept consistent:

• User Table (User ID, First, Last, Zip, Country ID)
• Photo Table (User ID, Photo ID, Comment, Country ID)
• Status Table (User ID, Status ID, Text, Country ID)
• Affiliations Table (User ID, Affl ID, Affl Name, Country ID)
• Country Table (Country ID, Country name)
13. Making the Same Change with a Document Database

{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA",
  "STATUS": {
    "TEXT": "At Conf",
    "GEO_LOC": "134"
  },
  "COUNTRY": "USA"
}

Just add information to a document.
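The "change" amounts to adding keys to an existing document, which a short Python sketch makes concrete (plain dicts stand in for stored documents):

```python
import json

# Schema change, document style: just add keys to the existing document.
# No ALTER TABLE, no new tables.
profile = {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
           "ZIP": "94040", "CITY": "MV", "STATE": "CA"}

profile["STATUS"] = {"TEXT": "At Conf", "GEO_LOC": "134"}
profile["COUNTRY"] = "USA"

print(json.dumps(profile, indent=2))
```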
14. Document modeling

Q:
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?

When considering how to model data for a given application:
• Think of a logical container for the data
• Think of how data groups together
15. Document Design Options

• One document that contains all related data
  – Data is de-normalized
  – Better performance and scale
  – Eliminates client-side joins
• Separate documents for different object types, with cross references
  – Data duplication is reduced
  – Objects may not be co-located
  – Transactions are supported only on a document boundary
  – Most document databases do not support joins
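The two design options can be sketched side by side. Plain dicts stand in for stored documents, and the "user::1"-style keys are an illustrative convention, not from any API.

```python
# Option 1: one de-normalized document -- everything comes back in one read.
embedded = {
    "user": {"name": "Dipti"},
    "posts": [{"title": "Hello", "comments": ["Nice!"]}],
}

# Option 2: separate documents with cross references -- less duplication,
# but the application must do the "join" itself with extra reads.
referenced = {
    "user::1": {"name": "Dipti", "post_ids": ["post::1"]},
    "post::1": {"title": "Hello", "comment_ids": ["comment::1"]},
    "comment::1": {"text": "Nice!"},
}

# Client-side join for option 2:
user = referenced["user::1"]
posts = [referenced[pid] for pid in user["post_ids"]]
print(posts[0]["title"])  # Hello
```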
16. Document ID / Key selection

• Similar to primary keys in relational databases
• Documents are sharded based on the document ID
• ID-based document lookup is extremely fast
• Usually an ID can only appear once in a bucket

Q:
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?

Options:
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human-readable) IDs
• Matching prefixes (for multiple related objects)
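The matching-prefix option can be sketched with two small helpers; the "user::<id>" separator convention here is illustrative, not mandated by any particular database.

```python
# Hand-crafted, human-readable keys with matching prefixes so that
# related objects are easy to reference from one another.
def user_key(user_id):
    return "user::%d" % user_id

def user_child_key(user_id, kind, child_id):
    # Same prefix as the parent user, so related keys group together.
    return "user::%d::%s::%d" % (user_id, kind, child_id)

print(user_key(1))                    # user::1
print(user_child_key(1, "blog", 42))  # user::1::blog::42
```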
17. Example: Entities for a Blog

• User profile – the main pointer into the user data: blog entries; badge settings, like a twitter badge
• Blog posts – contains the blogs themselves
• Blog comments – comments from other users
20. Threaded Comments

• You can imagine how to take this to a threaded list: blog → comment list → first reply to comment → more comments

Advantages:
• Only fetch the data when you need it (for example, when rendering part of a web page)
• Spread the data and load across the entire cluster
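The fetch-on-demand advantage can be sketched as follows: each comment list is its own document, chained by a reference, and a page render walks only as far as it needs. The document layout and key names are illustrative assumptions.

```python
# Each comment list is a separate document; "more" points at the next
# chunk of the thread. Rendering fetches only the documents it needs.
store = {
    "post::1": {"title": "Hello", "comments": "comments::1"},
    "comments::1": {"items": ["First!"], "more": "comments::2"},
    "comments::2": {"items": ["Reply to comment"], "more": None},
}

def render_comments(key, depth=1):
    """Fetch at most `depth` comment documents -- the rest stay unread."""
    out = []
    while key and depth > 0:
        doc = store[key]
        out.extend(doc["items"])
        key, depth = doc["more"], depth - 1
    return out

print(render_comments("comments::1", depth=1))  # ['First!']
print(render_comments("comments::1", depth=2))  # ['First!', 'Reply to comment']
```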
22. Relational Technology Scales Up

• The application tier scales out: just add more commodity web servers, and system cost tracks application performance as users grow.
• The RDBMS scales up: get a bigger, more complex server, and it won't scale beyond a certain point.
• Sharding a relational database is expensive and disruptive, and it doesn't perform at web scale.
23. Couchbase Server Scales Out Like the App Tier

• The application tier scales out: just add more commodity web servers.
• The NoSQL database scales out the same way: cost and performance mirror the app tier as the Couchbase distributed data store grows with users.
• Scaling out flattens the cost and performance curves.
27. Performance-driven use cases

• Low latency
• High throughput matters
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Workloads with a very high mutation rate per document (temporal locality): a working set with heavy writes
28. Data-driven use cases

• Support for unlimited data growth
• Data with non-homogeneous structure
• Need to quickly and often change data structure
• 3rd-party or user-defined structure
• Variable-length documents
• Sparse data records
• Hierarchical data
29. Use Case Examples

Web app or use case | Couchbase solution | Example customer
Content and Metadata Management System | Couchbase document store + Elastic Search | McGraw-Hill…
Social Game or Mobile App | Couchbase stores game and player data | Zynga…
Ad Targeting | Couchbase stores user information for fast access | AOL…
User Profile Store | Couchbase Server as a key-value store | TuneWiki…
Session Store | Couchbase Server as a key-value store | Concur…
High Availability Caching Tier replacement | Couchbase Server as a memcached tier | Orbitz…
Chat/Messaging Platform | Couchbase Server | DOCOMO…
30. Use Case: Social Gaming

Types of data:
• User account information
• User game profile info
• User's social graph
• State of the game
• Player badges and stats

Application requirements:
• Ability to support rapid growth
• Fast response times for an awesome user experience
• Game uptime – 24x7x365
• Easy to update apps with new features

Why NoSQL and Couchbase:
• Scalability ensures that games are ready to handle the millions of users that come with viral growth.
• High performance guarantees players are never left waiting to make their next move.
• Always-on operations mean zero interruption to game play (and revenue).
• A flexible data model means games can be developed rapidly and updated easily with new features.
31. Use Case: Ad Targeting

Types of data:
• User profile: preferences and psychographic data
• Ad serving history by user
• Ad buying history by advertiser
• Ad serving history by advertiser

Application requirements:
• High performance to meet the limited ad-serving budget; the time allowance is typically <40 msec
• Scalability to handle hundreds of millions of user profiles and a rapidly growing amount of data
• 24x7x365 availability to avoid ad revenue loss

Why NoSQL and Couchbase:
• Sub-millisecond reads/writes mean less time is needed for data access, more time is available for ad-logic processing, and more highly optimized ads will be served.
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows.
• Always-on operations = always-on revenue: you will never miss the opportunity to serve an ad because of downtime.
32. Use Case: Content and metadata store

Building a self-adapting, interactive learning portal with Couchbase.
33. The Problem

As learning moves online in great numbers, there is a growing need to build interactive learning environments that scale:
• Scale to millions of learners
• Serve MHE as well as third-party content, including open content
• Support learning apps
• Self-adapt via usage data
34. The Challenge

Hmmm... this looks kinda like: Content Caching (Scale) + Social Gaming (Stats) + Ad Targeting (Smarts).

The backend is an Interactive Content Delivery Cloud that must:
• Allow for elastic scaling under spike periods
• Be able to catalog & deliver content from many sources
• Provide consistent low latency for metadata and stats access
• Support full-text search for content discovery
• Offer tunable content ranking & recommendation functions

Experimented with a combination of: XML Databases, In-memory Data Grids, SQL/MR Engines, Enterprise Search Servers.
36. The Learning Portal

• Designed and built as a collaboration between MHE Labs and Couchbase
• Serves as a proof-of-concept and testing harness for Couchbase + ElasticSearch integration
• Available for download and further development as open source code: https://github.com/couchbaselabs/learningportal
38. Couchbase Server 2.0

NoSQL distributed document database for interactive web applications.
39. Couchbase Server

• Easy Scalability: grow the cluster without application changes, without downtime, with a single click
• Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x7x365: no downtime for software upgrades, hardware maintenance, etc.
40. Flexible Data Model

{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}

• No need to worry about the database when changing your application
• Records can have different structures; there is no fixed schema
• Allows painless data model changes for rapid application development
42. Couchbase Server 2.0 Architecture

Data Manager:
• Query API (port 8092), Memcapable 1.0 (port 11211), Memcapable 2.0 (port 11210)
• Moxi; Query Engine
• Memcached with the Couchbase EP Engine
• Storage interface / new persistence layer

Cluster Manager (Erlang/OTP):
• REST management API / Web UI (HTTP, port 8091)
• vBucket state and replication manager (on each node)
• Global singleton supervisor; rebalance orchestrator (one per cluster)
• Configuration manager; node health monitor; process monitor; heartbeat
• Erlang port mapper (port 4369); distributed Erlang (ports 21100–21199)
43. Couchbase Server 2.0 Architecture (same component diagram as the previous slide)
44. Couchbase deployment

The web application talks to the cluster through the Couchbase client library; data flow and cluster management run between the client library and the server nodes.
45. Single node – Couchbase Write Operation

An app server writes Doc 1 to a Couchbase Server node: the document first lands in the managed cache, and is then placed on the replication queue (to other nodes) and the disk queue, from which it is written to disk.
46. Single node – Couchbase Update Operation

An update writes Doc 1′ into the managed cache, replacing Doc 1; the new version then flows through the replication queue and the disk queue, replacing Doc 1 on disk.
47. Single node – Couchbase Read Operation

A GET for Doc 1 is served directly from the managed cache when the document is resident, without touching disk.
48. Single node – Couchbase Cache Eviction

When the managed cache fills (Docs 2–6 arriving), older documents are ejected from the cache once they have been persisted; the full set of documents remains on disk.
49. Single node – Couchbase Cache Miss

A GET for Doc 1 that misses the managed cache causes the document to be fetched from disk, placed back into the cache, and returned to the app server.
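The read path on these slides is a read-through cache, which a small sketch captures (dicts stand in for the managed cache and the disk; the counter is only there to show when disk is touched):

```python
# Read-through cache: serve from the managed cache when possible, fall
# back to disk on a miss and repopulate the cache.
cache = {}                     # managed cache (RAM)
disk = {"doc1": {"n": 1}}      # persisted documents

reads_from_disk = 0

def get(key):
    global reads_from_disk
    if key in cache:           # cache hit: no disk access
        return cache[key]
    reads_from_disk += 1       # cache miss: fetch from disk...
    doc = disk[key]
    cache[key] = doc           # ...and repopulate the cache
    return doc

get("doc1")   # miss -> read from disk
get("doc1")   # hit -> served from cache
print(reads_from_disk)  # 1
```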
50. Cluster wide – Basic Operation

• Docs are distributed evenly across servers
• Each server stores both active and replica docs; only one copy is active at a time
• The client library provides the app with a simple interface to the database
• The cluster map tells the client which server a doc is on; the app never needs to know
• The app reads, writes, and updates docs
• Multiple app servers can access the same document at the same time

(User-configured replica count = 1.)
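The cluster-map lookup can be sketched as a hash from key to partition to server. The 1024-partition (vBucket) count matches Couchbase's default, but the hash choice and server layout here are illustrative assumptions.

```python
import zlib

# A key hashes to a vBucket; the cluster map names the server holding
# the active copy of that vBucket.
NUM_VBUCKETS = 1024
cluster_map = {vb: "server-%d" % (vb % 3) for vb in range(NUM_VBUCKETS)}

def server_for(key):
    vbucket = zlib.crc32(key.encode()) % NUM_VBUCKETS
    return cluster_map[vbucket]

# Every client computes the same location, so multiple app servers can
# reach the same document directly without asking a coordinator.
print(server_for("user::1"))
```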
51. Cluster wide – Add Nodes to Cluster

• Two servers added: a one-click operation
• Docs are automatically rebalanced across the cluster: even distribution of docs, minimum doc movement
• The cluster map is updated
• App database calls are now distributed over a larger number of servers

(User-configured replica count = 1.)
52. Cluster wide – Fail Over Node

• App servers are accessing docs; requests to Server 3 fail
• The cluster detects that the server failed, promotes replicas of its docs to active, and updates the cluster map
• Requests for those docs now go to the appropriate server
• Typically a rebalance would follow

(User-configured replica count = 1.)
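The promotion step can be sketched over a toy cluster map. Two vBuckets and three servers keep the example small; the dict layout is illustrative, not an actual Couchbase data structure.

```python
# On failover, replicas of the failed server's vBuckets are promoted to
# active and the cluster map is updated.
active = {"vb0": "server-3", "vb1": "server-1"}
replica = {"vb0": "server-1", "vb1": "server-2"}

def fail_over(server):
    for vb, owner in list(active.items()):
        if owner == server:
            active[vb] = replica.pop(vb)  # promote the replica copy

fail_over("server-3")
print(active["vb0"])  # server-1 -- requests for vb0 now go to the promoted node
```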
53. Indexing and Querying

• Indexing work is distributed amongst nodes
• Large data sets are possible; the effort is parallelized
• Each node has an index for the data stored on it
• Queries combine the results from the required nodes

(User-configured replica count = 1.)
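Per-node indexing plus result combination is a scatter-gather pattern, sketched below; the index contents and node layout are made up for illustration.

```python
# Each node indexes only its own documents; a query scatters to every
# node and gathers the per-node results into one answer.
node_indexes = [
    {"CA": ["user::1"]},                     # index on node 1
    {"CA": ["user::7"], "CO": ["user::3"]},  # index on node 2
    {"CA": ["user::4"]},                     # index on node 3
]

def query(state):
    hits = []
    for index in node_indexes:   # scatter: ask every node
        hits.extend(index.get(state, []))
    return sorted(hits)          # gather: combine the results

print(query("CA"))  # ['user::1', 'user::4', 'user::7']
```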
54. Cross Data Center Replication (XDCR)

Two Couchbase Server clusters, one in an NY data center and one in an SF data center, each hold active documents in RAM and on disk across three servers; XDCR replicates the documents between the two clusters.