2. Tonight
• Membase Overview
• Use Cases and Deployment Examples
• Membase Architecture
• Demo!
• Developing with Membase
• A Glimpse into the Future
4. Before: Application scales linearly, data hits wall
Application Scales Out
Just add more commodity web servers
Database Scales Up
Get a bigger, more complex server
5. Membase is a distributed database
Membase Servers
In the data center
Web application server
Application user
On the administrator con
6. Built-in Memcached Caching Layer
Memcached
Membase Database
Memcached
Membase Database
Memcached Mode Membase Mode
Membase development team has contributed over half of the
source code to the Memcached project.
7. Deployment options
[Diagram: three deployment options. In each, application logic uses a memcached client to send data operations and cluster operations to a Membase Server; ports shown are 11211 and 11210.]
Deployment Option 1: Embedded proxy. OTC memcached client with a server list; the proxy and vbucket map sit in the Membase Server.
Deployment Option 2: Standalone proxy. OTC memcached client talks to a proxy with a vbucket map on localhost.
Deployment Option 3: "vBucket-aware" client. NEW memcached client holds the vbucket map itself.
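The three options differ mainly in where the vbucket map is consulted. A minimal sketch of that lookup follows. The vBucket count, the CRC32 hash, and the server addresses are assumptions made for illustration, not values taken from these slides.

```python
import zlib

NUM_VBUCKETS = 8  # hypothetical; kept small for illustration

# Hypothetical vbucket map: index = vBucket id, value = server owning it.
SERVERS = ["10.0.0.1:11210", "10.0.0.2:11210", "10.0.0.3:11210"]
VBUCKET_MAP = [SERVERS[i % len(SERVERS)] for i in range(NUM_VBUCKETS)]


def vbucket_for(key: bytes) -> int:
    """Hash a key to a vBucket id (CRC32 is an assumption of this sketch)."""
    return zlib.crc32(key) % NUM_VBUCKETS


def server_for(key: bytes, vbucket_map=VBUCKET_MAP) -> str:
    """Look up which server currently owns the key's vBucket."""
    return vbucket_map[vbucket_for(key)]


def rebalance(vbucket_map, vbucket_id, new_server):
    """Return a new map with one vBucket reassigned to another server."""
    updated = list(vbucket_map)
    updated[vbucket_id] = new_server
    return updated
```

The point of the indirection: a key always hashes to the same vBucket, so moving data between servers only changes the map, never the key-to-vBucket assignment.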
8. Secure multitenant support
Membase data servers
In the data center
Web application server
Application user
On the administrator con
Bucket 1
Bucket 2
Aggregate Cluster Memory and Disk Capacity
9. Five minutes or less to a working cluster
• Downloads for Linux and Windows
• Start with a single node
• One button press joins nodes to a cluster
Easy to develop against
• Just SET and GET - no schema required
• Drop it in. 10,000+ existing applications already "speak membase" (via memcached)
• Practically every language and application framework is supported, out of the box
Easy to manage
• One-click failover and cluster rebalancing
• Graphical and programmatic interfaces
• Configurable alerting
Membase is Simple, Fast, Elastic
10. Membase is Simple, Fast, Elastic
Predictable
• "Never keep an application waiting"
• Quasi-deterministic latency and throughput
Low latency
• Built-in Memcached technology
• Auto-migration of hot data to lowest latency storage technology (RAM, SSD, Disk)
• Selectable write behavior - asynchronous, synchronous (on replication, persistence)
High throughput
• Multi-threaded
• Low lock contention
• Asynchronous wherever possible
• Automatic write de-duplication
11. Membase is Simple, Fast, Elastic
Zero-downtime elasticity
• Spread I/O and data across commodity servers (or VMs)
• Consistent performance with linear cost
• Dynamic rebalancing of a live cluster
All nodes are created equal
• No special case nodes
• Clone to grow
Extensible
• Filtered TAP interface provides hook points for external systems (e.g. full-text search, backup, warehouse)
• Data bucket - engine API for specialized container types
• Membase NodeCode [FUTURE]
12. Leading cloud service (PAAS)
provider
Over 65,000 hosted applications
Membase Server supporting
over 3,000 Heroku customers
Proven at small, and extra large scale
Social game leader - FarmVille, Mafia Wars, Café World
Over 230 million monthly users
Zynga is a core contributor to and large scale user of Membase Server
13. After: Data layer scales like application logic layer
Data layer now scales with linear cost and constant performance.
Application Scales Out
Just add more commodity web servers
Database Scales Out
Just add more commodity data servers
Scaling out flattens the cost and performance curves.
Membase Servers
14. Membase - A practical path to "NoSQL" adoption
16. Leading Membase Deployments
Leading cloud service (PAAS) provider
Over 65,000 hosted applications
Membase Server serving over 1,200 Heroku customers (as of June 10, 2010)
Social game leader - FarmVille, Mafia Wars, Café World
Over 230 million monthly users
• Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World
17. Use case - Ad targeting
[Diagram: three numbered steps, labeled "events", "profiles, campaigns" and "profiles, real time campaign statistics"]
40 milliseconds to come up with an answer.
20. Largest integrated sharing network
We make sharing simple, engaging & valuable
Powerful Social Analytics & Audience Monetization
About ShareThis
450 million consumers/mo
~850 thousand sites
50+ social channels
21. This is how it works
Log Files
Search Keywords
Page Views
Sharing Behavior
HDFS
Map/Reduce
Content Analysis
Taxonomy
Ad Server
User Membase
24. Clustering
• Underlying cluster functionality based on Erlang/OTP
• Have a custom, vector clock based way of storing and propagating...
- Cluster topology
- vBucket mapping
• Collect statistics from many nodes of the cluster
- Identify hot keys, resource utilization
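For readers new to vector clocks, here is a generic sketch of how they order updates to something like a topology or vBucket map. This is a textbook illustration in Python, not Membase's custom Erlang implementation. The node names and function names are hypothetical.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: equal, newer, older, or concurrent."""
    a_covers_b, b_covers_a = descends(a, b), descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "newer"
    if b_covers_a:
        return "older"
    return "concurrent"  # conflicting updates; the caller must reconcile


def merge(a: dict, b: dict) -> dict:
    """Combine two clocks by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}
```

A node receiving a configuration whose clock is "newer" can adopt it outright. "concurrent" means two nodes changed it independently and the values need reconciling before the clocks are merged.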
26. TAP
• A generic, scalable method of streaming mutations from a given server
- As data operations arrive, they can be sent to arbitrary TAP receivers
• Leverages the existing memcached engine interface, and the non-blocking IO interfaces to send data
• Three modes of operation
28. Data buckets are secure membase "slices"
Membase data servers
In the data center
Web application server
Application user
On the administrator console
Bucket 1
Bucket 2
Aggregate Cluster Memory and Disk Capacity
30. Disk > Memory
Dataset may have many items infrequently accessed.
However, memcached has different behavior (LRU) than wanted with membase.
Still, traditional (most) RDBMS implementations are not 100% correct for us either.
The speed of a miss is very, very important.
34. Key-Value (with a replica)
Items have:
Key
Value
Expiration
Flags
CAS (more on this later)
Operations include:
Get/Set
Increment/Decrement
Append/Prepend
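The item fields and operations listed above can be sketched with an in-memory stand-in. This is not a Membase client. The class and method names are hypothetical, expiration is stored but not enforced, and the floor-at-zero decrement mirrors memcached's behavior.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Item:
    key: str
    value: bytes
    expiration: int = 0  # seconds; 0 means never expire (not enforced here)
    flags: int = 0       # opaque to the store, meaningful to the client
    cas: int = 0         # changes on every mutation


class KVStore:
    def __init__(self):
        self._items = {}
        self._cas_counter = itertools.count(1)

    def set(self, key, value, expiration=0, flags=0):
        item = Item(key, value, expiration, flags, next(self._cas_counter))
        self._items[key] = item
        return item.cas

    def get(self, key):
        return self._items.get(key)

    def _adjust(self, key, delta):
        item = self._items[key]
        number = max(0, int(item.value) + delta)  # value is ASCII digits
        item.value = str(number).encode()
        item.cas = next(self._cas_counter)
        return number

    def increment(self, key, amount=1):
        return self._adjust(key, amount)

    def decrement(self, key, amount=1):
        return self._adjust(key, -amount)

    def append(self, key, suffix):
        item = self._items[key]
        item.value += suffix
        item.cas = next(self._cas_counter)

    def prepend(self, key, prefix):
        item = self._items[key]
        item.value = prefix + item.value
        item.cas = next(self._cas_counter)
```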
35. Membase Datatypes
• byte[]
- Does your data have 1s and 0s?
"Any customer can have a car painted any colour that he wants so long as it is black."
• Items do have flags
- Many clients use flags
- Data type options
• Google protobuf
• Thrift
• Avro
36. Transactions
• Lock == slow me down
• CAS operations
- Optimistic locking
• Very useful with complex datatypes
- Imagine two clients trying to update a complex item
• You're likely using CAS already... if you use a CPU
User 1 User 2
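The two-user scenario can be sketched as follows: both users read the same item, and only the first write wins. The second must re-read and retry. The store is an in-memory stand-in, and the method names (`gets`, `cas`) and the retry helper are assumptions for illustration.

```python
import itertools


class CasStore:
    """In-memory stand-in that hands out a CAS token with every read."""

    def __init__(self):
        self._data = {}  # key -> (value, cas_token)
        self._counter = itertools.count(1)

    def set(self, key, value):
        token = next(self._counter)
        self._data[key] = (value, token)
        return token

    def gets(self, key):
        """Return (value, cas_token) for the key."""
        return self._data[key]

    def cas(self, key, value, expected_token):
        """Store value only if nobody changed the item since it was read."""
        _, current_token = self._data[key]
        if current_token != expected_token:
            return False
        self.set(key, value)
        return True


def update_with_retry(store, key, modify, max_attempts=5):
    """Optimistic update: read, modify, attempt CAS, retry on conflict."""
    for _ in range(max_attempts):
        value, token = store.gets(key)
        if store.cas(key, modify(value), token):
            return True
    return False
```

No lock is held between the read and the write, so readers are never blocked. The cost of a conflict is one extra round trip for the loser.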
37. Common Use: Sessions
• Web user sessions
- Read-heavy, with fewer writes in many cases
- Protocol advantage of memcached
• Options already for PHP, Ruby and Java
• Application state
- Not necessarily "entity" style things
- May be appropriate for a "cache" pool
38. Common Use (cache): Rate Limiting
• Want to provide API calls into the system
- Twitter search
- Google search services
• Use the atomic increment
- Set an item with a unique ID
- Upon API request, increment and check
• HTTP 420: go away and come back later
Your Users
Your App
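The increment-and-check pattern can be sketched like this. The counter store is an in-memory stand-in for a server-side atomic increment. The limit, the window length, and the key format are hypothetical choices for this example.

```python
import threading

LIMIT_PER_WINDOW = 3   # hypothetical quota
WINDOW_SECONDS = 60    # hypothetical window length


class CounterStore:
    """In-memory stand-in for a store offering atomic increment."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def increment(self, key):
        """Atomically add one, creating the counter at 1 if missing."""
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def check_request(store, api_key, now):
    """Return the HTTP status for one API request made at time `now`."""
    window = int(now // WINDOW_SECONDS)
    counter_key = f"rate:{api_key}:{window}"  # unique ID per caller and window
    count = store.increment(counter_key)
    return 200 if count <= LIMIT_PER_WINDOW else 420
```

Against a real server the counter item would also be given an expiration so that old windows clean themselves up. That step is omitted here.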
40. NodeCode - Motivation
Beyond key-value
• Indexing/Range Queries
• Advanced Data Structures
• Sub-object direct manipulation
Validation and In-flight transformation
• Block mutations failing validation
• Enrich or transform objects
Connectors (Integrate easily with other systems)
• Solr
• Hadoop
• MySQL
41. NodeCode - What is it?
Method for extending & customizing Membase
Separate code modules
Defined interface to datapath and cluster manager
Notification on events
• Synchronous
• Asynchronous
42. NodeCode - Drivers
Simple
• Packaged modules for easy install and enable
• Library of "off the shelf" modules
• Module monitoring
• Straightforward development and debugging
Fast
• Low latency/high-throughput
• Per-bucket process isolation
• Don't break data manager performance/correctness
Elastic
• Automatically migrate and instantiate on rebalance
• Provide support for migration of internal data
• Leverage native Membase engine for internal data storage
Get better channels at bottom myspace, forbes
About ShareThis, stats/metrics consumer facing company but we collect data...scale...
Data used as signals for user model
Sharing
Social Network used
Search Keywords
Views
Membase stores user cookie -> segments
What we offer: display campaigns that are audience targeted.
Examples of ads, content, different partners etc.
We're very serious about simplicity. By being based on the memcached protocol, membase is already compatible with a HUGE number of languages, frameworks and even applications. The verbs the clients in those languages expose are very simple: set, get, atomic increment/decrement, append and prepend.
membase has persistence and is designed for distributed environments, meaning we need to replicate the data. The goal is to keep the simple interface and client compatibility while making it simple to define rules about persistence and replication of data items and, just as importantly, simple to grow your capacity while maintaining consistency.
Without changes, many applications can directly use membase as a K/V store