2. whoami
• Sunny Gleason, human
• passion: distributed systems engineering
• previous...
Ning : custom social networks
Amazon.com : infra & web services
• now...
building cloud infrastructure
4. what’s in this presentation?
• MySQL & NoSQL as Inspiration
• HailDB & InnoDB
• JNA: Integration with Java
• St8 : A REST-Enabled Data Store
• A Handful of Nifty Applications
• Results & Next Steps
5. prior art
• Mad props to:
• MySQL & InnoDB teams for creating InnoDB
and Embedded InnoDB
• Stewart Smith & Drizzle folks for leading the
HailDB charge and encouraging plugin apis
• Nokia & Percona for publishing results of their
Voldemort / MySQL integration
• Basho for publishing Riak / InnoStore integration
6. MySQL & InnoDB
• Super-Efficient Database Server
• Tried & True Replication
• Bulletproof Durability (when configured
correctly)
• Fantastic Stability, Predictability & Insight
into Operation
7. motivation
• database on 1 box : ok
• database with master/slave replication : ok
• database on cluster : tricky
• database on SAN : scary
8. NoSQL
• “Not Only” SQL
• What’s the point?
• Proponent: “reaching next level of scale”
• Cynic: “cloud is hype, ops nightmare”
9. what does it gain?
• Higher performance, scalability, availability
• More robust fault-tolerance
• Simplified systems design
• Easier operations
10. what does it lose?
• Reduced / simplified programming model
• No ad-hoc queries, no joins, no txns
• Not ACID: Weakened Atomicity /
Consistency / Isolation / Durability
• Operations / management is still evolving
• Challenging to quantify health of system
• Fewer domain experts
11. NoSQL Map
• Key-Value Store
• KV Stores (volatile): Memcached, Redis
• KV Stores (durable): Dynamo, Voldemort, Riak
• Document Store: CouchDB, MongoDB
• Column Store: Cassandra, BigTable, HBase
• Graph Store: Neo4J
12. durable vs. volatile
• RAM is ridiculously fast (ns), but not durable
• Disk is persistent and slow (3-7ms)
• RAID eases the pain a bit (4-8x throughput)
• SSD shows good promise (100-300us)
• FusionIO is redefining the space (30-100us)
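To put the latencies above in perspective, here is a rough back-of-the-envelope sketch that converts each access time into operations per second. It assumes strictly serial access (one operation waits for one full device access, with no caching, batching or parallelism), which is a simplification and not a benchmark result.

```python
# Rough serial throughput implied by the access latencies on this slide.
# Assumption: one operation at a time, no caching, no parallelism.

LATENCY_SECONDS = {
    "disk": (3e-3, 7e-3),          # 3-7 ms
    "ssd": (100e-6, 300e-6),       # 100-300 us
    "fusionio": (30e-6, 100e-6),   # 30-100 us
}


def ops_per_sec(latency_seconds):
    """Operations per second if each operation waits one full access."""
    return 1.0 / latency_seconds


def throughput_range(device):
    """(slowest, fastest) ops/sec for a device, rounded to whole operations."""
    best_latency, worst_latency = LATENCY_SECONDS[device]
    return round(ops_per_sec(worst_latency)), round(ops_per_sec(best_latency))


if __name__ == "__main__":
    for device in LATENCY_SECONDS:
        low, high = throughput_range(device)
        print(f"{device}: {low} to {high} ops/sec")
```

Under that assumption a single disk tops out at a few hundred operations per second, which is why RAID, SSD and FusionIO matter so much for durable stores.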
13. performance & operational complexity*
[Figure: operational complexity (y-axis) vs. aggregate operations / sec (x-axis, 1K to 1M). Labels: Memcached, MySQL, Voldemort, +Cluster, +SSD, +FusionIO, +Sharding]
* This is not a real graph
14. just a thought...
What if we could use the highly optimized &
durable ‘guts’ of MySQL without having to go
through JDBC & SQL?
15. enter HailDB
• use case: Voldemort Storage Engine
• let’s evaluate relative to other NoSQL
options
• focus on stability & predictability of
performance
• Graphs are throughput (ops/sec) vs. time
18. BDB-JE
• Log-Structured B-Tree
• Fast Storage When Mostly Cached
• Configured without fsync() by default -
writes are batched and flushed periodically
20. Krati
• Fast Hash-Oriented Storage
• Uses memory-mapped files for speed
• Configured without fsync() by default -
writes are batched and flushed periodically
23. HailDB & Java
• g414-haildb : where the magic happens
• Open Source on GitHub
• uses JNA: Java Native Access
• dynamic binding to libhaildb shared library
• auto-generate initial Java class from .h file
(w/ JNAerator)
• Pointer classes & other shenanigans
24. implementation gotchas
• InnoDB API-level usage is unclear
• Synchronization & locking are unclear
• Therefore... I learned to love reading C
• Error handling is *nasty*
• Native library installation a bit of a pain
(need to configure LD_LIBRARY_PATH)
26. St8 Server
• HTTP-enabled Access to HailDB
• PUT /1.0/t/mytable
{
"columns":[
{"name":"a","type":"INT","length":4},
{"name":"b","type":"INT","length":8},
{"name":"c","type":"BLOB","length":0}
],
"indexes":[
{
"name":"P",
"clustered":true,"unique":true,
"indexColumns":[{"name":"a"}]
}
]
}
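The table-creation call above could be issued from a script roughly as follows. This is a minimal sketch: the host, port and Content-Type header are assumptions (the slide only shows the path and the JSON body), and the request is built but not sent so that it runs without a server.

```python
import json
import urllib.request

# Assumption: where the St8 server listens; host and port are hypothetical.
BASE_URL = "http://localhost:8080"


def table_definition():
    """Schema from the slide: columns a, b, c and a clustered unique index on a."""
    return {
        "columns": [
            {"name": "a", "type": "INT", "length": 4},
            {"name": "b", "type": "INT", "length": 8},
            {"name": "c", "type": "BLOB", "length": 0},
        ],
        "indexes": [
            {
                "name": "P",
                "clustered": True,
                "unique": True,
                "indexColumns": [{"name": "a"}],
            }
        ],
    }


def build_create_table_request(table, definition, base_url=BASE_URL):
    """Build (but do not send) the PUT request that creates a table."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/1.0/t/{table}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


if __name__ == "__main__":
    request = build_create_table_request("mytable", table_definition())
    print(request.get_method(), request.full_url)
    # To send it against a running server:
    # urllib.request.urlopen(request)
```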
27. rest-enabled access
• GET /1.0/d/mytable;a=0
• POST /1.0/d/mytable;a=1;b=42;c=xyz
• PUT /1.0/d/mytable;a=1;b=43;c=abc
• DELETE /1.0/d/mytable;a=0
* This is matrix-param style; form-data style also
works for specifying data
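A small helper can build these matrix-param paths from column values. This is an illustrative sketch: `matrix_path` is a hypothetical name, and percent-encoding of reserved characters is an assumption about what the server accepts.

```python
from urllib.parse import quote


def matrix_path(table, **columns):
    """Build a /1.0/d/ path with column values as ;name=value matrix params."""
    params = "".join(
        f";{quote(str(name), safe='')}={quote(str(value), safe='')}"
        for name, value in columns.items()
    )
    return f"/1.0/d/{quote(table, safe='')}{params}"


def requests_for_row_lifecycle():
    """The four operations from the slide as (HTTP method, path) pairs."""
    return [
        ("GET", matrix_path("mytable", a=0)),
        ("POST", matrix_path("mytable", a=1, b=42, c="xyz")),
        ("PUT", matrix_path("mytable", a=1, b=43, c="abc")),
        ("DELETE", matrix_path("mytable", a=0)),
    ]


if __name__ == "__main__":
    for method, path in requests_for_row_lifecycle():
        print(method, path)
```

Encoding `;` and `=` inside values keeps a value such as `x;y=z` from being read as extra parameters.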
28. cursors & iterators
• GET /1.0/i/mytable.P?q=a+ge+4
• GET /1.0/i/mytable.SecIndex?q=b+le+4
• GET /1.0/i/mytable.SecIndex?q=b+le+4
&s=abce1212121ceeee2120911
• “s” value is opaque index key of next page
of results - way better than LIMIT/OFFSET!
(since HailDB can seek directly to the row)
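The paging idea can be modeled in memory to show why a resume key beats LIMIT/OFFSET. This sketch is not St8's implementation: the sorted list stands in for an index, and the base64 token format is an assumption (the real "s" value is opaque).

```python
import base64
import bisect


def encode_token(key):
    """Wrap the next key in an opaque-looking token (format is an assumption)."""
    return base64.urlsafe_b64encode(str(key).encode("ascii")).decode("ascii")


def decode_token(token):
    """Recover the integer key hidden inside a token."""
    return int(base64.urlsafe_b64decode(token.encode("ascii")).decode("ascii"))


class SortedIndex:
    """In-memory stand-in for an index: integer keys kept in sorted order."""

    def __init__(self, keys):
        self.keys = sorted(keys)

    def page(self, lower_bound, limit, token=None):
        """Return (rows, next_token) for keys >= lower_bound.

        token is the resume point returned with the previous page, or None.
        """
        start_key = decode_token(token) if token else lower_bound
        # Seek straight to the first key >= start_key. The cost does not grow
        # with the number of pages already read, unlike OFFSET, which scans
        # and discards every skipped row.
        start = bisect.bisect_left(self.keys, start_key)
        end = start + limit
        rows = self.keys[start:end]
        next_token = encode_token(self.keys[end]) if end < len(self.keys) else None
        return rows, next_token


def iterate_all(index, lower_bound, limit):
    """Follow next_token until the result set is exhausted."""
    rows, token = index.page(lower_bound, limit)
    result = list(rows)
    while token is not None:
        rows, token = index.page(lower_bound, limit, token)
        result.extend(rows)
    return result
```

A client only ever passes the token back unchanged, so the server is free to change what it encodes.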
29. result
• REST API provides fun, straightforward
access from Ruby, Python, Java, command line...
• very easy benchmarking with HTTP-based
performance tools
• range query support, and more efficient
iteration model for large result sets than
MySQL provides
30. high-performance counts
• GET /1.0/counts/mykey
0
• POST /1.0/counts/mykey[?inc=1]
1
• POST /1.0/counts/mykey?inc=42
43
• DELETE /1.0/counts/mykey
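The request/response sequence above implies simple counter semantics, modeled here in memory. This is a sketch of the behavior only, not the St8 code; in particular, reading 0 after a DELETE is an assumption, since the slide does not show that response.

```python
class CountStore:
    """In-memory model of the counts endpoints; not the St8 implementation."""

    def __init__(self):
        self._counts = {}

    def get(self, key):
        # GET /1.0/counts/<key>: a key that was never incremented reads as 0
        return self._counts.get(key, 0)

    def post(self, key, inc=1):
        # POST /1.0/counts/<key>[?inc=N]: add N (default 1), return the new value
        self._counts[key] = self._counts.get(key, 0) + inc
        return self._counts[key]

    def delete(self, key):
        # DELETE /1.0/counts/<key>: forget the key (assumed to read as 0 afterwards)
        self._counts.pop(key, None)


if __name__ == "__main__":
    store = CountStore()
    print(store.get("mykey"))
    print(store.post("mykey"))
    print(store.post("mykey", inc=42))
    store.delete("mykey")
```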
31. counts schema
• HailDB count service schema
_id int 8-byte unsigned,
_key_hash int 8-byte unsigned,
_key varchar(80),
_count int 8-byte unsigned
primary key (“_id”)
unique key (“_key_hash”, “_key”)
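One plausible way to fill the fixed-width `_key_hash` column is to derive a 64-bit unsigned value from the key. The slide does not name a hash function, so the truncated SHA-1 below is purely an assumption, as are the helper names and the choice to count the varchar(80) limit in characters.

```python
import hashlib

MAX_KEY_LENGTH = 80  # from the schema: _key varchar(80)


def key_hash(key):
    """64-bit unsigned hash of a key (assumed: first 8 bytes of SHA-1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=False)


def make_row(row_id, key, count=0):
    """Build a row dict with the four columns of the counts schema."""
    if len(key) > MAX_KEY_LENGTH:
        raise ValueError("key is longer than varchar(80) allows")
    return {
        "_id": row_id,
        "_key_hash": key_hash(key),
        "_key": key,
        "_count": count,
    }
```

Keeping `_key` in the unique index alongside the hash means two keys that collide on the hash can still coexist as separate rows.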
36. operation: graph store
• Social networks, recommendations, any
relation you can think of
• Which would you prefer?
• SQL adjacency list, stored procedure,
custom storage engine, external
(Memcached), ...
• Graph-aware HailDB application in Java
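As an illustration of what "graph-aware" could mean on top of a sorted index, the sketch below stores edges as ordered (source, target) pairs and finds neighbors with a range scan. The layout and all names are assumptions for illustration; the slides do not describe the actual graph store design.

```python
import bisect


class EdgeIndex:
    """Edges kept as sorted (source, target) pairs, like a clustered index."""

    def __init__(self):
        self._edges = []

    def add_edge(self, source, target):
        """Insert an edge in sorted position; duplicates are ignored."""
        edge = (source, target)
        position = bisect.bisect_left(self._edges, edge)
        if position == len(self._edges) or self._edges[position] != edge:
            self._edges.insert(position, edge)

    def neighbors(self, source):
        """Range scan: seek to the first edge for source, read until it changes."""
        # (source,) sorts before every (source, target) pair
        position = bisect.bisect_left(self._edges, (source,))
        result = []
        while position < len(self._edges) and self._edges[position][0] == source:
            result.append(self._edges[position][1])
            position += 1
        return result


def friends_of_friends(index, person):
    """Nodes two hops away, excluding direct neighbors and the person."""
    direct = set(index.neighbors(person))
    second = set()
    for friend in direct:
        second.update(index.neighbors(friend))
    return sorted(second - direct - {person})
```

Each hop is one seek plus a short sequential read, with no SQL join or stored procedure involved.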
39. nifty recovery tool
(Just an idea)
• for recovery: shut down mysql server
• run HailDB-enabled recovery tool
• export as JSON or whatever
40. wrap-up
• HailDB & InnoDB are phenomenal
• With g414-haildb, can be integrated directly
into applications running on the JVM
• All the InnoDB tuning tricks apply
• Opens up new applications that are tricky
with a traditional SQL database
45. InnoDB tuning
• Skinny columns, skinny rows! (esp. Primary Key)
• Varchar as enum ‘bad’; enum, int or smallint ‘good’
• fixed-width rows allow in-place updates
• Use covering indexes strategically
• More data per page means faster index scans,
more efficient buffer pool utilization
• You only get so many transactions (read & write) on a given
CPU/RAM configuration - benchmark this!
• Strategically offload reads to Memcached/Redis
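The "skinny rows" advice comes down to simple arithmetic: the narrower the row, the more rows fit on a page and the fewer pages a scan touches. The sketch below uses InnoDB's default 16 KB page size; the two row layouts are hypothetical, and the calculation ignores page and record headers, so the figures are upper bounds.

```python
PAGE_BYTES = 16 * 1024  # InnoDB's default page size

# Hypothetical rows: 8-byte key + 8-byte value + a status field
WIDE_ROW_BYTES = 8 + 8 + 20   # status stored as a 20-byte string
SKINNY_ROW_BYTES = 8 + 8 + 2  # status stored as a 2-byte smallint


def rows_per_page(row_bytes, page_bytes=PAGE_BYTES):
    """Upper bound on rows per page; ignores page and record headers."""
    return page_bytes // row_bytes


def pages_to_scan(row_count, row_bytes):
    """Pages touched by a full scan, rounding up the last partial page."""
    per_page = rows_per_page(row_bytes)
    return -(-row_count // per_page)  # ceiling division


if __name__ == "__main__":
    for label, size in (("wide", WIDE_ROW_BYTES), ("skinny", SKINNY_ROW_BYTES)):
        print(label, rows_per_page(size), pages_to_scan(1_000_000, size))
```

Halving the row width roughly halves the pages read per scan and doubles how many rows the buffer pool can hold.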
48. online backup
• hot backup of data to other machine /
destination
• test Percona XtraBackup with HailDB
• next step: backup/export to Hadoop/HDFS
(similar to Cloudera Sqoop tool)
49. JNI bindings
• JNI can get 2-5x perf boost vs. JNA
• ... at the expense of nasty code
• Will go for schema optimizations and
InnoDB tuning tips *first*