2. What’s in this Preso
• What is InnoDB?
• Relation to MySQL & Other Products
• InnoDB Model
• g414-inno: a Java Access Library for InnoDB
3. What else is in this Preso
• Creating a Voldemort Storage Engine with
Embedded InnoDB
• St8: A REST-based Storage Server
• Faban Benchmark Results
4. What is InnoDB?
• High-Performance “guts” of MySQL
• Finely Tuned B-Tree Storage Engine
• MVCC Transactional Store a la Jim Gray
(“Transaction Processing: Concepts and Techniques”)
• Available Stand-Alone as Embedded
InnoDB (stagnant) or HailDB (drizzle)
5. Relation to MySQL
• One of many MySQL storage engines
• Transactional, in contrast to MyISAM
• Well-known, Bullet-Proof Backup, Failure &
Recovery Modes
• Advanced Buffer Pool Management
(adaptive hash index, tunable LRU)
• Online Backup Support (Xtrabackup / Hot)
6. Other Products
• Tokyo BDB, Oracle BDB & BDB-JE
• Schema-Free (No Structure / Data Types)
• Lower Concurrency (fewer writers)
• Performance Degradation in Larger DBs
• (TODO: quantify performance gap - in
meantime, see Dynamo & Voldemort)
7. InnoDB Model (Logical)
• Database == Tablespace
• Tablespace has Table(s) and Log(s)
• Table has columns (rich datatypes)
• Tables have a PRIMARY clustered index
• Tables may have SECONDARY indexes
• Row == Tuple
• Tuples are stored / clustered by index sort
• Secondary index stores full Primary Key
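The logical model above can be sketched with two sorted maps. This is only a toy illustration, not InnoDB or g414-inno code: the class ClusteredTableSketch, the Row type, and the column names id, email, and name are all made up for the example. It shows rows clustered in primary-key order and a secondary index whose entries carry the full primary key, so a secondary lookup needs a second step into the primary index to fetch the full row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model (hypothetical names): clustered primary index plus one secondary index. */
public class ClusteredTableSketch {
    /** A row: id is the primary key, email is the secondary-indexed column. */
    public static final class Row {
        public final long id;
        public final String email;
        public final String name;

        public Row(long id, String email, String name) {
            this.id = id;
            this.email = email;
            this.name = name;
        }
    }

    // Primary (clustered) index: the full rows, kept in primary-key order.
    private final TreeMap<Long, Row> primary = new TreeMap<>();
    // Secondary index: email -> primary keys. It holds keys only, not full rows.
    private final TreeMap<String, TreeSet<Long>> secondary = new TreeMap<>();

    public void insert(Row row) {
        Row old = primary.put(row.id, row);
        if (old != null) {
            // Replacing a row: drop its stale secondary entry first.
            TreeSet<Long> ids = secondary.get(old.email);
            ids.remove(old.id);
            if (ids.isEmpty()) {
                secondary.remove(old.email);
            }
        }
        secondary.computeIfAbsent(row.email, e -> new TreeSet<>()).add(row.id);
    }

    /** Answered from the secondary index alone (no access to the primary index). */
    public List<Long> primaryKeysForEmail(String email) {
        TreeSet<Long> ids = secondary.get(email);
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    /** Secondary lookup followed by a primary lookup to fetch each full row. */
    public List<Row> findByEmail(String email) {
        List<Row> out = new ArrayList<>();
        for (Long id : primaryKeysForEmail(email)) {
            out.add(primary.get(id));
        }
        return out;
    }

    /** Range scan in primary-key order, which is the order rows are clustered in. */
    public List<Row> scanByPrimaryKey(long from, long to) {
        return new ArrayList<>(primary.subMap(from, true, to, true).values());
    }
}
```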
8. InnoDB Model (Txns)
• Everything uses a Transaction
• Isolation Levels: Serializable, Repeatable Read, Read
Committed, Read Uncommitted
• Locks: Shared (Read-only), Exclusive (Read/Write)
• Cursors provide access to tables: Lookup by index,
Iteration / Traversal
• Secondary index contains partial Tuples
• Secondary cursor can access primary (full tuple)
9. InnoDB Model (Physical)
• Tablespace is a collection of pages (16K)
• Pages organized as a B-Tree: infimum &
supremum keys, pointers to children
• Pages contain row or index tuple data, or
blob overflow data
• Changes are written to the log first; pages are
flushed to the tablespace based on ‘sync’ policy
10. Physical Considerations
• New pages requested from OS in extend_size increments
• OS Assigns space from file system / partition “free list”
• Temporal Locality (pages close together)
• Spatial Locality / Fragmentation from Updates
• Prefer “narrow” rows / indexes: faster scan, keeps
working set in-memory
• Secondary “covering” indexes can save primary index
access
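A rough back-of-the-envelope calculation shows why narrow rows help. This sketch assumes the 16K page size from the previous slide and ignores page headers, fill factor, and per-row overhead, so the numbers are upper bounds rather than measurements. The class and method names are made up for the example.

```java
/** Rough estimate (hypothetical helper): how row width affects the number of pages to scan. */
public class PageFillSketch {
    /** Page size stated on the slides (16K). */
    public static final int PAGE_BYTES = 16 * 1024;

    /** Rows that fit in one page, ignoring headers and per-row overhead (a simplification). */
    public static int rowsPerPage(int rowBytes) {
        if (rowBytes <= 0 || rowBytes > PAGE_BYTES) {
            throw new IllegalArgumentException("rowBytes must be between 1 and " + PAGE_BYTES);
        }
        return PAGE_BYTES / rowBytes;
    }

    /** Pages needed to hold rowCount rows of the given width. */
    public static long pagesNeeded(long rowCount, int rowBytes) {
        int perPage = rowsPerPage(rowBytes);
        return (rowCount + perPage - 1) / perPage; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        for (int width : new int[] {100, 1000}) {
            System.out.printf("%d-byte rows: %d per page, %d pages%n",
                    width, rowsPerPage(width), pagesNeeded(rows, width));
        }
    }
}
```

Under these assumptions, one million 100-byte rows need 6,135 pages while one million 1,000-byte rows need 62,500, so a full scan of the wide table touches roughly ten times as many pages.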
16. How can we use InnoDB?
• Download Embedded InnoDB or HailDB
• Use C-API for access to InnoDB tables
• Innostore: Erlang library for InnoDB access
(from Basho’s Riak NoSQL project)
• g414-inno: Open-Source Java access library
for Embedded InnoDB
17. g414-inno Foundations
• Uses JNA (Java Native Access): Like JNI, but
doesn’t provoke (as much) insanity
• JNAerator: creates thin Java Class wrapper
from a C-based header file (innodb.h)
• But, complex C APIs are super ugly in Java
• Need to clean that up a bit...
18. g414-inno Library
• Provides a more Object-Oriented API to
mask all of the JNA “Pointer” madness
• Transaction Objects, Cursors, Table Builder,
Tuple Builder, Datatype Validation
• Java Enum Types for ‘int’ enums in C API
• inTransaction() templates (like Spring, JDBI)
• Contains sanity checks to prevent common
errors (mostly C API order of operations)
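The inTransaction() template idea can be sketched as follows. This is not the g414-inno API: the Txn, TxnSource, and TxnCallback interfaces and the RecordingTxn stand-in are hypothetical names invented for this example. The point is only the shape of the pattern: the template begins the transaction, runs the caller's work, commits on success, and rolls back on any exception, so callers cannot get the order of operations wrong.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of an inTransaction() template; all types here are hypothetical. */
public class InTransactionSketch {
    /** Hypothetical transaction handle. */
    public interface Txn {
        void commit();
        void rollback();
    }

    /** Hypothetical factory that begins a new transaction. */
    public interface TxnSource {
        Txn begin();
    }

    /** Hypothetical unit of work run inside a transaction. */
    public interface TxnCallback<T> {
        T run(Txn txn) throws Exception;
    }

    /** Begin, run the work, commit on success, roll back on any exception. */
    public static <T> T inTransaction(TxnSource source, TxnCallback<T> work) {
        Txn txn = source.begin();
        try {
            T result = work.run(txn);
            txn.commit();
            return result;
        } catch (Exception e) {
            txn.rollback();
            throw new RuntimeException("transaction rolled back", e);
        }
    }

    /** In-memory stand-in that records which calls were made, for demonstration only. */
    public static class RecordingTxn implements Txn {
        public final List<String> events = new ArrayList<>();

        @Override public void commit() { events.add("commit"); }
        @Override public void rollback() { events.add("rollback"); }
    }

    public static void main(String[] args) {
        RecordingTxn txn = new RecordingTxn();
        String result = inTransaction(() -> txn, t -> "done");
        System.out.println(result + " " + txn.events);
    }
}
```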
19. Use Case: Voldemort
• Voldemort: High-Performance Key-Value
Store (Amazon Dynamo clone)
• Nokia: good results with Voldemort on
MySQL with InnoDB
• Typical features of DB (network
connectivity, SQL language) not really
necessary
• Thought: why bother with DB layer?
The g414-inno project is born ...
20. Voldemort Storage Engines
• Trivial to integrate new persistence
mechanisms with Voldemort
• 2 Classes: Config & Storage Engine
• Trivial InnoDB Table:
    key_ VARBINARY(200) NOT NULL
    version_ VARBINARY(200) NOT NULL
    value_ BLOB
    PRIMARY KEY (key_, version_)
• 3 Operations: put(k, v), get(k), delete(k)
• Complication: k is Versioned<Key>
21. V Storage Engine: put
• put(byte[] key, byte[] version, byte[] value)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key
    if row.version is older than the new version, delete row
    if row.version is newer than the new version, throw exception
• Cursor.insert(key, version, value)
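The put steps can be sketched in memory as follows. This is an illustration under simplifying assumptions, not the actual storage engine: keys are Strings instead of byte[], versions are plain long counters (Voldemort versions are really vector clocks, which can also be concurrent with each other), and a nested sorted map stands in for the InnoDB table and cursor. An equal version is rejected as well, since it would collide on the (key_, version_) primary key. The check runs before any delete so that a rejected put leaves existing rows untouched, which is what a transaction rollback would achieve in the real store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the versioned put; names and the long version type are assumptions. */
public class VersionedPutSketch {
    // key -> (version -> value), standing in for rows ordered by PRIMARY KEY (key_, version_).
    private final Map<String, TreeMap<Long, byte[]>> rows = new TreeMap<>();

    public void put(String key, long version, byte[] value) {
        TreeMap<Long, byte[]> versions = rows.computeIfAbsent(key, k -> new TreeMap<>());
        // Reject the write if any stored version is the same or newer.
        if (!versions.isEmpty() && versions.lastKey() >= version) {
            throw new IllegalStateException(
                    "obsolete version " + version + " for key " + key
                            + "; stored version is " + versions.lastKey());
        }
        // Every remaining row for this key is older than the new version: delete them.
        versions.clear();
        // Insert the new (key, version, value) row.
        versions.put(version, value);
    }

    /** Value stored for an exact (key, version) pair, or null if absent. */
    public byte[] get(String key, long version) {
        TreeMap<Long, byte[]> versions = rows.get(key);
        return versions == null ? null : versions.get(version);
    }

    /** Versions currently stored for a key, oldest first. */
    public List<Long> versionsOf(String key) {
        TreeMap<Long, byte[]> versions = rows.get(key);
        return versions == null ? new ArrayList<>() : new ArrayList<>(versions.keySet());
    }
}
```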
22. V Storage Engine: get
• get(byte[] key, byte[] version)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key
    add to results
• Return results
23. V Storage Engine: delete
• delete(byte[] key)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key
    delete row
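Both get and delete rely on the same access pattern: position a cursor at the first row for the key, then walk forward while the key still matches. The sketch below imitates that with a sorted map keyed by a composite (key, version) pair. It is not g414-inno code; the class KeyScanSketch and its RowKey type are hypothetical, keys and values are Strings for readability, and versions are plain longs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of "find, then iterate while the key matches"; names are hypothetical. */
public class KeyScanSketch {
    /** Composite primary key (key_, version_), ordered by key and then by version. */
    static final class RowKey implements Comparable<RowKey> {
        final String key;
        final long version;

        RowKey(String key, long version) {
            this.key = key;
            this.version = version;
        }

        @Override
        public int compareTo(RowKey other) {
            int c = key.compareTo(other.key);
            return c != 0 ? c : Long.compare(version, other.version);
        }
    }

    private final TreeMap<RowKey, String> table = new TreeMap<>();

    public void insert(String key, long version, String value) {
        table.put(new RowKey(key, version), value);
    }

    /** Rows at or after the smallest possible (key, version) pair: the cursor "find" step. */
    private Iterator<Map.Entry<RowKey, String>> seek(String key) {
        return table.tailMap(new RowKey(key, Long.MIN_VALUE), true).entrySet().iterator();
    }

    /** Collect every version stored for the key, in version order. */
    public List<String> get(String key) {
        List<String> results = new ArrayList<>();
        Iterator<Map.Entry<RowKey, String>> cursor = seek(key);
        while (cursor.hasNext()) {
            Map.Entry<RowKey, String> row = cursor.next();
            if (!row.getKey().key.equals(key)) {
                break; // walked past the last row for this key
            }
            results.add(row.getValue());
        }
        return results;
    }

    /** Delete every version stored for the key; returns the number of rows removed. */
    public int delete(String key) {
        int deleted = 0;
        Iterator<Map.Entry<RowKey, String>> cursor = seek(key);
        while (cursor.hasNext()) {
            if (!cursor.next().getKey().key.equals(key)) {
                break;
            }
            cursor.remove(); // removes the row from the underlying table
            deleted++;
        }
        return deleted;
    }

    public int size() {
        return table.size();
    }
}
```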
24. V Storage Engine: TODO
• Perform Benchmarks (in EC2, local)
• Tuning / Optimization
• Clarify licenses (GPLv2 + Apache == ouch)
• Organize & streamline distribution
25. St8
• Simple, Open Source REST-based Storage Server
• Wraps InnoDB with thin “but pleasant” HTTP API
• Custom Tables using JSON table definitions
• Natural, JSON-based access to tables: CRUD,
Index-based Query & Iteration
• Under the hood: Jetty, Jersey, Guice, Jackson,
g414-inno, Embedded InnoDB