2. Continuuity Proprietary and Confidential
WHO WE ARE
• We’ve built Continuuity Reactor: the world’s first scale-out
application server for Hadoop
• Fast, easy development, deployment and management of
Hadoop and HBase apps
• Continuuity team has years of experience in using and contributing
to Open Source, and we intend to continue doing so.
Thursday, June 6, 13
AGENDA
• Transactions in stream processing: what? why?
• Omid-style transactions explained
• Queues: heart of stream processing
• What’s next?
THE REACTOR
• Continuuity Reactor is an app platform built on Hadoop and HBase
• Collect, Process, Store, and Query data.
• A Flow is a real-time processor with exactly-once guarantee
• A flow is composed of flowlets, connected via queues
• All processing happens with ACID guarantees in transactions
PROCESSING IN A FLOWLET
[Diagram: a Flowlet consuming entries from a Queue]
PROCESSING IN A FLOWLET
[Diagram: a Flowlet consuming entries from a Queue and accessing a DataSet]
TRANSACTIONS: WHY?
[Diagram: a Flowlet consuming entries from a Queue and accessing a DataSet]
PROCESSING WITH TX
[Diagram: Queue, Flowlet and DataSet, with the flowlet's processing wrapped in a transaction]
TRANSACTIONS: WHAT?
• Atomic - The entire transaction is committed as one unit
• Consistent - No partial state change due to failure
• Isolated - No dirty reads; a transaction's writes are visible only after commit
• Durable - Once committed, data is persisted reliably
OMID-STYLE TRANSACTIONS
• Multi-Version Concurrency Control with Version = HBase Timestamp
• All writes in the same transaction use the transaction ID as timestamp
• Reads exclude other, uncommitted transactions (for isolation)
• Optimistic Concurrency Control
• Conflict detection at commit of transaction
• Write Conflict: two overlapping transactions write the same row
• Rollback of one transaction in case of conflict (whichever commits later)
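The bullets above can be sketched in a few lines of Python. This is a toy in-memory model, not the Reactor or Omid implementation: `TxOracle`, `VersionedStore` and all method names are hypothetical, a dict stands in for HBase, and the transaction ID is used as the cell version.

```python
class VersionedStore:
    """Stand-in for an HBase table: row -> {version: value}."""

    def __init__(self):
        self.cells = {}

    def put(self, row, version, value):
        self.cells.setdefault(row, {})[version] = value

    def get(self, row, visible):
        # Return the newest version written by a visible transaction.
        versions = [v for v in self.cells.get(row, {}) if v in visible]
        return self.cells[row][max(versions)] if versions else None

    def delete(self, row, version):
        # Used to roll back the writes of an aborted transaction.
        self.cells.get(row, {}).pop(version, None)


class TxOracle:
    """Hands out transaction IDs and detects write conflicts at commit."""

    def __init__(self):
        self.next_pointer = 1
        self.write_sets = {}       # tx id -> rows written so far
        self.committed = []        # (commit point, rows) per committed tx
        self.committed_ids = set()

    def start(self):
        tx_id = self.next_pointer
        self.next_pointer += 1
        self.write_sets[tx_id] = set()
        return tx_id

    def track_write(self, tx_id, row):
        self.write_sets[tx_id].add(row)

    def visible(self, tx_id):
        # Reads see committed transactions plus the reader's own writes.
        return self.committed_ids | {tx_id}

    def commit(self, tx_id):
        rows = self.write_sets.pop(tx_id)
        # Conflict: an overlapping tx committed a write to the same row.
        for commit_point, other_rows in self.committed:
            if commit_point > tx_id and rows & other_rows:
                return False
        commit_point = self.next_pointer
        self.next_pointer += 1
        self.committed.append((commit_point, rows))
        self.committed_ids.add(tx_id)
        return True


oracle, store = TxOracle(), VersionedStore()
t1, t2 = oracle.start(), oracle.start()
for tx, value in ((t1, "a"), (t2, "b")):
    store.put("row", tx, value)      # write with version = tx id
    oracle.track_write(tx, "row")
first = oracle.commit(t1)            # commits first, so it wins
second = oracle.commit(t2)           # overlaps with t1 on "row"
if not second:
    store.delete("row", t2)          # roll back the loser's write
t3 = oracle.start()
print(first, second, store.get("row", oracle.visible(t3)))  # True False a
```

The transaction that commits later is the one rolled back, matching the last bullet. A production oracle would also have to bound the list of committed write sets, which this sketch does not attempt.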
OMID-STYLE TRANSACTIONS
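[Diagram: transaction flow between client, Tx Oracle and HBase.
start tx: get a write pointer from the Tx Oracle.
do work: write to HBase with version=pointer.
commit tx: if the transaction has conflicts, rollback and abort tx.]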
OPTIMISTIC CONCURRENCY CONTROL
• Optimistic Concurrency Control
• Avoids cost of locking rows and tables
• No deadlocks or lock escalations
• Cost of conflict detection and possible rollback is higher
• Good if conflicts are rare: short transactions, disjoint partitioning of
work
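The rollback cost mentioned above shows up most clearly when a conflicting transaction is retried. The slides do not say how the Reactor reacts to a conflict, so the retry policy below is an assumption; `run_in_tx` and its three callbacks are hypothetical names.

```python
def run_in_tx(work, try_commit, rollback, max_attempts=3):
    """Run `work` optimistically; roll back and retry on conflict.

    work()          -> list of operations performed (no locks are taken)
    try_commit(ops) -> True if no conflict was detected at commit time
    rollback(ops)   -> undo the operations of a failed attempt
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        ops = work()
        if try_commit(ops):
            return attempt
        rollback(ops)  # the whole attempt is wasted work
    raise RuntimeError("transaction kept conflicting")


# One conflict, then success: the work runs twice, one rollback happens.
outcomes = iter([False, True])
rolled_back = []
attempts = run_in_tx(
    work=lambda: ["put row1"],
    try_commit=lambda ops: next(outcomes),
    rollback=rolled_back.append,
)
print(attempts, rolled_back)  # 2 [['put row1']]
```

Every conflict costs a full re-execution, which is why short transactions over disjoint partitions suit this scheme.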
OMID-STYLE TRANSACTIONS
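[Diagram: interaction between Tx Agent and Tx Oracle.
start tx: the agent gets a new tx from the oracle (create tx).
do work: the agent adds ops to the tx (track tx ops).
commit tx: the agent tries to commit and the oracle checks conflicts.
no conflicts: the oracle makes the tx visible.
has conflicts: the agent gets the ops to roll back, then abort tx (remove tx).]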
TRANSACTION ORACLE
• Simple & Fast
• Single point of failure?
• Persist all state to a write-ahead log
• Secondary oracle that subscribes to log
• Failover can happen quickly
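A minimal sketch of the failover idea above, assuming a shared append-only log that the secondary tails. All class names are hypothetical and a Python list stands in for the persistent write-ahead log.

```python
class OracleState:
    """Oracle state, rebuilt purely from log records."""

    def __init__(self):
        self.next_id = 1
        self.in_progress = set()
        self.committed = set()

    def apply(self, record):
        kind, tx_id = record
        if kind == "start":
            self.in_progress.add(tx_id)
            self.next_id = max(self.next_id, tx_id + 1)
        elif kind == "commit":
            self.in_progress.discard(tx_id)
            self.committed.add(tx_id)
        elif kind == "abort":
            self.in_progress.discard(tx_id)


class PrimaryOracle:
    def __init__(self, log, state=None):
        self.log = log
        self.state = state or OracleState()

    def _record(self, kind, tx_id):
        self.log.append((kind, tx_id))    # persist first (write-ahead)
        self.state.apply((kind, tx_id))   # then change in-memory state

    def start(self):
        tx_id = self.state.next_id
        self._record("start", tx_id)
        return tx_id

    def commit(self, tx_id):
        self._record("commit", tx_id)

    def abort(self, tx_id):
        self._record("abort", tx_id)


class SecondaryOracle:
    """Subscribes to the log and keeps a warm copy of the state."""

    def __init__(self, log):
        self.log = log
        self.state = OracleState()
        self.offset = 0

    def catch_up(self):
        while self.offset < len(self.log):
            self.state.apply(self.log[self.offset])
            self.offset += 1

    def take_over(self):
        self.catch_up()                   # only the unread tail is replayed
        return PrimaryOracle(self.log, self.state)


log = []
primary = PrimaryOracle(log)
t1, t2 = primary.start(), primary.start()
primary.commit(t1)
secondary = SecondaryOracle(log)
secondary.catch_up()
new_primary = secondary.take_over()       # primary is assumed to have failed
t3 = new_primary.start()
print(sorted(secondary.state.in_progress), t3)  # [2, 3] 3
```

Because the secondary has already applied most of the log, takeover only needs to replay the unread tail, which is why failover can be quick.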
QUEUES
• Flowlets pass data to each other on queues
• Every consumer (flowlet) can be partitioned
• More than one consumer (flowlet) can read a queue
• Queues are partitioned to scale throughput
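One way to split a queue across the instances of a consumer is to hash a key of each entry. The slides do not name the partitioning strategy, so hash partitioning is an assumption here, and `instance_for` and `partition` are hypothetical helpers.

```python
import zlib


def instance_for(key: str, num_instances: int) -> int:
    """Map a key to a consumer instance with a stable hash."""
    # crc32 is used because Python's built-in hash() of str is salted
    # per process and would not give a stable assignment.
    return zlib.crc32(key.encode("utf-8")) % num_instances


def partition(entries, num_instances):
    """Split (key, payload) entries into one list per instance."""
    parts = [[] for _ in range(num_instances)]
    for key, payload in entries:
        parts[instance_for(key, num_instances)].append((key, payload))
    return parts


entries = [("user1", "a"), ("user2", "b"), ("user1", "c")]
parts = partition(entries, 2)
owner = instance_for("user1", 2)
# All entries for "user1" land on the same instance, in queue order.
print([payload for key, payload in parts[owner] if key == "user1"])
```

With disjoint partitions, two instances never process the same key, which also keeps write conflicts between their transactions rare.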
QUEUES & FLOWLETS
[Diagram: a Queue read by two instances of one Flowlet, Instance1 and Instance2]
QUEUE DESIGN
• Queue entries are written in sequence
• Write pointer only goes forward
• Queue entries are read sequentially
• Read pointer only goes forward
• Reader waits for entry to be written
• Transactions are used for isolation & consistency guarantees
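The design above can be sketched as a queue with two forward-only pointers. This is an in-memory illustration, not the HBase-backed implementation: `SketchQueue` and its methods are hypothetical, and the per-entry valid flag is an assumed way to hide the entries of an aborted enqueue transaction.

```python
class SketchQueue:
    def __init__(self):
        self.entries = {}        # entry id -> [data, valid]
        self.write_pointer = 0   # only ever incremented
        self.read_pointer = 0    # only ever incremented

    def enqueue(self, data):
        self.write_pointer += 1
        self.entries[self.write_pointer] = [data, True]
        return self.write_pointer

    def invalidate(self, entry_id):
        # Assumed rollback of an aborted enqueue: mark it, never remove it,
        # so the sequence of entry ids stays gap-free.
        self.entries[entry_id][1] = False

    def dequeue(self):
        while True:
            next_id = self.read_pointer + 1
            if next_id not in self.entries:
                return None      # not written yet: the reader has to wait
            self.read_pointer = next_id
            data, valid = self.entries[next_id]
            if valid:
                return data      # invalid entries are skipped


q = SketchQueue()
q.enqueue("a")
aborted = q.enqueue("b")
q.enqueue("c")
q.invalidate(aborted)
results = [q.dequeue(), q.dequeue(), q.dequeue()]
print(results)  # ['a', 'c', None]
```

Returning `None` stands in for "reader waits"; a real consumer would poll or block until the entry appears.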
QUEUE OPERATION
[Diagram: queue operation for Producer and Consumer.
Producer: start tx, enqueue (inc & get WritePointer, then put the entry [data] with entry meta valid? = true), commit tx; on abort tx the entry's valid flag is set to false.
Consumer: start tx, dequeue (inc & get ReadPointer, then read the entry), commit tx or abort tx; the Consumer State holds the Claimed Entries List.]
PERFORMANCE CONSIDERATIONS
• Every entry costs at least 4 writes:
• Enqueue = Increment + Put = 2 writes to WAL
• Dequeue = 2 x Get + Put = 1 write to WAL
• Ack = Get + Put = 1 write to WAL
• Caching consumer state in-memory (still persisting every change)
• Dequeue = Get + Put
• Ack = Put
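The counts above can be tallied with a short calculation. The operation names and numbers are taken from the bullets; the dictionary layout and the `cost` helper are only an illustration.

```python
WRITE_OPS = {"put", "increment"}  # operations that go to the WAL

baseline = {
    "enqueue": {"increment": 1, "put": 1},
    "dequeue": {"get": 2, "put": 1},
    "ack": {"get": 1, "put": 1},
}
cached_state = {
    "enqueue": {"increment": 1, "put": 1},
    "dequeue": {"get": 1, "put": 1},   # consumer state served from memory
    "ack": {"put": 1},
}


def cost(profile):
    """Return (writes, reads) per queue entry for a profile."""
    writes = reads = 0
    for ops in profile.values():
        for op, count in ops.items():
            if op in WRITE_OPS:
                writes += count
            else:
                reads += count
    return writes, reads


print(cost(baseline))      # (4, 3)
print(cost(cached_state))  # (4, 1)
```

Caching the consumer state removes two of the three reads but none of the four writes, since every change is still persisted.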
PERFORMANCE IMPROVEMENTS
• Prefetching n entries and caching them in state:
• Dequeue = 1/n x Put
• Batch enqueues + dequeues
• Enqueue = 2/n x Put
• Dequeue = 1/n x Get + 1/n x Put
• Ack = 1/n x Put
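As a worked example of the batching formulas above, the amortized cost per entry for a batch size n is computed below. The batch sizes are arbitrary illustrations, not measured settings, and `batched_cost` is a hypothetical helper.

```python
from fractions import Fraction


def batched_cost(n):
    """Amortized (writes, reads) per entry when batching n entries."""
    enqueue_writes = Fraction(2, n)   # Enqueue = 2/n x Put
    dequeue_reads = Fraction(1, n)    # Dequeue = 1/n x Get ...
    dequeue_writes = Fraction(1, n)   # ... + 1/n x Put
    ack_writes = Fraction(1, n)       # Ack = 1/n x Put
    writes = enqueue_writes + dequeue_writes + ack_writes
    return writes, dequeue_reads


for n in (1, 10, 100):
    writes, reads = batched_cost(n)
    print(n, float(writes), float(reads))
# 1 4.0 1.0
# 10 0.4 0.1
# 100 0.04 0.01
```

With n = 1 this reduces to the 4 writes per entry of the unbatched case; the write cost falls as 4/n.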
PERFORMANCE NUMBERS
• enqueue: 10K ops/sec per producer per node
• dequeue: 5K ops/sec per consumer per node
• dequeue needs 2 RPC calls versus 1 for the enqueue op
HBASE WISHLIST
• Filters for Get, not just max timestamp (for transactional read)
• Filters for Increment and CheckAndPut (for transactional writes)
• Ability to aggregate writes to WAL in co-processors (for faster queues)
• No-read atomic Append operation
• No-read atomic Increment operation
QUESTIONS?
Looking for the chance to work with a team that is defining a new category within Big Data?
We are hiring!
careers@continuuity.com