5. Motivation
A system starts out simple…
…but gets complex in the real world
…as you address real requirements
[Diagram labels: Application with client library; System with Replica 1 …, Replica 2 …; concerns: Scale, Failover, Bootstrapping, Call Routing]
6. Motivation
These are cluster management problems
Helix solves them once…
…so you can focus on your system
[Diagram labels: Scale, Failover, Bootstrapping]
7. Outline
What is Helix
Use case 1: distributed data store
Architecture
Use case 2: consumer group
Helix at LinkedIn
Q&A
14. Use-case requirements
• Partition constraints
• 1 master per partition
• Balance partitions across cluster
• No single-point-of-failure: replicas on different nodes
• Handle failures: transfer mastership
• Elasticity
• Distribute workload across added nodes
• Minimize partition movement
• Meet SLAs
• Throttle concurrent data movement
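
The partition constraints above can be illustrated with a small sketch. This is not Helix's placement algorithm or API; it is a plain round-robin placement written for illustration. The function names (`assign_partitions`, `fail_over`, `master_counts`), the node names, and the `REPLICA` state label are all assumptions. It covers one master per partition, replicas on different nodes, balanced mastership, and mastership transfer on failure. It does not attempt to minimize partition movement or throttle data movement.

```python
from collections import Counter


def assign_partitions(nodes, num_partitions, replicas):
    """Place each partition's replicas on distinct nodes, exactly one as MASTER."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    assignment = {}
    for p in range(num_partitions):
        # Rotate the starting node so masters and replicas spread evenly.
        owners = [nodes[(p + r) % len(nodes)] for r in range(replicas)]
        assignment[p] = {
            node: ("MASTER" if r == 0 else "REPLICA")
            for r, node in enumerate(owners)
        }
    return assignment


def fail_over(assignment, dead_node):
    """Remove a failed node; promote a surviving replica where it was master."""
    for placement in assignment.values():
        state = placement.pop(dead_node, None)
        if state == "MASTER" and placement:
            # Hypothetical policy: promote the first survivor by name.
            placement[sorted(placement)[0]] = "MASTER"
    return assignment


def master_counts(assignment):
    """Count how many partitions each node masters."""
    return Counter(
        node
        for placement in assignment.values()
        for node, state in placement.items()
        if state == "MASTER"
    )


cluster = assign_partitions(["n1", "n2", "n3"], num_partitions=6, replicas=2)
print(master_counts(cluster))
fail_over(cluster, "n1")
print(master_counts(cluster))
```

After `n1` fails, every partition still has exactly one master, but mastership is no longer balanced. Restoring balance while moving as few partitions as possible is the harder part that the elasticity and SLA requirements describe.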
30. Outline
What is Helix
Use case 1: distributed data store
Architecture
Use case 2: consumer group
Helix at LinkedIn
Q&A
31. Helix usage at LinkedIn (Pictures)
Espresso
– a timeline-consistent, distributed data store
Databus
– a change data capture service
Search as a Service
– a multi-tenant service for multiple search applications
More planned
32. Summary
Building Distributed Data Systems is hard
– Abstraction and modularity are key
Helix: A Generic framework for Cluster Management
Simple programming model: declarative state machine
33. Helix: Future Roadmap
• Features
• Span multiple data centers
• Load balancing
• Announcement
• Open source: https://github.com/linkedin/helix
• Apache incubation
• New contributors
Partitioned queue consumption: say there are 6 queues and some consumers to consume from these queues. The requirement is simple: the queues must be divided equally among the consumers. On top of that, we need partition affinity while consuming, so each consumer reads from its assigned queues instead of picking randomly from any queue.
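
A minimal sketch of that requirement, assuming a static round-robin split rather than Helix's own rebalancer (the function `assign_queues` and the queue and consumer names are hypothetical):

```python
def assign_queues(queues, consumers):
    """Divide queues evenly; each queue is owned by exactly one consumer."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    # Sorting makes the mapping deterministic, so a queue keeps the same
    # consumer (partition affinity) as long as membership does not change.
    for i, queue in enumerate(sorted(queues)):
        assignment[ordered[i % len(ordered)]].append(queue)
    return assignment


queues = [f"queue-{i}" for i in range(6)]
print(assign_queues(queues, ["c1", "c2"]))
# {'c1': ['queue-0', 'queue-2', 'queue-4'], 'c2': ['queue-1', 'queue-3', 'queue-5']}
print(assign_queues(queues, ["c1", "c2", "c3"]))
# {'c1': ['queue-0', 'queue-3'], 'c2': ['queue-1', 'queue-4'], 'c3': ['queue-2', 'queue-5']}
```

With 6 queues, two consumers get three queues each and three consumers get two each. Plain round-robin reshuffles several queues when a consumer joins or leaves; a cluster manager would also need to detect membership changes and limit that movement.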