6. Distributed search service

[Diagram: index shards P.1–P.6 distributed across Node 1–3, each partition stored with a replica]

Partition management
• Multiple replicas
• Even distribution
• Rack-aware placement of replicas

Fault tolerance
• Fault detection
• Auto-create replicas
• Controlled creation of replicas

Elasticity
• Re-distribute partitions
• Minimize data movement
• Throttle data movement
7. Distributed data store

[Diagram: partitions P.1–P.12 spread across Node 1–3, each with MASTER and SLAVE replicas]

Partition management
• Multiple replicas
• 1 designated master
• Even distribution

Fault tolerance
• Fault detection
• Promote slave to master
• Minimize downtime
• No SPOF

Elasticity
• Minimize data movement
• Throttle data movement
• Even distribution
8. Message consumer group

• Similar to Message Groups in ActiveMQ
  – guaranteed ordering of the processing of related messages across a single queue
  – load balancing of the processing of messages across multiple consumers
  – high availability / auto-failover to other consumers if a JVM goes down
• Applicable to many messaging pub/sub systems like Kafka, RabbitMQ, etc.
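To make the consumer-group behavior concrete, here is a minimal, self-contained sketch (not ActiveMQ or Helix code; the `assign` method and the queue/consumer names are invented for illustration). One owner per queue preserves per-queue ordering; round-robin assignment load-balances, and recomputing the assignment over the surviving consumers models auto-failover:

```java
// Illustrative sketch only (not ActiveMQ or Helix code): each queue gets
// exactly one owning consumer, which preserves per-queue ordering, while
// round-robin assignment load-balances queues across the group.
import java.util.*;

public class ConsumerGroup {

    // Assign each queue to exactly one live consumer, round-robin for balance.
    public static Map<Integer, String> assign(int numQueues, List<String> liveConsumers) {
        Map<Integer, String> owner = new TreeMap<>();
        for (int q = 0; q < numQueues; q++) {
            owner.put(q, liveConsumers.get(q % liveConsumers.size()));
        }
        return owner;
    }

    public static void main(String[] args) {
        // Three consumers share six queues: two queues each.
        Map<Integer, String> owner = assign(6, List.of("c1", "c2", "c3"));
        System.out.println(owner);

        // "c2"'s JVM goes down: its queues fail over to the survivors.
        Map<Integer, String> afterFailover = assign(6, List.of("c1", "c3"));
        System.out.println(afterFailover);
    }
}
```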
13. Terminologies

Node        A single machine
Cluster     Set of Nodes
Resource    A logical entity, e.g. database, index, task
Partition   Subset of the resource
Replica     Copy of a partition
State       Status of a partition replica, e.g. Master, Slave
Transition  Action that lets replicas change state, e.g. Slave -> Master
14. Core concept

State Machine
• States: Offline (O), Slave (S), Master (M)
• Transitions: O->S, S->O, S->M, M->S

Constraints
• States: S COUNT=2, M COUNT=1
• Transitions: concurrent(O->S) < 5 (t1 ≤ 5)

Objectives
• Partition placement
• Failure semantics
• Even distribution: minimize(max over nodes n∈N of S(n)) and minimize(max over nodes n∈N of M(n))

[Diagram: O -> S -> M state machine with transitions t1–t4]
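A minimal sketch of this declarative model (not the Helix API; class and method names are invented): the OFFLINE/SLAVE/MASTER states, the legal transitions, and a check of the per-partition state-count constraint M=1, S=2 against a replica assignment:

```java
// Minimal sketch (not the Helix API) of the declarative model above: the
// OFFLINE/SLAVE/MASTER states, their legal transitions, and the per-partition
// state-count constraint M=1, S=2.
import java.util.*;

public class StateModelSketch {

    private static final Set<String> LEGAL = Set.of(
        "OFFLINE->SLAVE", "SLAVE->OFFLINE", "SLAVE->MASTER", "MASTER->SLAVE");

    // A transition is legal only if the state model declares it.
    public static boolean legal(String from, String to) {
        return LEGAL.contains(from + "->" + to);
    }

    // Check the M=1, S=2 constraint for one partition's replica states.
    public static boolean satisfiesCounts(Collection<String> replicaStates) {
        long masters = replicaStates.stream().filter("MASTER"::equals).count();
        long slaves = replicaStates.stream().filter("SLAVE"::equals).count();
        return masters == 1 && slaves == 2;
    }

    public static void main(String[] args) {
        System.out.println(legal("OFFLINE", "SLAVE"));   // true
        System.out.println(legal("OFFLINE", "MASTER"));  // false: must pass through SLAVE
        System.out.println(satisfiesCounts(List.of("MASTER", "SLAVE", "SLAVE"))); // true
    }
}
```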
15. Helix solution

Message consumer group
• Offline <-> Online (Start consumption / Stop consumption)
• MAX=1, MAX per node=5

Distributed search
• MAX=3 (number of replicas)
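To illustrate the consumer-group policy (one owner per queue, at most five queues per node), here is a self-contained placement sketch; this is assumed behavior for illustration, not Helix's actual rebalancer, and `place` and the node names are made up:

```java
// Sketch of the Online/Offline policy above (assumed semantics, not Helix
// code): bring queues ONLINE with one owner per queue (MAX=1) and at most
// maxPerNode queues per node (MAX per node = 5 on the slide).
import java.util.*;

public class OnlineOfflinePlacement {

    public static Map<Integer, String> place(int numQueues, List<String> nodes, int maxPerNode) {
        Map<String, Integer> load = new HashMap<>();
        Map<Integer, String> owner = new TreeMap<>();
        for (int q = 0; q < numQueues; q++) {
            // Pick the least-loaded node that is still under its cap.
            String best = null;
            for (String n : nodes) {
                int l = load.getOrDefault(n, 0);
                if (l < maxPerNode && (best == null || l < load.getOrDefault(best, 0))) {
                    best = n;
                }
            }
            if (best == null) break;       // every node at capacity: queue stays OFFLINE
            owner.put(q, best);            // exactly one owner per queue
            load.merge(best, 1, Integer::sum);
        }
        return owner;
    }

    public static void main(String[] args) {
        // 12 queues, 2 nodes, cap 5: only 10 queues come online.
        Map<Integer, String> owner = place(12, List.of("node1", "node2"), 5);
        System.out.println(owner.size()); // 10
    }
}
```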
23. Define: State model definition

• States
  – All possible states (e.g. MasterSlave)
  – Priority
• Transitions
  – Legal transitions
  – Priority
• Applicable to each partition of a resource

[Diagram: O -> S -> M state machine]
24. Define: state model

StateModelDefinition.Builder builder =
    new StateModelDefinition.Builder("MASTERSLAVE");

// Add states and their rank to indicate priority.
builder.addState(MASTER, 1);
builder.addState(SLAVE, 2);
builder.addState(OFFLINE);

// Set the initial state when the node starts.
builder.initialState(OFFLINE);

// Add transitions between the states.
builder.addTransition(OFFLINE, SLAVE);
builder.addTransition(SLAVE, OFFLINE);
builder.addTransition(SLAVE, MASTER);
builder.addTransition(MASTER, SLAVE);
25. Define: constraints

Where constraints can be specified (Y = supported):

Scope      State  Transition
Partition  Y      Y
Resource   -      Y
Node       Y      Y
Cluster    -      Y

Example:

Scope      State     Transition
Partition  M=1, S=2  -

[Diagram: O -> S -> M state machine annotated with transition constraints COUNT=2 and COUNT=1]
42. Tools

• Chaos monkey
• Data driven testing and debugging
• Rolling upgrade
• On demand task scheduling and intra-cluster messaging
• Health monitoring and alerts
43. Data driven testing

• Instrument
  – Zookeeper, controller, participant logs
• Simulate
  – Chaos monkey
• Analyze
  – Invariants:
    • Respect state transition constraints
    • Respect state count constraints
    • And so on
• Debugging made easy
  – Reproduce exact sequence of events
45. No more than R=2 slaves

Time   State    Number of Slaves  Instance
42632  OFFLINE  0                 10.117.58.247_12918
42796  SLAVE    1                 10.117.58.247_12918
43124  OFFLINE  1                 10.202.187.155_12918
43131  OFFLINE  1                 10.220.225.153_12918
43275  SLAVE    2                 10.220.225.153_12918
43323  SLAVE    3                 10.202.187.155_12918
85795  MASTER   2                 10.220.225.153_12918
46. How long was it out of whack?

Number of Slaves  Time       Percentage
0                 1082319    0.5
1                 35578388   16.46
2                 179417802  82.99
3                 118863     0.05

Number of Masters  Time       Percentage
0                  15490456   7.16
1                  200706916  92.84

83% of the time, there were 2 slaves to a partition.
93% of the time, there was 1 master to a partition.
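The percentages above come from time-weighting the event log: each slave count holds from its timestamp until the next change. A self-contained sketch of that computation (the method name and the sample timestamps are made up for illustration):

```java
// Sketch of the analysis above: from timestamped changes in a partition's
// slave count, compute how long each count held (timestamps here are made up).
import java.util.*;

public class SlaveTimeline {

    // events: rows of {timestamp, slaveCount}, sorted by timestamp;
    // endTime closes the observation window.
    public static Map<Integer, Long> timeAtCount(long[][] events, long endTime) {
        Map<Integer, Long> time = new TreeMap<>();
        for (int i = 0; i < events.length; i++) {
            long until = (i + 1 < events.length) ? events[i + 1][0] : endTime;
            time.merge((int) events[i][1], until - events[i][0], Long::sum);
        }
        return time;
    }

    public static void main(String[] args) {
        long[][] events = { {0, 0}, {10, 1}, {30, 2}, {90, 1} };
        Map<Integer, Long> t = timeAtCount(events, 100);
        System.out.println(t); // time units spent at 0, 1, and 2 slaves
    }
}
```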
47. Invariant 2: State Transitions

FROM     TO       COUNT
MASTER   SLAVE    55
OFFLINE  DROPPED  0
OFFLINE  SLAVE    298
SLAVE    MASTER   155
SLAVE    OFFLINE  0
50. In flight

• Apache S4
  – Partitioning, co-location
  – Dynamic cluster expansion
• Archiva
  – Partitioned replicated file store
  – Rsync based replication
• Others in evaluation
  – Bigtop
51. Auto scaling software deployment tool

• States
  – Offline, Download, Configure, Start, Active, Standby
• Constraint for each state
  – Download < 100
  – Active 1000
  – Standby 100

[Diagram: Offline -> Download -> Configure -> Start -> Active / Standby]
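The per-state caps can be pictured as a table the controller checks before moving instances into a state. A self-contained sketch of that check follows; the class, method, and state-name strings are invented, and the semantics (hard caps per state, cluster-wide) are an assumption for illustration:

```java
// Sketch of the per-state caps above (assumed semantics): at most 100
// concurrent downloads, 1000 active instances, 100 standbys cluster-wide.
import java.util.*;

public class DeploymentConstraints {

    private static final Map<String, Integer> MAX = Map.of(
        "DOWNLOAD", 100,   // throttles simultaneous downloads
        "ACTIVE", 1000,
        "STANDBY", 100);

    // current: how many instances currently sit in each state.
    public static boolean withinLimits(Map<String, Integer> current) {
        for (Map.Entry<String, Integer> e : current.entrySet()) {
            Integer cap = MAX.get(e.getKey());
            if (cap != null && e.getValue() > cap) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(withinLimits(Map.of("DOWNLOAD", 50, "ACTIVE", 1000)));  // true
        System.out.println(withinLimits(Map.of("DOWNLOAD", 150)));                 // false
    }
}
```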
52. Summary

• Helix: a generic framework for building distributed systems
• Modifying/enhancing system behavior is easy
  – Abstraction and modularity is key
• Simple programming model: declarative state machine
53. Roadmap

• Features
  – Span multiple data centers
  – Automatic load balancing
  – Distributed health monitoring
• YARN generic Application Master for real-time apps
• Standalone Helix agent
Moving from single-node to scalable, fault-tolerant distributed mode is non-trivial and slow, even though the core functionality remains the same.

Limit the number of partitions on a single node.

You must define the correct behavior of your system. How do you partition? What is the replication factor? Are replicas the same, or are there different roles such as master/slave replicas? How should the system behave when nodes fail, when new nodes are added, etc.? This differs from system to system. 2. Once you have defined how the system must behave, you have to implement that behavior in code, maybe on top of ZooKeeper or otherwise. That implementation is non-trivial, hard to debug, and hard to test. Worse, if the behavior of the system were to change even slightly in response to new requirements, the entire process has to repeat. (Example: moving from one shard per node to multiple shards per node.) Instead, wouldn't it be nice if all you had to do was step 1, i.e. simply define the correct behavior of your distributed system, and step 2 was somehow taken care of?
Core Helix concepts: what makes it generic.

In this slide, we look at the problem from a different perspective and possibly re-define the cluster management problem. To recap: to solve the distributed data store we need to define the number of partitions and replicas, and for the replicas we need different roles like master/slave. One well-proven way to express such behavior is a state machine.

Dynamically change the number of replicas. Add new resources, add nodes. Change behavior easily; change what runs where. Elasticity.

Auto-rebalance: applicable to message consumer group, search. Auto: distributed data store.
Allows one to come up with common tools. Think of Maven plugins.

Used in production to manage the core infrastructure components in the company. Operation is easy, and it is easy for dev-ops to operate multiple systems.

S4: apps and processing tasks each have a different state model. Multitenancy: multiple resources.

Define the states and how many replicas you want in each state (the state model). Helix provides MasterSlave, OnlineOffline, LeaderStandby, and two-master systems. Automatic replica creation.

Provides the right combination of abstraction and flexibility. The code is stable and deployed in production. Integration between multiple systems, co-locating. A good side effect: it helps you think more about your design, putting in the right level of abstraction and modularity.