Inside the new distributed ID generator
that supports ChatWork's messaging system
安田裕介/Yusuke Yasuda (@TanUkkii007)
Distributed ID generator in ChatWork
2018/02/27 © ChatWork All rights reserved.
Agenda
● What is our motivation for distributed ID
● How our distributed ID generator works
● Designing distributed ID generator with actor model
● Implementing distributed ID generator with actor model
● Distributed ID generator in the wild
About me
● Yusuke Yasuda / 安田裕介
● @TanUkkii007
● Working for ChatWork for 2 years
● Scala developer
About ChatWork
What is our motivation for distributed ID
Messaging system architecture overview
You can find more information about our architecture at Kafka Summit 2017.
[Architecture diagram, with the part covered in this talk marked as "Today's topic".]
Motivation
● Migration from MySQL to Kafka/HBase
○ High scalability
○ No single point of failure
● Compatibility with existing IDs
○ Time-ordered, sortable integer IDs
● ID space extension
○ 32bit → 64bit
● ID generator itself is required to be scalable and distributed
Snowflake
● Distributed ID generator developed by Twitter
○ https://github.com/twitter/snowflake
● Motivation: migration to Cassandra from MySQL
● Roughly time-ordered, sortable 64bit ID
● > 10k ID/s per process, ~ 2ms response rate
● ID layout, from the most significant to the least significant field:
○ 41 bit timestamp
○ 5 bit datacenter ID
○ 5 bit worker ID
○ 12 bit sequenceNr
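The bit layout above can be sketched as plain bit arithmetic. This is a minimal illustration, not Twitter's or ChatWork's code: the object name `SnowflakeLayout`, the `Parts` case class and the helper names are hypothetical, and the timestamp is treated as an opaque non-negative number (Snowflake uses milliseconds since a custom epoch). The four fields add up to 63 bits, so the sign bit of the 64 bit `Long` stays unused and IDs remain positive.

```scala
object SnowflakeLayout {
  // Field widths as listed on the slide.
  val TimestampBits = 41
  val DatacenterBits = 5
  val WorkerBits = 5
  val SequenceBits = 12

  // Shift amounts, counted from the least significant bit.
  val WorkerShift = SequenceBits                         // 12
  val DatacenterShift = SequenceBits + WorkerBits        // 17
  val TimestampShift = DatacenterShift + DatacenterBits  // 22

  // Hypothetical container for the four fields of one ID.
  final case class Parts(timestamp: Long, datacenterId: Long, workerId: Long, sequenceNr: Long)

  private def mask(bits: Int): Long = (1L << bits) - 1

  private def check(name: String, value: Long, bits: Int): Unit =
    require(value >= 0 && value <= mask(bits), s"$name out of range")

  // Packs the four fields into one 64 bit integer.
  def compose(p: Parts): Long = {
    check("timestamp", p.timestamp, TimestampBits)
    check("datacenterId", p.datacenterId, DatacenterBits)
    check("workerId", p.workerId, WorkerBits)
    check("sequenceNr", p.sequenceNr, SequenceBits)
    (p.timestamp << TimestampShift) |
      (p.datacenterId << DatacenterShift) |
      (p.workerId << WorkerShift) |
      p.sequenceNr
  }

  // Splits an ID back into its fields.
  def decompose(id: Long): Parts = Parts(
    (id >>> TimestampShift) & mask(TimestampBits),
    (id >>> DatacenterShift) & mask(DatacenterBits),
    (id >>> WorkerShift) & mask(WorkerBits),
    id & mask(SequenceBits)
  )
}
```

Because the timestamp occupies the most significant bits, an ID with a later timestamp always compares greater than one with an earlier timestamp, whatever the other fields are. That is what makes the IDs roughly time-ordered and sortable.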
ZooKeeper
● Developed by Yahoo! Research
○ to simplify distributed system implementation by
supporting common patterns used in distributed systems
● Quorum based coordination
● Total order broadcast
● Filesystem-like API
○ Create, GetData, SetData, Exists, GetChildren, Delete
○ Watching node changes
We use ZooKeeper to coordinate worker IDs in the distributed ID generator.
Distributed ID generator system overview
[Diagram: the ID generator consists of several ID workers. The ID workers coordinate through a ZooKeeper ensemble of three ZooKeeper nodes. The ID client runs inside the Message Write API and requests IDs from the ID workers.]
How our distributed ID generator works
ID worker discovery via ZooKeeper
[Diagram sequence: an ID client with a Router, ID workers, and ZooKeeper. /id-worker/1 is the root node for datacenter 1. Arrows are either commands/events or watches.]
1. The ID client sends GetChildren (Path: /id-worker/1, Watch: True) to ZooKeeper.
2. An ID worker starts and sends GetChildren (Path: /id-worker/1, Watch: True).
3. The ID worker sends Create (Path: /id-worker/1/1, Data: akka.tcp://system@11.2.9.14:***, mode: Ephemeral).
4. ZooKeeper notifies the ID client with Event: NodeChildrenChanged (Path: /id-worker/1).
5. The ID client sends GetChildren (Path: /id-worker/1, Watch: True) again, which also sets the watch again.
6. The ID client sends GetData (Path: /id-worker/1/1, Watch: False) and reads the address akka.tcp://system@11.2.9.14:***.
7. The ID client sends Identify to the ID worker at that address.
8. The ID worker replies with ActorIdentity, and worker1 is added to the Router.
9. ID requests and responses now flow between the Router and worker1.
10. A second ID worker starts and sends GetChildren (Path: /id-worker/1, Watch: True).
11. The second ID worker sends Create (Path: /id-worker/1/2, Data: akka.tcp://system@11.2.9.15:***, mode: Ephemeral).
12. ZooKeeper notifies the ID client with Event: NodeChildrenChanged (Path: /id-worker/1).
13. The ID client sends GetChildren (Path: /id-worker/1, Watch: True).
14. The ID client sends GetData (Path: /id-worker/1/2, Watch: False).
15. The ID client sends Identify to the second ID worker.
16. The second ID worker replies with ActorIdentity, and worker2 is added to the Router next to worker1.
Client-side failure handling
[Diagram sequence: same layout as the discovery diagrams.]
1. The Router holds worker1 and worker2. ZooKeeper holds /id-worker/1/1 (akka.tcp://system@11.2.9.14:***) and /id-worker/1/2 (akka.tcp://system@11.2.9.15:***).
2. The ZNode /id-worker/1/2 disappears, and ZooKeeper notifies the ID client with Event: NodeChildrenChanged (Path: /id-worker/1).
3. The ID client sends GetChildren (Path: /id-worker/1, Watch: True). Only /id-worker/1/1 is left, and worker2 is no longer in the Router.
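The discovery and failure handling steps both come down to comparing the children returned by GetChildren with the workers the Router already knows. A minimal sketch of that comparison follows. The names `WorkerDiscovery`, `RoutingChange`, `parseWorkerIds` and `diff` are hypothetical and not taken from the ChatWork code base, and the sketch assumes that each child name under /id-worker/1 is a numeric worker ID, as in the diagrams.

```scala
object WorkerDiscovery {
  // Hypothetical result type: which workers to look up and which to drop.
  final case class RoutingChange(added: Set[Int], removed: Set[Int])

  // Child names such as "1" or "2" are read as worker IDs.
  // Names that are not numbers are ignored.
  def parseWorkerIds(children: Seq[String]): Set[Int] =
    children.flatMap(name => scala.util.Try(name.toInt).toOption).toSet

  // Compares the workers known to the Router with a fresh GetChildren result.
  def diff(known: Set[Int], children: Seq[String]): RoutingChange = {
    val current = parseWorkerIds(children)
    RoutingChange(added = current -- known, removed = known -- current)
  }
}
```

For every ID in `added`, the client would continue with GetData and Identify before adding the worker to the Router. Every ID in `removed` would be dropped from the Router.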
Worker ID Consensus process
[Diagram sequence: ID workers and ZooKeeper with the root node /id-worker/1. Arrows are either commands/events or watches.]
1. No worker ID duplication invariant
○ An ID worker sends GetChildren (Path: /id-worker/1, Watch: True).
○ A second ID worker starts at the same time and sends the same GetChildren command.
○ Both workers try Create (Path: /id-worker/1/1, Data: akka.tcp://system@11.2.9.14:***, mode: Ephemeral). One Create succeeds. The other one fails with NODEEXISTS.
2. No out of range worker ID creation
○ All worker IDs from /id-worker/1/0 to /id-worker/1/31 are taken. A new ID worker sends GetChildren (Path: /id-worker/1, Watch: True).
○ The new ID worker is suspended because no worker IDs are available.
○ The ZNode /id-worker/1/0 disappears.
○ ZooKeeper notifies the suspended ID worker with Event: NodeChildrenChanged (Path: /id-worker/1).
○ The ID worker sends GetChildren (Path: /id-worker/1, Watch: True).
○ The ID worker sends Create (Path: /id-worker/1/0, Data: akka.tcp://system@11.2.9.14:***, mode: Ephemeral) and now owns /id-worker/1/0.
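How a worker picks a candidate ID from the GetChildren result is not shown on the slides. The sketch below assumes the simplest policy, the smallest free ID in the 5 bit range 0 to 31. `WorkerIdChooser` and this version of `chooseId` are hypothetical names for illustration only.

```scala
object WorkerIdChooser {
  // A 5 bit worker ID allows the values 0 to 31.
  val MaxWorkerId = 31

  // Returns the smallest worker ID that has no ZNode yet,
  // or None when every ID in the range is taken.
  // Assumption: child names are numeric worker IDs.
  def chooseId(children: Seq[String], maxWorkerId: Int = MaxWorkerId): Option[Int] = {
    val taken = children.flatMap(name => scala.util.Try(name.toInt).toOption).toSet
    (0 to maxWorkerId).find(id => !taken.contains(id))
  }
}
```

The chosen ID is only a candidate. Uniqueness is enforced by ZooKeeper, because a concurrent Create on the same path fails with NODEEXISTS. A result of `None` corresponds to the suspended state, where the worker waits for NodeChildrenChanged.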
Automaton notation of ID worker
[State diagram.]
States:
● Initial state
● Check available worker IDs
● Create my worker ID
● Wait until a worker ID becomes available
● Ready to generate message IDs
Transition labels:
● Start
● Fetch list of worker IDs
● Create a worker ID node
● Worker ID conflict
● No available worker IDs
● Something happened to my worker ID
● Retry (appears four times)
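The automaton can also be written down as a pure transition function. This is a sketch under assumptions. The state names follow the slide, but the event names are hypothetical, and the target state of each transition is a guess based on the labels, since the arrows of the diagram are not recoverable from the text. Retries are modeled as staying in the same state.

```scala
object IdWorkerAutomaton {
  sealed trait State
  case object Initial extends State
  case object CheckingWorkerIds extends State
  final case class CreatingWorkerId(workerId: Int) extends State
  case object WaitingForWorkerId extends State
  final case class Ready(workerId: Int) extends State

  // Hypothetical event names for the transition labels on the slide.
  sealed trait Event
  case object Start extends Event
  final case class WorkerIdsFetched(free: Option[Int]) extends Event
  case object WorkerIdCreated extends Event
  case object WorkerIdConflict extends Event
  case object WorkerIdsChanged extends Event
  case object WorkerIdLost extends Event

  def next(state: State, event: Event): State = (state, event) match {
    case (Initial, Start)                                => CheckingWorkerIds
    case (CheckingWorkerIds, WorkerIdsFetched(Some(id))) => CreatingWorkerId(id)
    case (CheckingWorkerIds, WorkerIdsFetched(None))     => WaitingForWorkerId
    case (CreatingWorkerId(id), WorkerIdCreated)         => Ready(id)
    case (CreatingWorkerId(_), WorkerIdConflict)         => CheckingWorkerIds
    case (WaitingForWorkerId, WorkerIdsChanged)          => CheckingWorkerIds
    case (Ready(_), WorkerIdLost)                        => CheckingWorkerIds
    case (other, _)                                      => other // anything else: stay in place
  }

  // Runs a sequence of events from the initial state.
  def run(events: Seq[Event]): State = events.foldLeft[State](Initial)(next)
}
```

Keeping the transitions in one function makes the invariant easy to check: `Ready` is only reachable through a successful creation of a worker ID node.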
Designing distributed ID generator with actor model
Processes, Modules, Automata, Steps
[Diagram: Process 1 and Process 2, each made of Module 1, Module 2 and Module 3. Arrows are labeled "local interaction" and "communication", and an "Automaton" is shown.]
Distributed algorithm abstraction
● Process: the unit of failure, the unit of communication
● Module: building block of processes
○ communication with modules on peer processes
○ local interaction with modules on the same process
● Automaton: a set of states and transitions that regulates computation steps
● Step: a distributed algorithm consists of a sequence of steps
○ e.g. receiving a message, sending a message, executing a local computation
Callout: syntactically identical but a different notion
Source: Reliable and Secure Distributed Programming, 2nd ed.
Algorithm abstraction to concrete implementation mapping
● Process → Actor (top level actor with remote interface)
● Module → Actor
○ communication → message passing to remote actors
○ local interaction → message passing to local actors
● Layers of modules → Actor hierarchy tree
● Automaton
○ State → a Receive partial function and its internal state
○ Transition → context.become()
● Step
○ receiving a message → case clauses of the Receive function
○ sending a message → ! or tell() function
○ executing a local computation → arbitrary computation
Callout (on message passing to remote and local actors): syntactically identical
Modules in ID worker process
● IdGenerator: exports communication interface
● IdWorker: calculates Snowflake ID
● ZNodeMaster: manages worker ID ZNode
● ReactiveZookeeper: Actor based ZooKeeper client
[Diagram: the ID worker process contains IdGenerator, IdWorker, ZNodeMaster and ReactiveZookeeper. An ID client process and a ZooKeeper process sit outside of it.]
Designing actor hierarchy based on message flow
● Mapping layers of modules to an actor hierarchy based on message flow.
● Designing the actor hierarchy based on message flow simplifies its implementation. See Akka in Action, section 4.3.
[Diagram: the module layers of the ID worker process (IdGenerator, IdWorker, ZNodeMaster, ReactiveZookeeper) next to the corresponding actor hierarchy. Arrows mark message flow and parent-child relationships.]
Designing actor hierarchy based on failure handling
● A ZooKeeper application is only valid within a session.
● Instead of handling session timeout everywhere, locate ReactiveZooKeeper at the top of the hierarchy.
● Just let it crash on session timeout.
[Diagram: the module layers of the ID worker process (IdGenerator, IdWorker, ZNodeMaster, ReactiveZookeeper) next to an actor hierarchy with ReactiveZookeeper at the top. Arrows mark message flow and parent-child relationships.]
Implementing distributed ID generator with actor model
What a ZooKeeper application needs
● Asynchronous
● Event driven
● Passive: "Don't call us, we'll call you" style
● Stateful
● Complicated state machine
● Retry everything
● Crash immediately in a fatal situation (session expiration)
● Recover gracefully
ReactiveZooKeeper
Let-it-crash style ZooKeeper client based on Akka actor
https://github.com/TanUkkii007/reactive-zookeeper
Implementing automata with Akka
ZNodeMaster module

    import akka.actor.{Actor, ActorRef}
    import org.apache.zookeeper.CreateMode
    import org.apache.zookeeper.ZooDefs.Ids
    import scala.collection.JavaConverters._

    // GetChildren, ChildrenGot and Create are reactive-zookeeper messages.
    // Start, CheckWorkerIds, CreateWorkerId, chooseId, workerIdPath, address
    // and maxWorkerId are defined elsewhere and are not shown on the slide.
    class ZNodeMaster(reactiveZK: ActorRef) extends Actor {
      override def receive: Receive = initial

      // State 1: Initial state. "Start" leads to State 2.
      def initial: Receive = {
        case Start =>
          self ! CheckWorkerIds
          context.become(checkingWorkerIds) // state transition with context.become
      }

      // State 2: Check available worker IDs.
      // "Fetch list of worker IDs" leads to State 3.
      def checkingWorkerIds: Receive = {
        case CheckWorkerIds =>
          reactiveZK ! GetChildren("/id-worker/1", watch = true)
        case ChildrenGot(path, children, _) if children.length <= maxWorkerId =>
          val workerId = chooseId(children)
          context.become(creatingWorkerId(workerId))
          self ! CreateWorkerId
      }

      // State 3: Create my worker ID. workerId is internal state held as a
      // closure variable that is only visible within this state.
      def creatingWorkerId(workerId: Int): Receive = {
        case CreateWorkerId =>
          reactiveZK ! Create(workerIdPath(workerId), address,
            Ids.OPEN_ACL_UNSAFE.asScala.toList, CreateMode.EPHEMERAL)
      }
    }
Implementing self-loop of automata
ZNodeMaster module
Self-loop: Check available worker IDs → Retry → Check available worker IDs

    // Code is org.apache.zookeeper.KeeperException.Code.
    def checkingWorkerIds: Receive = {
      case CheckWorkerIds =>
        reactiveZK ! GetChildren("/id-worker/1", watch = true)
      case ChildrenGot(path, children, _) if children.length <= maxWorkerId =>
        val workerId = chooseId(children)
        context.become(creatingWorkerId(workerId))
        self ! CreateWorkerId
      // Retry on connection loss by sending the same message to self.
      case GetChildrenFailure(e, _, _) if e.code() == Code.CONNECTIONLOSS =>
        self ! CheckWorkerIds
    }
Implementing communication interface
ID worker client implementation

    def receive: Receive = {
      // Read the ZNode of the worker to get its actor address.
      case GetIdWorkerAddress(WorkerId(workerId)) =>
        reactiveZK ! GetData(s"/${settings.rootNode}/${settings.dcId}/$workerId", watch = false, ctx = workerId)
      case DataGot(path, data, stat, workerId: Long) =>
        val idWorkerZNode = deserializeIdWorkerZNode(data)
        val workerPath = idWorkerZNode.fullActorPath
        val selection = context.actorSelection(workerPath)
        selection ! Identify(workerId) // remote message
      // The worker answered: watch it and add it to the router.
      case ActorIdentity(workerId: Long, Some(workerRef)) =>
        workers += WorkerId(workerId) -> context.watch(workerRef)
        val newRoutee = IdWorkerRoutee(WorkerId(workerId), workerRef)
        if (!router.routees.contains(newRoutee))
          router = router.addRoutee(newRoutee)
      // Remote message: route the ID request to one of the known workers.
      case msg: GenerateId => router.route(IdWorkerGenerateId(msg.requestId.toString), sender())
    }

Configuring RemoteActorRefProvider lets us write communication with the same syntax as local interaction.
Implementing crash recovery
ReactiveZooKeeper module

    // childName is defined elsewhere and is not shown on the slide.
    // Expired is org.apache.zookeeper.Watcher.Event.KeeperState.Expired.
    class ZooKeeperSessionActor(childProps: Props) extends Actor with WatcherCallback {
      val zookeeper = new ZooKeeper("zookeeper:2181", 5000, watchCallback(self))
      val zookeeperOperation: ActorRef =
        context.actorOf(ZooKeeperOperationActor.props(zookeeper))
      val childActor: ActorRef = context.actorOf(childProps, childName)

      def receive: Receive = {
        case ZooKeeperWatchEvent(e) =>
          e.getState match {
            // Throw an exception on ZooKeeper session expiration.
            case Expired => throw ZooKeeperSessionRestartException(None)
            case _ =>
          }
        case cmd: ZKOperations.ZKCommand => zookeeperOperation forward cmd
        case other => childActor forward other
      }

      // Close the ZooKeeper handle when this actor stops.
      override def postStop(): Unit = {
        zookeeper.close()
        super.postStop()
      }
    }

● Throw an exception on ZooKeeper session expiration.
● Note that a ZooKeeper application is valid only if its session is valid. Just let it crash on session expiration.
● Akka automatically restarts child actors. Stale states of actors are refreshed on restart.
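Edge case that breaks ID uniqueness
[Sequence diagram. Participants: ZooKeeper, ID worker 1, ID worker 2, ID client 1, ID client 2.]
1. ID worker 1 owns worker ID 1 ("my worker ID is 1"). A network partition separates it from ZooKeeper.
2. The session of ID worker 1 times out, so ZooKeeper deletes worker ID 1 and sends NodeChildrenChanged.
3. ID worker 2 creates worker ID 1 ("my worker ID is 1"), and ZooKeeper sends NodeChildrenChanged again.
4. ID worker 1 learns that its session expired only after the network has recovered. Until then both workers believe they own worker ID 1, and the ID clients see "worker 1's ID is 1" as well as "worker 2's ID is 1".
Result: ID duplication risk!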
Mitigating the condition that breaks consensus, as implied by FLP impossibility
ReactiveZooKeeper module

    def receive: Receive = {
      case ZooKeeperWatchEvent(e) =>
        e.getState match {
          case Expired => throw ZooKeeperSessionRestartException(None)
          // Fail fast on disconnection instead of waiting for session expiration.
          case Disconnected =>
            throw new ConnectionRecoveryTimeoutException(connectionTimeout)
          case _ =>
        }
      case cmd: ZKOperations.ZKCommand => zookeeperOperation forward cmd
      case other => childActor forward other
    }

Fail fast on disconnection.
Distributed ID generator in the wild
ID generation latency and throughput
The stress test used a Gatling plugin for the Akka Remote protocol.
https://github.com/chatwork/gatling-akka
Single ID worker performance
Effect of ZooKeeper node down
[Graphs: ID generation throughput, the number of ID workers connected to the downed ZooKeeper, and the error count of ID clients.]
# of workers affected by ZooKeeper down    Effect
1/3                                        Not observable
2/3                                        Relatively high error rate
3/3                                        Throughput decrease, high error rate
We use redundant requests to mitigate the effects of an ID worker going down.
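One way to picture redundant requests is to send the same ID request to more than one worker and take the first successful answer. The sketch below uses only `scala.concurrent` and is not ChatWork's implementation. `RedundantRequests` and `firstSuccess` are hypothetical names, and each element of `attempts` stands for one request to one ID worker.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success}

object RedundantRequests {
  // Completes with the first successful attempt.
  // Fails only when every attempt has failed (or when there are none).
  def firstSuccess[A](attempts: Seq[Future[A]])(implicit ec: ExecutionContext): Future[A] = {
    val result = Promise[A]()
    val remaining = new AtomicInteger(attempts.size)
    if (attempts.isEmpty) result.tryFailure(new NoSuchElementException("no attempts"))
    attempts.foreach { attempt =>
      attempt.onComplete {
        case Success(value) =>
          result.trySuccess(value) // later successes are ignored
        case Failure(error) =>
          // Report a failure only after the last attempt has failed.
          if (remaining.decrementAndGet() == 0) result.tryFailure(error)
      }
    }
    result.future
  }
}
```

With this shape, a single worker that is down or slow does not fail the request as long as another worker answers. The cost is extra load on the workers and IDs that are generated but never used.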
Write API latency (average)
[Graph: average Write API latency over time, with the release of the distributed ID generator marked.]
~6 ms improvement.
Write API latency (100th, 95th, 50th percentile)
[Graph: Write API latency percentiles over time, with the release of the distributed ID generator marked.]
The 95th percentile is not much improved.
ID Generator latency (average)
[Graph: average ID generator latency over time, with the release of the distributed ID generator marked.]
~8 ms improvement.
ID Generator latency (100th, 95th, 50th percentile)
[Graph: ID generator latency percentiles over time, with the release of the distributed ID generator marked.]
The 95th percentile got worse.
Possible fixes:
- use a higher throughput setting for the dispatcher
- use a pinned dispatcher
- competitive redundant requests with the scatter-gather pattern
We are hiring!
https://corp.chatwork.com/ja/recruit/

More Related Content

Similar to Distributed ID generator in ChatWork

Meeting rooms are talking! are you listening?
Meeting rooms are talking! are you listening?Meeting rooms are talking! are you listening?
Meeting rooms are talking! are you listening?Cisco DevNet
 
FIWARE IoT Proposal & Community
FIWARE IoT Proposal & CommunityFIWARE IoT Proposal & Community
FIWARE IoT Proposal & CommunityFIWARE
 
Meeting rooms are talking. Are you listening
Meeting rooms are talking. Are you listeningMeeting rooms are talking. Are you listening
Meeting rooms are talking. Are you listeningCisco DevNet
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Soroosh Khodami
 
Timings API: Performance Assertion during the functional testing
 Timings API: Performance Assertion during the functional testing Timings API: Performance Assertion during the functional testing
Timings API: Performance Assertion during the functional testingPetrosPlakogiannis
 
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...Codemotion
 
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...Codemotion
 
Integrated, Automated Video Room Systems - Webex Devices - Cisco Live Orlando...
Integrated, Automated Video Room Systems - Webex Devices - Cisco Live Orlando...Integrated, Automated Video Room Systems - Webex Devices - Cisco Live Orlando...
Integrated, Automated Video Room Systems - Webex Devices - Cisco Live Orlando...Cisco DevNet
 
jRecruiter - The AJUG Job Posting Service
jRecruiter - The AJUG Job Posting ServicejRecruiter - The AJUG Job Posting Service
jRecruiter - The AJUG Job Posting ServiceGunnar Hillert
 
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...VMware Tanzu
 
Securing your Cloud Environment v2
Securing your Cloud Environment v2Securing your Cloud Environment v2
Securing your Cloud Environment v2ShapeBlue
 
IThome DevOps Summit - IoT、docker與DevOps
IThome DevOps Summit - IoT、docker與DevOpsIThome DevOps Summit - IoT、docker與DevOps
IThome DevOps Summit - IoT、docker與DevOpsSimon Su
 
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...mfrancis
 
Inspecting iOS App Traffic with JavaScript - JSOxford - Jan 2018
Inspecting iOS App Traffic with JavaScript - JSOxford - Jan 2018Inspecting iOS App Traffic with JavaScript - JSOxford - Jan 2018
Inspecting iOS App Traffic with JavaScript - JSOxford - Jan 2018Andy Davies
 
Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021InfluxData
 
Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015kingsBSD
 
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieSpring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieVMware Tanzu
 
[CB21] The Lazarus Group's Attack Operations Targeting Japan by Shusei Tomona...
[CB21] The Lazarus Group's Attack Operations Targeting Japan by Shusei Tomona...[CB21] The Lazarus Group's Attack Operations Targeting Japan by Shusei Tomona...
[CB21] The Lazarus Group's Attack Operations Targeting Japan by Shusei Tomona...CODE BLUE
 

Similar to Distributed ID generator in ChatWork (20)

Meeting rooms are talking! are you listening?
Meeting rooms are talking! are you listening?Meeting rooms are talking! are you listening?
Meeting rooms are talking! are you listening?
 
FIWARE IoT Proposal & Community
FIWARE IoT Proposal & CommunityFIWARE IoT Proposal & Community
FIWARE IoT Proposal & Community
 
Meeting rooms are talking. Are you listening
Meeting rooms are talking. Are you listeningMeeting rooms are talking. Are you listening
Meeting rooms are talking. Are you listening
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
 
Timings API: Performance Assertion during the functional testing
 Timings API: Performance Assertion during the functional testing Timings API: Performance Assertion during the functional testing
Timings API: Performance Assertion during the functional testing
 
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
 
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
Stève Sfartz - Meeting rooms are talking! Are you listening? - Codemotion Ber...
 
Sst hackathon express
Sst hackathon expressSst hackathon express
Sst hackathon express
 
Integrated, Automated Video Room Systems - Webex Devices - Cisco Live Orlando...
Integrated, Automated Video Room Systems - Webex Devices - Cisco Live Orlando...Integrated, Automated Video Room Systems - Webex Devices - Cisco Live Orlando...
Integrated, Automated Video Room Systems - Webex Devices - Cisco Live Orlando...
 
jRecruiter - The AJUG Job Posting Service
jRecruiter - The AJUG Job Posting ServicejRecruiter - The AJUG Job Posting Service
jRecruiter - The AJUG Job Posting Service
 
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
SpringOne Tour Denver - Spring Boot & Spring Cloud on Pivotal Application Ser...
 
Securing your Cloud Environment v2
Securing your Cloud Environment v2Securing your Cloud Environment v2
Securing your Cloud Environment v2
 
ql.io at NodePDX
ql.io at NodePDXql.io at NodePDX
ql.io at NodePDX
 
IThome DevOps Summit - IoT、docker與DevOps
IThome DevOps Summit - IoT、docker與DevOpsIThome DevOps Summit - IoT、docker與DevOps
IThome DevOps Summit - IoT、docker與DevOps
 
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
 
Inspecting iOS App Traffic with JavaScript - JSOxford - Jan 2018
Inspecting iOS App Traffic with JavaScript - JSOxford - Jan 2018Inspecting iOS App Traffic with JavaScript - JSOxford - Jan 2018
Inspecting iOS App Traffic with JavaScript - JSOxford - Jan 2018
 
Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021Getting Started: Intro to Telegraf - July 2021
Getting Started: Intro to Telegraf - July 2021
 
Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015Our Data Ourselves, Pydata 2015
Our Data Ourselves, Pydata 2015
 
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel LavoieSpring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
Spring Boot & Spring Cloud Apps on Pivotal Application Service - Daniel Lavoie
 
[CB21] The Lazarus Group's Attack Operations Targeting Japan by Shusei Tomona...
[CB21] The Lazarus Group's Attack Operations Targeting Japan by Shusei Tomona...[CB21] The Lazarus Group's Attack Operations Targeting Japan by Shusei Tomona...
[CB21] The Lazarus Group's Attack Operations Targeting Japan by Shusei Tomona...
 

More from TanUkkii

Non-blocking IO to tame distributed systems ー How and why ChatWork uses async...
Non-blocking IO to tame distributed systems ー How and why ChatWork uses async...Non-blocking IO to tame distributed systems ー How and why ChatWork uses async...
Non-blocking IO to tame distributed systems ー How and why ChatWork uses async...TanUkkii
 
Architecture of Falcon, a new chat messaging backend system build on Scala
Architecture of Falcon,  a new chat messaging backend system  build on ScalaArchitecture of Falcon,  a new chat messaging backend system  build on Scala
Architecture of Falcon, a new chat messaging backend system build on ScalaTanUkkii
 
Akka Clusterの耐障害設計
Akka Clusterの耐障害設計Akka Clusterの耐障害設計
Akka Clusterの耐障害設計TanUkkii
 
スケールするシステムにおけるエンティティの扱いと 分散ID生成
スケールするシステムにおけるエンティティの扱いと 分散ID生成スケールするシステムにおけるエンティティの扱いと 分散ID生成
スケールするシステムにおけるエンティティの扱いと 分散ID生成TanUkkii
 
すべてのアクター プログラマーが知るべき 単一責務原則とは何か
すべてのアクター プログラマーが知るべき 単一責務原則とは何かすべてのアクター プログラマーが知るべき 単一責務原則とは何か
すべてのアクター プログラマーが知るべき 単一責務原則とは何かTanUkkii
 
ディープニューラルネット入門
ディープニューラルネット入門ディープニューラルネット入門
ディープニューラルネット入門TanUkkii
 
プログラミング言語のパラダイムシフト(ダイジェスト)ーScalaから見る関数型と並列性時代の幕開けー
プログラミング言語のパラダイムシフト(ダイジェスト)ーScalaから見る関数型と並列性時代の幕開けープログラミング言語のパラダイムシフト(ダイジェスト)ーScalaから見る関数型と並列性時代の幕開けー
プログラミング言語のパラダイムシフト(ダイジェスト)ーScalaから見る関数型と並列性時代の幕開けーTanUkkii
 
プログラミング言語のパラダイムシフトーScalaから見る関数型と並列性時代の幕開けー
プログラミング言語のパラダイムシフトーScalaから見る関数型と並列性時代の幕開けープログラミング言語のパラダイムシフトーScalaから見る関数型と並列性時代の幕開けー
プログラミング言語のパラダイムシフトーScalaから見る関数型と並列性時代の幕開けーTanUkkii
 
Isomorphic web development with scala and scala.js
Isomorphic web development  with scala and scala.jsIsomorphic web development  with scala and scala.js
Isomorphic web development with scala and scala.jsTanUkkii
 
Scalaによる型安全なエラーハンドリング
Scalaによる型安全なエラーハンドリングScalaによる型安全なエラーハンドリング
Scalaによる型安全なエラーハンドリングTanUkkii
 
ECMAScript6による関数型プログラミング
ECMAScript6による関数型プログラミングECMAScript6による関数型プログラミング
ECMAScript6による関数型プログラミングTanUkkii
 
プログラミング言語Scala
プログラミング言語Scalaプログラミング言語Scala
プログラミング言語ScalaTanUkkii
 
これからのJavaScriptー関数型プログラミングとECMAScript6
これからのJavaScriptー関数型プログラミングとECMAScript6これからのJavaScriptー関数型プログラミングとECMAScript6
これからのJavaScriptー関数型プログラミングとECMAScript6TanUkkii
 

More from TanUkkii (16)

Non-blocking IO to tame distributed systems ー How and why ChatWork uses async...
Distributed ID generator in ChatWork

  • 2. 2018/02/27 © ChatWork All rights reserved. Agenda
      ● What is our motivation for distributed ID
      ● How our distributed ID generator works
      ● Designing distributed ID generator with actor model
      ● Implementing distributed ID generator with actor model
      ● Distributed ID generator in the wild
  • 3. About me
      ● Yusuke Yasuda / 安田裕介
      ● @TanUkkii007
      ● Working for ChatWork for 2 years
      ● Scala developer
  • 4. About ChatWork
  • 5. What is our motivation for distributed ID
  • 6. Messaging system architecture overview
      You can find more information about our architecture in our Kafka Summit 2017 talk.
      Diagram label: Today’s topic
  • 7. Motivation
      ● Migration from MySQL to Kafka/HBase
        ○ High scalability
        ○ No single point of failure
      ● Compatibility with existing IDs
        ○ Time-ordered, sortable integer IDs
      ● ID space extension
        ○ 32 bit → 64 bit
      ● The ID generator itself is required to be scalable and distributed
  • 8. Snowflake
      ● Distributed ID generator developed by Twitter
        ○ https://github.com/twitter/snowflake
      ● Motivation: migration from MySQL to Cassandra
      ● Roughly time-ordered, sortable 64-bit IDs
      ● > 10k IDs/s per process, ~2 ms response time
      ● ID layout, from the most significant bits: 41-bit timestamp | 5-bit datacenter ID | 5-bit worker ID | 12-bit sequenceNr
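The bit layout on slide 8 can be made concrete with a small sketch. The object and method names below are hypothetical and are not taken from ChatWork's or Twitter's code; the timestamp is assumed to be milliseconds since some custom epoch, as in Twitter's Snowflake. The four fields add up to 63 bits, so the sign bit of the 64-bit integer stays 0.

```scala
// Sketch of the Snowflake bit layout from the slide (41 | 5 | 5 | 12 bits).
// All names here are illustrative assumptions, not the talk's actual code.
object SnowflakeLayout {
  val TimestampBits = 41
  val DatacenterBits = 5
  val WorkerBits = 5
  val SequenceBits = 12

  // Each field sits above all fields to its right.
  val WorkerShift: Int = SequenceBits                                  // 12
  val DatacenterShift: Int = SequenceBits + WorkerBits                 // 17
  val TimestampShift: Int = SequenceBits + WorkerBits + DatacenterBits // 22

  private def maxValue(bits: Int): Long = (1L << bits) - 1

  /** Packs the four fields into one Long; the top (sign) bit stays 0. */
  def compose(timestamp: Long, datacenterId: Long, workerId: Long, sequence: Long): Long = {
    require(timestamp >= 0 && timestamp <= maxValue(TimestampBits), "timestamp out of range")
    require(datacenterId >= 0 && datacenterId <= maxValue(DatacenterBits), "datacenter ID out of range")
    require(workerId >= 0 && workerId <= maxValue(WorkerBits), "worker ID out of range")
    require(sequence >= 0 && sequence <= maxValue(SequenceBits), "sequence out of range")
    (timestamp << TimestampShift) |
      (datacenterId << DatacenterShift) |
      (workerId << WorkerShift) |
      sequence
  }

  /** Splits an ID back into (timestamp, datacenterId, workerId, sequence). */
  def decompose(id: Long): (Long, Long, Long, Long) = (
    id >>> TimestampShift,
    (id >>> DatacenterShift) & maxValue(DatacenterBits),
    (id >>> WorkerShift) & maxValue(WorkerBits),
    id & maxValue(SequenceBits)
  )
}
```

Because the timestamp occupies the most significant bits, an ID generated in a later millisecond always compares greater than an earlier one, regardless of which worker produced it. IDs generated within the same millisecond by different workers are only ordered by worker ID, which is why the slide says "roughly" time-ordered.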
  • 9. ZooKeeper
      ● Developed by Yahoo! Research
        ○ to simplify distributed system implementation by supporting common patterns used in distributed systems
      ● Quorum-based coordination
      ● Total order broadcast
      ● Filesystem-like API
        ○ Create, GetData, SetData, Exists, GetChildren, Delete
        ○ Watching node changes
      We use ZooKeeper to coordinate worker IDs in the distributed ID generator.
  • 10. Distributed ID generator system overview
      Diagram labels: ID client (in Message Write API); ID generator (three ID workers); ZooKeeper ensemble (three ZooKeeper nodes)
  • 11. How our distributed ID generator works
  • 12. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 Command: GetChildren Path: /id-worker/1 Watch: True Router Datacenter 1 root node command/event watch
  • 13. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker Command: GetChildren Path: /id-worker/1 Watch: True Router command/event watch
  • 14. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** Command: Create Path: /id-worker/1/1 Data: akka.tcp://system@11.2.9.14:*** mode: Ephemeral Router command/event watch
  • 15. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** Event: NodeChildrenChanged Path: /id-worker/1 Router command/event watch
  • 16. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** Command: GetChildren Path: /id-worker/1 Watch: True Router command/event watch
  • 17. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** Command: GetData Path: /id-worker/1/1 Watch: False Router command/event watch
  • 18. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** Identify Router command/event watch
  • 19. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** ActorIdentity worker1 Router command/event watch
  • 20. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** ID request/response worker1 Router command/event watch
  • 21. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** ID worker Command: GetChildren Path: /id-worker/1 Watch: True worker1 Router command/event watch
  • 22. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** ID worker /id-worker/1/2 akka.tcp://system@11.2.9.15:*** Command: Create Path: /id-worker/1/2 Data: akka.tcp://system@11.2.9.15:*** mode: Ephemeral worker1 Router command/event watch
  • 23. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** ID worker /id-worker/1/2 akka.tcp://system@11.2.9.15:*** Event: NodeChildrenChanged Path: /id-worker/1 worker1 Router command/event watch
  • 24. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** ID worker /id-worker/1/2 akka.tcp://system@11.2.9.15:*** Command: GetChildren Path: /id-worker/1 Watch: True worker1 Router command/event watch
  • 25. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** ID worker /id-worker/1/2 akka.tcp://system@11.2.9.15:*** Command: GetData Path: /id-worker/1/2 Watch: False worker1 Router command/event watch
  • 26. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** ID worker /id-worker/1/2 akka.tcp://system@11.2.9.15:*** Identify worker1 Router command/event watch
  • 27. ID worker discovery via ZooKeeper ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** worker1ID worker /id-worker/1/2 akka.tcp://system@11.2.9.15:*** ActorIdentity worker2 Router command/event watch
  • 28. Client-side failure handling ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** worker1ID worker /id-worker/1/2 akka.tcp://system@11.2.9.15:*** worker2 Router command/event watch
  • 29. Client-side failure handling ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** worker1ID worker worker2 Router Event: NodeChildrenChanged Path: /id-worker/1 command/event watch
  • 30. Client-side failure handling ZooKeeper ID client /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** worker1ID worker Router Command: GetChildren Path: /id-worker/1 Watch: True command/event watch
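Slides 12 to 30 follow one loop on the client: issue GetChildren with a watch, receive NodeChildrenChanged, issue GetChildren again (ZooKeeper watches fire only once, so the watch has to be re-registered each time), then resolve newly appeared workers via GetData and Identify, and drop workers whose ephemeral nodes have disappeared. The membership comparison at the heart of that loop might look like the following sketch. The names are hypothetical, and it assumes child node names are the integer worker IDs, as in the paths /id-worker/1/1 and /id-worker/1/2 shown on the slides.

```scala
import scala.util.Try

// Hypothetical helper: compares the worker IDs a client already routes to
// with the children returned by the latest GetChildren call.
object WorkerMembership {
  final case class Change(joined: Set[Int], left: Set[Int])

  /** Child names that are not integers are ignored. */
  def diff(known: Set[Int], children: Seq[String]): Change = {
    val current = children.flatMap(name => Try(name.toInt).toOption).toSet
    Change(joined = current -- known, left = known -- current)
  }
}
```

For each ID in `joined` the client would fetch the worker's address and add a routee; for each ID in `left` it would remove the routee so that no further requests go to a worker whose session has ended.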
  • 31. Worker ID Consensus process 1. No worker ID duplication invariant ZooKeeper /id-worker/1 ID worker Command: GetChildren Path: /id-worker/1 Watch: True command/event watch
  • 32. ZooKeeper /id-worker/1 ID worker ID worker Command: GetChildren Path: /id-worker/1 Watch: True Worker ID Consensus process 1. No worker ID duplication invariant command/event watch
  • 33. ZooKeeper /id-worker/1 ID worker /id-worker/1/1 akka.tcp://system@11.2.9.14:*** Command: Create Path: /id-worker/1/1 Data: akka.tcp://system@11.2.9.14:*** mode: Ephemeral ID worker Failed with NODEEXISTS Worker ID Consensus process 1. No worker ID duplication invariant command/event watch
  • 34. ZooKeeper /id-worker/1 ID worker /id-worker/1/0 akka.tcp://system@11.2.9.14:*** ID worker /id-worker/1/1 akka.tcp://system@11.2.9.15:*** akka.tcp://system@11.2.9.16:***/id-worker/1/31 ID worker ID worker Command: GetChildren Path: /id-worker/1 Watch: True Worker ID Consensus process 2. No out of range worker ID creation command/event watch
  • 35. ZooKeeper /id-worker/1 ID worker /id-worker/1/0 akka.tcp://system@11.2.9.14:*** ID worker /id-worker/1/1 akka.tcp://system@11.2.9.15:*** akka.tcp://system@11.2.9.16:***/id-worker/1/31 ID worker ID worker Worker ID Consensus process 2. No out of range worker ID creation Suspended because no available worker IDs. command/event watch
  • 37. ZooKeeper /id-worker/1 ID worker ID worker /id-worker/1/1 akka.tcp://system@11.2.9.15:*** akka.tcp://system@11.2.9.16:***/id-worker/1/31 ID worker ID worker Worker ID Consensus process 2. No out of range worker ID creation Event: NodeChildrenChanged Path: /id-worker/1 command/event watch
  • 38. ZooKeeper /id-worker/1 ID worker ID worker /id-worker/1/1 akka.tcp://system@11.2.9.15:*** akka.tcp://system@11.2.9.16:***/id-worker/1/31 ID worker ID worker Worker ID Consensus process 2. No out of range worker ID creation Command: GetChildren Path: /id-worker/1 Watch: True command/event watch
  • 39. ZooKeeper /id-worker/1 ID worker ID worker /id-worker/1/1 akka.tcp://system@11.2.9.15:*** akka.tcp://system@11.2.9.16:***/id-worker/1/31 ID worker ID worker Command: Create Path: /id-worker/1/0 Data: akka.tcp://system@11.2.9.14:*** mode: Ephemeral /id-worker/1/0 akka.tcp://system@11.2.9.14:*** Worker ID Consensus process 2. No out of range worker ID creation command/event watch
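The two invariants in slides 31 to 39 suggest how a worker might pick its ID from the GetChildren result. The slides do not show the selection rule, so the sketch below assumes the simplest one: take the lowest unused ID in the 5-bit range 0 to 31, and return nothing when the range is exhausted. The names are hypothetical.

```scala
import scala.util.Try

// Hypothetical selection rule: lowest free worker ID, or None if all 32 are taken.
object WorkerIdChooser {
  val MaxWorkerId = 31 // a 5-bit worker ID allows 0 to 31

  def chooseId(children: Seq[String]): Option[Int] = {
    val taken = children.flatMap(name => Try(name.toInt).toOption).toSet
    (0 to MaxWorkerId).find(id => !taken.contains(id))
  }
}
```

A `None` result corresponds to the "suspended because no available worker IDs" case: the worker waits for the next NodeChildrenChanged event and checks again. A `Some(id)` result is only a candidate. Two workers can choose the same ID at the same time, and it is the Create call failing with NODEEXISTS for the loser that enforces the no-duplication invariant.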
  • 40. Automaton notation of ID worker
      States: Initial state; Check available worker IDs; Create my worker ID; Wait until a worker ID gets available; Ready to generate message IDs
      Transition labels: Start; Fetch list of worker IDs; Create a worker ID node; No available worker IDs; Worker ID conflict; Something happened to my worker ID; Retry
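The automaton can also be written down as a pure transition function, independent of any actor library. The transcript lists the states and transition labels but not the arrows between them, so the wiring below is one reading of the diagram, and the state and event names are hypothetical.

```scala
// A dependency-free sketch of the ID worker automaton.
// The arrows are an interpretation of the slide, not the author's code.
object IdWorkerAutomaton {
  sealed trait State
  case object Initial extends State
  case object CheckingWorkerIds extends State
  final case class CreatingWorkerId(workerId: Int) extends State
  case object WaitingForWorkerId extends State
  final case class Ready(workerId: Int) extends State

  sealed trait Event
  case object Start extends Event
  final case class WorkerIdsFetched(freeId: Option[Int]) extends Event
  case object WorkerIdCreated extends Event
  case object WorkerIdConflict extends Event // Create failed with NODEEXISTS
  case object ChildrenChanged extends Event  // NodeChildrenChanged watch fired
  case object WorkerIdLost extends Event     // something happened to my worker ID

  def transition(state: State, event: Event): State = (state, event) match {
    case (Initial, Start)                                => CheckingWorkerIds
    case (CheckingWorkerIds, WorkerIdsFetched(Some(id))) => CreatingWorkerId(id)
    case (CheckingWorkerIds, WorkerIdsFetched(None))     => WaitingForWorkerId
    case (CreatingWorkerId(id), WorkerIdCreated)         => Ready(id)
    case (CreatingWorkerId(_), WorkerIdConflict)         => CheckingWorkerIds
    case (WaitingForWorkerId, ChildrenChanged)           => CheckingWorkerIds
    case (Ready(_), WorkerIdLost)                        => CheckingWorkerIds
    case (other, _)                                      => other // unhandled events keep the state
  }
}
```

Every failure path leads back to checking the available worker IDs, which matches the repeated "Retry" labels on the slide: the worker never guesses, it re-reads the state of ZooKeeper and tries again.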
  • 41. Designing distributed ID generator with actor model
  • 42. Processes, Modules, Automata, Steps
      Diagram labels: Process 1 and Process 2, each with Module 1, Module 2, Module 3; Automaton; local interaction; communication
  • 43. Distributed algorithm abstraction
      ● Process: the unit of failure and the unit of communication
      ● Module: building block of a process
        ○ communication with modules on peer processes
        ○ local interaction with modules on the same process
        ○ (the two are syntactically identical but different notions)
      ● Automaton: set of states and transitions that regulates computation steps
      ● Step: a distributed algorithm consists of a sequence of steps
        ○ e.g. receiving a message, sending a message, executing a local computation
      Source: Reliable and Secure Distributed Programming, 2nd ed.
  • 44. Mapping the algorithm abstraction to a concrete implementation
      ● Process → Actor (top-level actor with remote interface)
      ● Module → Actor
        ○ communication → message passing to remote actors
        ○ local interaction → message passing to local actors
        ○ (the two are syntactically identical)
      ● Layers of modules → Actor hierarchy tree
      ● Automaton
        ○ State → Receive partial functions and their internal state
        ○ Transition → context.become()
      ● Step
        ○ receiving a message → case clauses of the Receive function
        ○ sending a message → ! or tell()
        ○ executing a local computation → arbitrary computation
  • 45. Modules in ID worker process
      ● IdGenerator: exports the communication interface
      ● IdWorker: calculates Snowflake IDs
      ● ZNodeMaster: manages the worker ID ZNode
      ● ReactiveZookeeper: actor-based ZooKeeper client
      Diagram labels: ID client process; ID worker process (IdGenerator, IdWorker, ZNodeMaster, ReactiveZookeeper); ZooKeeper process
  • 46. Designing actor hierarchy based on message flow
      Map the layers of modules to an actor hierarchy based on message flow. Designing the actor hierarchy this way simplifies its implementation. See Akka in Action, section 4.3.
      Diagram labels: IdGenerator, IdWorker, ZNodeMaster, ReactiveZookeeper (ID worker process); message flow; parent-child relationship
  • 47. Designing actor hierarchy based on failure handling
      A ZooKeeper application is only valid within a session. Instead of handling session timeout everywhere, place ReactiveZooKeeper at the top of the hierarchy and just let it crash on session timeout.
      Diagram labels: IdGenerator, IdWorker, ZNodeMaster, ReactiveZookeeper (ID worker process); message flow; parent-child relationship
  • 48. Implementing distributed ID generator with actor model
  • 49. What a ZooKeeper application needs
      ● Asynchronous
      ● Event driven
      ● Passive: "Don't call us, we'll call you" style
      ● Stateful
      ● Complicated state machine
      ● Retry everything
      ● Crash immediately in a fatal situation (session expiration)
      ● Recover gracefully
  • 50. ReactiveZooKeeper
      Let-it-crash style ZooKeeper client based on Akka actors
      https://github.com/TanUkkii007/reactive-zookeeper
  • 51. Implementing automata with Akka (ZNodeMaster module)
      // The message types (Start, CheckWorkerIds, GetChildren, ChildrenGot, Create, ...)
      // and the helpers maxWorkerId, chooseId, workerIdPath and address are not shown on the slide.
      import akka.actor.{Actor, ActorRef}
      import org.apache.zookeeper.CreateMode
      import org.apache.zookeeper.ZooDefs.Ids
      import scala.collection.JavaConverters._

      class ZNodeMaster(reactiveZK: ActorRef) extends Actor {
        override def receive: Receive = initial

        // State 1: initial state
        def initial: Receive = {
          case Start =>
            self ! CheckWorkerIds
            context.become(checkingWorkerIds) // state transition with context.become
        }

        // State 2: check available worker IDs
        def checkingWorkerIds: Receive = {
          case CheckWorkerIds =>
            reactiveZK ! GetChildren("id-worker/1", watch = true)
          case ChildrenGot(path, children, _) if children.length <= maxWorkerId =>
            val workerId = chooseId(children)
            context.become(creatingWorkerId(workerId))
            self ! CreateWorkerId
        }

        // State 3: create my worker ID. workerId is internal state held as a
        // closure variable that is only visible within this state.
        def creatingWorkerId(workerId: Int): Receive = {
          case CreateWorkerId =>
            reactiveZK ! Create(workerIdPath(workerId), address, Ids.OPEN_ACL_UNSAFE.asScala.toList, CreateMode.EPHEMERAL)
        }
      }
      Diagram labels: State 1 = Initial state; State 2 = Check available worker IDs; State 3 = Create my worker ID; transitions: Start, Fetch list of worker IDs
  • 52. Implementing self-loop of automata (ZNodeMaster module)
      // State "Check available worker IDs" with its "Retry" self-loop.
      // Code is org.apache.zookeeper.KeeperException.Code.
      def checkingWorkerIds: Receive = {
        case CheckWorkerIds =>
          reactiveZK ! GetChildren("id-worker/1", watch = true)
        case ChildrenGot(path, children, _) if children.length <= maxWorkerId =>
          val workerId = chooseId(children)
          context.become(creatingWorkerId(workerId))
          self ! CreateWorkerId
        // Retry on connection loss by sending the same message to self.
        case GetChildrenFailure(e, _, _) if e.code() == Code.CONNECTIONLOSS =>
          self ! CheckWorkerIds
      }
  • 53. Implementing communication interface (ID worker client implementation)
      def receive: Receive = {
        // Look up the address stored in the worker's ZNode.
        case GetIdWorkerAddress(WorkerId(workerId)) =>
          reactiveZK ! GetData(s"/${settings.rootNode}/${settings.dcId}/$workerId", watch = false, ctx = workerId)
        // Resolve the stored actor path to an ActorRef.
        case DataGot(path, data, stat, workerId: Long) =>
          val idWorkerZNode = deserializeIdWorkerZNode(data)
          val workerPath = idWorkerZNode.fullActorPath
          val selection = context.actorSelection(workerPath)
          selection ! Identify(workerId)
        // Register the resolved worker as a routee.
        case ActorIdentity(workerId: Long, Some(workerRef)) =>
          workers += WorkerId(workerId) -> context.watch(workerRef)
          val newRoutee = IdWorkerRoutee(WorkerId(workerId), workerRef)
          if (!router.routees.contains(newRoutee)) router = router.addRoutee(newRoutee)
        // Forward ID requests to one of the workers.
        case msg: GenerateId =>
          router.route(IdWorkerGenerateId(msg.requestId.toString), sender())
      }
      Slide annotation: Remote messages.
      Configuring the RemoteActorRefProvider lets us write communication in the same syntax as local interaction.
  • 54. Implementing crash recovery (ReactiveZooKeeper module)
      class ZooKeeperSessionActor(childProps: Props) extends Actor with WatcherCallback {
        val zookeeper = new ZooKeeper("zookeeper:2181", 5000, watchCallback(self))
        val zookeeperOperation: ActorRef = context.actorOf(ZooKeeperOperationActor.props(zookeeper))
        val childActor: ActorRef = context.actorOf(childProps, childName)

        def receive: Receive = {
          case ZooKeeperWatchEvent(e) =>
            e.getState match {
              // Throw an exception on ZooKeeper session expiration.
              case Expired => throw ZooKeeperSessionRestartException(None)
              case _ =>
            }
          case cmd: ZKOperations.ZKCommand => zookeeperOperation forward cmd
          case other => childActor forward other
        }

        override def postStop(): Unit = {
          zookeeper.close()
          super.postStop()
        }
      }
      A ZooKeeper application is valid only while its session is valid, so just let it crash on session expiration. Akka automatically restarts the child actors, and their stale state is refreshed on restart.
  • 55. Edge case that breaks ID uniqueness
      Sequence diagram. Participants: ZooKeeper, ID worker 1, ID worker 2, ID client 1, ID client 2.
      Labels: network partition; session timeout; delete worker ID 1; NodeChildrenChanged; create worker ID 1; NodeChildrenChanged; "my worker ID is 1" (ID worker 1 and ID worker 2); "worker 1’s ID is 1" and "worker 2’s ID is 1" (ID client 1 and ID client 2); ID duplication risk!; network recovered; session expired
  • 56. Mitigating the condition that breaks consensus, as implied by FLP impossibility (ReactiveZooKeeper module)
      def receive: Receive = {
        case ZooKeeperWatchEvent(e) =>
          e.getState match {
            case Expired => throw ZooKeeperSessionRestartException(None)
            // Fail fast on disconnection instead of waiting for session expiration.
            case Disconnected => throw new ConnectionRecoveryTimeoutException(connectionTimeout)
            case _ =>
          }
        case cmd: ZKOperations.ZKCommand => zookeeperOperation forward cmd
        case other => childActor forward other
      }
  • 57. Distributed ID generator in the wild
  • 58. ID generation latency and throughput
      Chart: Single ID worker performance
      The stress test used a Gatling plugin for the Akka Remote protocol: https://github.com/chatwork/gatling-akka
  • 59. Effect of ZooKeeper node down
      Charts: ID generation throughput; number of ID workers connected to the downed ZooKeeper; error count of ID clients
      # of workers affected by ZooKeeper down | Effect
      1/3 | Not observable
      2/3 | Relatively high error rate
      3/3 | Throughput decrease, high error rate
      We use redundant requests to mitigate the effects of an ID worker going down.
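The slides say redundant requests are used but do not show how. One plausible shape, sketched here with plain Scala futures rather than Akka and with hypothetical names, is to send the same request to several workers and take the first successful reply. Any worker's answer is acceptable because every worker issues distinct IDs.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success}

// Hypothetical helper, not the talk's implementation.
object RedundantRequests {
  /** Completes with the first successful attempt.
    * Fails only if every attempt fails (or if there are no attempts). */
  def firstSuccess[A](attempts: Seq[Future[A]])(implicit ec: ExecutionContext): Future[A] = {
    val promise = Promise[A]()
    val pendingFailures = new AtomicInteger(attempts.size)
    if (attempts.isEmpty) promise.tryFailure(new NoSuchElementException("no attempts"))
    attempts.foreach { attempt =>
      attempt.onComplete {
        case Success(value) => promise.trySuccess(value) // first success wins
        case Failure(error) =>
          // Only the last failure to arrive completes the promise.
          if (pendingFailures.decrementAndGet() == 0) promise.tryFailure(error)
      }
    }
    promise.future
  }
}
```

Note that the standard `Future.firstCompletedOf` is not a substitute here: it completes with whichever future finishes first, including a failed one, so a fast error from a downed worker would mask a slower success.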
  • 60. Write API latency (average)
      Chart marked with the distributed ID generator release: ~6 ms improvement.
  • 61. Write API latency (100th, 95th, 50th percentiles)
      Chart marked with the distributed ID generator release: the 95th percentile is not much improved.
  • 62. ID Generator latency (average)
      Chart marked with the distributed ID generator release: ~8 ms improvement.
  • 63. ID Generator latency (100th, 95th, 50th percentiles)
      Chart marked with the distributed ID generator release: the 95th percentile got worse. Possible fixes:
      - use a higher throughput setting for the dispatcher
      - use a pinned dispatcher
      - competitive redundant requests with the scatter-gather pattern
  • 64. We are hiring! https://corp.chatwork.com/ja/recruit/