Writing Networking Clients in Go
The Design and Implementation of the NATS client
Waldemar Quevedo / @wallyqs
GopherCon 2017
1 . 1
About me
Waldemar Quevedo / @wallyqs
Software Developer at Apcera
Development of the Apcera Platform
NATS clients maintainer
Using NATS based systems since 2012
Go favorite feature: Performance!
2 . 1
About this talk
Brief intro to the NATS project and its protocol
Deep dive into how the NATS Go client is implemented
3 . 1
We'll cover…
Techniques used and trade-offs in the Go client:
Backpressure of I/O and writes coalescing
Fast protocol parsing
Graceful reconnection on server failover
Usage & Non-usage of channels for communicating
Closures, callbacks & channels for events/custom logic
4 . 1
Agenda
Intro to the NATS Project
The NATS Protocol
NATS Client Deep Dive
IO Engine
Connecting & Graceful Reconnection
Sync & Async Subscriptions
Request/Response
Performance
5 . 1
Brief Intro to the NATS Project
High Performance Messaging System
Open Source, MIT License
Created by Derek Collison
Github: https://github.com/nats-io
Website: http://nats.io/
Development sponsored by Apcera
7 . 1
Used in production by thousands of users for…
Building Microservices Control Planes
Internal communication among components
Service Discovery
Low Latency Request Response RPC
Fire and Forget PubSub
8 . 1
Simple & Lightweight Design
TCP/IP based
Plain Text protocol with few commands
Easy to use API
Small binary of ~7MB
Little config
Just fire and forget, no built-in persistence
At-most-once delivery guarantees
9 . 1
On Delivery Guarantees…
End-To-End Arguments In System Design (J.H. Saltzer, D.P. Reed and D.D. Clark)
"...a lower-level subsystem that supports a distributed application
may be wasting its effort providing a function that must by nature be
implemented at the application level anyway can be applied to a
variety of functions in addition to reliable data transmission."
"Perhaps the oldest and most widely known form of the argument
concerns acknowledgement of delivery. A data communication
network can easily return an acknowledgement to the sender for
every message delivered to a recipient."
10 . 1
End-To-End Arguments In System Design (J.H. Saltzer, D.P. Reed and D.D. Clark)
Although this acknowledgement may be useful within the network as a
form of congestion control (...) it was never found to be very helpful to
applications(...). The reason is that knowing for sure that the message
was delivered to the target host is not very important.
What the application wants to know is whether or not the target host
acted on the message...
11 . 1
End-To-End Arguments In System Design (J.H. Saltzer, D.P. Reed and D.D. Clark)
More info
https://en.wikipedia.org/wiki/End-to-end_principle
http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.txt
Recommended Talk! ⭐⭐⭐⭐⭐
"Design Philosophy in Networked Systems" by Justine Sherry (PWLConf'16)
https://www.youtube.com/watch?v=aR_UOSGEizE
13 . 1
NATS Streaming
For at-least-once delivery check NATS Streaming (also OSS, MIT License).
It is a layer on top of NATS core which enhances it with message redelivery
features and persistence.
https://github.com/nats-io/nats-streaming-server
14 . 1
Agenda
Intro to the NATS Project
The NATS Protocol
NATS Client Deep Dive
IO Engine
Connecting & Graceful Reconnection
Sync & Async Subscriptions
Request/Response
Performance
15 . 1
The NATS Protocol
Client -> Server
| PUB | SUB | UNSUB | CONNECT |
Client <- Server
| INFO | MSG | -ERR | +OK |
Client <-> Server
| PING | PONG |
16 . 1
As soon as client connects…
17 . 1
…it receives an INFO string from server
18 . 1
Clients can customize connection via CONNECT
19 . 1
We can send PING to the server…
20 . 1
…to which we will get a PONG back.
21 . 1
Server periodically sends PING to clients too…
22 . 1
…to which clients have to PONG back.
23 . 1
Otherwise, server terminates connection eventually.
24 . 1
To announce interest in a subject, we use SUB…
25 . 1
…and PUB to publish a message on that subject.
26 . 1
Then, a MSG will be delivered to interested clients…
27 . 1
Interest in subject can be limited via UNSUB…
28 . 1
…then if many messages are sent, we only get one.
29 . 1
At any time, the server can send us an error…
30 . 1
…and disconnect us after sending the error.
31 . 1
Example Telnet session
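A session like this boils down to writing a few plain-text lines to a TCP socket. The following is a minimal sketch of how those lines could be built in Go, using the SUB, UNSUB and PUB shapes that appear elsewhere in these slides. The helper names (subCmd, unsubCmd, pubCmd) are hypothetical and are not part of the NATS client API.

```go
package main

import (
	"fmt"
	"strings"
)

// subCmd builds "SUB <subject> <sid>\r\n" (hypothetical helper).
func subCmd(subject string, sid int) string {
	return fmt.Sprintf("SUB %s %d\r\n", subject, sid)
}

// unsubCmd builds "UNSUB <sid> <max>\r\n", which limits delivery
// on the subscription to maxMsgs messages (hypothetical helper).
func unsubCmd(sid, maxMsgs int) string {
	return fmt.Sprintf("UNSUB %d %d\r\n", sid, maxMsgs)
}

// pubCmd builds "PUB <subject> [reply] <size>\r\n<payload>\r\n".
// The reply subject is optional and skipped when empty.
func pubCmd(subject, reply string, payload []byte) string {
	var sb strings.Builder
	sb.WriteString("PUB " + subject + " ")
	if reply != "" {
		sb.WriteString(reply + " ")
	}
	fmt.Fprintf(&sb, "%d\r\n", len(payload))
	sb.Write(payload)
	sb.WriteString("\r\n")
	return sb.String()
}

func main() {
	// %q makes the CRLF terminators visible.
	fmt.Printf("%q\n", subCmd("foo", 1))
	fmt.Printf("%q\n", unsubCmd(1, 1))
	fmt.Printf("%q\n", pubCmd("foo", "", []byte("Hello World")))
}
```

Writing these strings to a net.Conn connected to a server would be the equivalent of typing them into telnet.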
32 . 1
Agenda
Intro to the NATS Project
The NATS Protocol
NATS Client Deep Dive
IO Engine
Optimized Flushing
Fast Protocol Parsing Engine
Connecting & Graceful Reconnection
Sync & Async Subscriptions
Request / Response APIs
Performance
33 . 1
NATS Client Engine
The client is built around a couple of goroutines which cooperate to read / write
messages as fast as possible asynchronously.
34 . 1
NATS Client Engine
// spinUpGoRoutines will launch the goroutines responsible for
// reading and writing to the socket. This will be launched via a
// go routine itself to release any locks that may be held.
// We also use a WaitGroup to make sure we only start them on a
// reconnect when the previous ones have exited.
func (nc *Conn) spinUpGoRoutines() {
	// Make sure everything has exited.
	nc.waitForExits(nc.wg)
	// Create a new waitGroup instance for this run.
	nc.wg = &sync.WaitGroup{}
	// We will wait on both.
	nc.wg.Add(2)
	// Spin up the readLoop and the socket flusher.
	go nc.readLoop(nc.wg)
	go nc.flusher(nc.wg)
	// ...
}
35 . 1
NATS Client Engine
36 . 1
Flushing mechanism
Implemented via a buffered channel which is used to signal a flush of any
pending data in the bufio.Writer.
nc.fch = make(chan struct{}, 1024)

// flusher is a separate goroutine that will process flush requests for the write
// bufio. This allows coalescing of writes to the underlying socket.
func (nc *Conn) flusher(wg *sync.WaitGroup) {
	defer wg.Done()
	// ...
	for {
		if _, ok := <-fch; !ok {
			return
		}
		// ...
		if bw.Buffered() > 0 {
			bw.Flush()
		}
		// ...
	}
}
37 . 1
Flushing mechanism
On each command being sent to the server, we prepare a flush of any pending
data in case there are no pending flushes already.
func (nc *Conn) publish(subj, reply string, data []byte) error {
// ...
_, err := nc.bw.Write(msgh)
if err != nil {
return err
}
// Schedule pending flush in case there are none.
if len(nc.fch) == 0 {
select {
case nc.fch <- struct{}{}:
default:
}
}
}
38 . 1
Flushing mechanism
This helps add backpressure to the client and coalesces writes, so that less time
is spent writing to the socket.
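The effect of the flusher can be observed with a small stand-alone sketch. It is not the client's code: the conn type, the countingWriter standing in for the socket, and the run helper are all hypothetical, and the signal channel here has capacity 1 rather than 1024 to keep the example short. Publishers write into a shared bufio.Writer and kick the flusher without blocking, and the flusher turns many small writes into fewer writes on the underlying writer.

```go
package main

import (
	"bufio"
	"fmt"
	"sync"
)

// countingWriter stands in for the socket and counts Write calls.
type countingWriter struct {
	writes int
	total  int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.writes++
	w.total += len(p)
	return len(p), nil
}

type conn struct {
	mu  sync.Mutex
	bw  *bufio.Writer
	fch chan struct{}
}

// publish buffers the data and schedules a flush if none is pending.
func (c *conn) publish(data []byte) {
	c.mu.Lock()
	c.bw.Write(data)
	c.mu.Unlock()
	select {
	case c.fch <- struct{}{}:
	default: // a flush is already scheduled
	}
}

// flusher flushes pending data each time it is signaled.
func (c *conn) flusher(wg *sync.WaitGroup) {
	defer wg.Done()
	for range c.fch {
		c.mu.Lock()
		if c.bw.Buffered() > 0 {
			c.bw.Flush()
		}
		c.mu.Unlock()
	}
}

// run publishes n small messages and reports how many writes
// reached the fake socket and how many bytes were written in total.
func run(n int) (writes, total int) {
	sock := &countingWriter{}
	c := &conn{bw: bufio.NewWriterSize(sock, 32768), fch: make(chan struct{}, 1)}
	var wg sync.WaitGroup
	wg.Add(1)
	go c.flusher(&wg)
	for i := 0; i < n; i++ {
		c.publish([]byte("PUB foo 5\r\nhello\r\n"))
	}
	close(c.fch)
	wg.Wait()
	c.bw.Flush() // safety net; normally nothing is left by now
	return sock.writes, sock.total
}

func main() {
	writes, total := run(1000)
	fmt.Printf("1000 publishes, %d socket writes, %d bytes\n", writes, total)
}
```

The number of socket writes depends on scheduling, so it varies between runs, but it never exceeds the number of publishes.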
39 . 1
Reader loop
The client uses a zero allocation parser under a read loop based on the stack
// readLoop() will sit on the socket reading and processing the
// protocol from the server. It will dispatch appropriately based
// on the op type.
func (nc *Conn) readLoop(wg *sync.WaitGroup) {
// ...
// Stack based buffer
b := make([]byte, 32768)
for {
// ...
n, err := conn.Read(b)
if err != nil {
nc.processOpErr(err)
break
}
if err := nc.parse(b[:n]); err != nil {
nc.processOpErr(err)
break
}
// ...
40 . 1
Fast Protocol Parsing Engine
func (nc *Conn) parse(buf []byte) error {
var i int
var b byte
for i = 0; i < len(buf); i++ {
b = buf[i]
switch nc.ps.state {
case OP_START:
switch b {
case 'M', 'm':
nc.ps.state = OP_M
case 'P', 'p':
nc.ps.state = OP_P
case 'I', 'i':
nc.ps.state = OP_I
// ...
case MSG_PAYLOAD:
if nc.ps.msgBuf != nil {
if len(nc.ps.msgBuf) >= nc.ps.ma.size {
nc.processMsg(nc.ps.msgBuf)
nc.ps.argBuf, nc.ps.msgBuf, nc.ps.state = nil, n
// ...
41 . 1
Fast Protocol Parsing Engine
const (
OP_START = iota
OP_PLUS // +
OP_PLUS_O // +O
OP_PLUS_OK // +OK
OP_MINUS // -
OP_MINUS_E // -E
OP_MINUS_ER // -ER
OP_MINUS_ERR // -ERR
OP_MINUS_ERR_SPC // -ERR
MINUS_ERR_ARG // -ERR '
OP_M // M
OP_MS // MS
OP_MSG // MSG
OP_MSG_SPC // MSG
MSG_ARG // MSG
MSG_PAYLOAD // MSG foo bar 1 \r\n
MSG_END // MSG foo bar 1 \r\n
OP_P // P
OP_PI // PI
OP_PIN // PIN
OP_PING // PING
OP_PO // PO
OP_PON // PON
OP_PONG // PONG
OP_I // I
OP_IN // IN
OP_INF // INF
OP_INFO // INFO
OP_INFO_SPC // INFO
INFO_ARG // INFO
)
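To show why one state per protocol byte is useful, here is a reduced sketch of such a state machine that only understands PING and PONG. The type and state names are hypothetical and much smaller than the client's parser. Because the state lives in the struct, a command split across two socket reads is handled without buffering or allocating.

```go
package main

import "fmt"

type state int

const (
	opStart state = iota
	opP
	opPI
	opPIN
	opPING
	opPO
	opPON
	opPONG
)

// parser keeps its position between calls to parse, so a command
// may arrive split across several reads.
type parser struct {
	state state
	pings int
	pongs int
}

// expect moves to next when b matches the given uppercase letter
// in either case, and reports a parse error otherwise.
func (p *parser) expect(b, upper byte, next state) error {
	if b != upper && b != upper+'a'-'A' {
		return fmt.Errorf("parse error: unexpected byte %q in state %d", b, p.state)
	}
	p.state = next
	return nil
}

func (p *parser) parse(buf []byte) error {
	for _, b := range buf {
		var err error
		switch p.state {
		case opStart:
			err = p.expect(b, 'P', opP)
		case opP:
			if b == 'O' || b == 'o' {
				p.state = opPO
			} else {
				err = p.expect(b, 'I', opPI)
			}
		case opPI:
			err = p.expect(b, 'N', opPIN)
		case opPIN:
			err = p.expect(b, 'G', opPING)
		case opPO:
			err = p.expect(b, 'N', opPON)
		case opPON:
			err = p.expect(b, 'G', opPONG)
		case opPING, opPONG:
			// Skip '\r'; the command completes on '\n'.
			if b == '\n' {
				if p.state == opPING {
					p.pings++
				} else {
					p.pongs++
				}
				p.state = opStart
			}
		}
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	p := &parser{}
	// Simulate two reads that split a command in the middle.
	for _, chunk := range []string{"PI", "NG\r\nPONG\r\n"} {
		if err := p.parse([]byte(chunk)); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("pings:", p.pings, "pongs:", p.pongs)
}
```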
42 . 1
Commands are sent asynchronously!
Publishing/Subscribing actions do not block; instead we rely on the client's
internal engine to eventually flush and send to the server.
In the example below, there is no guarantee that the foo subscription will receive
the message (the SUB may not have been flushed yet).
nc1, _ := nats.Connect("nats://demo.nats.io:4222")
nc2, _ := nats.Connect("nats://demo.nats.io:4222")
nc1.Subscribe("foo", func(m *nats.Msg){
log.Printf("[Received] %+v", m)
})
nc2.Publish("foo", []byte("Hello World"))
43 . 1
FAQ
If commands are sent asynchronously, how to ensure that a command sent has
been processed by the server?
44 . 1
Ensure ordering via PING / PONG
Ordering is guaranteed per connection, thus sending a PING immediately
after SUB guarantees that we receive the PONG before any matching MSG on
the subscription.
45 . 1
NATS Flush
The client provides a Flush API which both flushes the bufio.Writer and does the
server roundtrip for us.
nc1, _ := nats.Connect("nats://demo.nats.io:4222")
nc2, _ := nats.Connect("nats://demo.nats.io:4222")
nc1.Subscribe("foo", func(m *nats.Msg){
log.Printf("[Received] %+v", m)
})
// Sends PING, flushes buffer, and blocks until receiving PONG reply.
nc1.Flush()
// Would be received by subscription
nc2.Publish("foo", []byte("Hello World"))
46 . 1
NATS Flush
We create an unbuffered channel for each pending PONG reply and append it to
a slice of channels which represents the pending replies.
nc.pongs = make([]chan struct{}, 0, 8)
func (nc *Conn) FlushTimeout(timeout time.Duration) (err error) {
// ...
ch := make(chan struct{})
nc.pongs = append(nc.pongs, ch)
nc.bw.WriteString("PING\r\n")
nc.bw.Flush()
// ...
select {
case <-ch:
close(ch)
case <-time.After(timeout):
return ErrTimeout
}
47 . 1
NATS Flush
Then when we receive the PONG back, we signal the client that we have received
it so that it stops blocking.
// processPong is used to process responses to the client's ping
// messages. We use pings for the flush mechanism as well.
func (nc *Conn) processPong() {
var ch chan struct{}
nc.mu.Lock()
if len(nc.pongs) > 0 {
ch = nc.pongs[0]
nc.pongs = nc.pongs[1:]
}
nc.pout = 0
nc.mu.Unlock()
if ch != nil {
// Signal that PONG was received.
ch <- struct{}{}
}
}
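The pairing of PINGs and PONGs relies only on ordering: the first PONG received belongs to the oldest pending flush. A reduced sketch of that bookkeeping follows. The conn type, sendPing and signaled are hypothetical stand-ins, and unlike the slide the channels are buffered with capacity 1 so that the sketch can run in a single goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

type conn struct {
	mu    sync.Mutex
	pongs []chan struct{}
}

// sendPing registers a waiter for the next PONG. A real client
// would also write "PING\r\n" to the socket at this point.
func (c *conn) sendPing() chan struct{} {
	ch := make(chan struct{}, 1)
	c.mu.Lock()
	c.pongs = append(c.pongs, ch)
	c.mu.Unlock()
	return ch
}

// processPong wakes up the oldest waiter, if there is one.
func (c *conn) processPong() {
	var ch chan struct{}
	c.mu.Lock()
	if len(c.pongs) > 0 {
		ch = c.pongs[0]
		c.pongs = c.pongs[1:]
	}
	c.mu.Unlock()
	if ch != nil {
		ch <- struct{}{}
	}
}

// signaled reports, without blocking, whether ch has been signaled.
func signaled(ch chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	c := &conn{}
	first := c.sendPing()
	second := c.sendPing()

	c.processPong() // one PONG arrives
	fmt.Println("first flush done:", signaled(first))
	fmt.Println("second flush done:", signaled(second))
}
```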
48 . 1
Intro to the NATS Project
The NATS Protocol
NATS Client Deep Dive
IO Engine
Connecting & Graceful Reconnection
Synchronous & Asynchronous Subscriptions
Request/Response
Performance
49 . 1
Connecting Options
For connecting to NATS, we use the functional options pattern to be able to
extend the customizations.
// Connect will attempt to connect to the NATS system.
// The url can contain username/password semantics. e.g. nats://derek:pass@localhost:422
// Comma separated arrays are also supported, e.g. urlA, urlB.
// Options start with the defaults but can be overridden.
func Connect(url string, options ...Option) (*Conn, error) {
opts := DefaultOptions
opts.Servers = processUrlString(url)
for _, opt := range options {
if err := opt(&opts); err != nil {
return nil, err
}
}
return opts.Connect()
}
https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis
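The pattern itself fits in a few lines. The sketch below is self-contained and independent of the NATS client: the Options fields, the default values and the WithUser / WithReconnectWait option names are assumptions made up for illustration. It shows how defaults are applied first, how each option mutates the struct, and how an option can reject a bad value.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Options holds the settings; fields and defaults are illustrative.
type Options struct {
	URL           string
	User          string
	ReconnectWait time.Duration
	MaxReconnect  int
}

// Option is a function that customizes Options and may fail.
type Option func(*Options) error

// WithUser sets the user name (hypothetical option).
func WithUser(user string) Option {
	return func(o *Options) error {
		o.User = user
		return nil
	}
}

// WithReconnectWait sets the wait between reconnect attempts and
// rejects negative durations (hypothetical option).
func WithReconnectWait(d time.Duration) Option {
	return func(o *Options) error {
		if d < 0 {
			return errors.New("reconnect wait must not be negative")
		}
		o.ReconnectWait = d
		return nil
	}
}

// NewOptions starts from defaults and applies the options in order.
func NewOptions(url string, options ...Option) (Options, error) {
	opts := Options{URL: url, ReconnectWait: 2 * time.Second, MaxReconnect: 60}
	for _, opt := range options {
		if err := opt(&opts); err != nil {
			return Options{}, err
		}
	}
	return opts, nil
}

func main() {
	opts, err := NewOptions("nats://127.0.0.1:4222",
		WithUser("secret"),
		WithReconnectWait(5*time.Second),
	)
	fmt.Println(opts, err)
}
```

New settings can be added later as new option functions without changing the signature of the constructor, which is the main appeal of the pattern for a public API.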
50 . 1
Connecting Options
For example, to customize the set of credentials sent on CONNECT…
CONNECT {"user":"secret", "pass":"password"}\r\n
// UserInfo is an Option to set the username and password to
// use when not included directly in the URLs.
func UserInfo(user, password string) Option {
return func(o *Options) error {
o.User = user
o.Password = password
return nil
}
}
// Usage:
//
nats.Connect("nats://127.0.0.1:4222", nats.UserInfo("secret", "password"))
51 . 1
Connecting Options
Or to customize TLS:
nc, err = nats.Connect(secureURL,
nats.RootCAs("./ca.pem"),
nats.ClientCert("./client-cert.pem", "./client-key.pem"))
if err != nil {
t.Fatalf("Failed to create (TLS) connection: %v", err)
}
defer nc.Close()
52 . 1
Connecting & Reconnecting flow
53 . 1
Connecting & Reconnecting flow
54 . 1
Reconnecting Options
By default, the client has reconnecting enabled but can be customized too.
This means that if there is ever an issue with the server that we were connected
to, then it will try to connect to another one in the pool.
// ReconnectWait is an Option to set the wait time between reconnect attempts.
func ReconnectWait(t time.Duration) Option {
return func(o *Options) error {
o.ReconnectWait = t
return nil
}
}
// MaxReconnects is an Option to set the maximum number of reconnect attempts.
func MaxReconnects(max int) Option {
return func(o *Options) error {
o.MaxReconnect = max
return nil
}
}
55 . 1
Reconnection logic
While the client is in the RECONNECTING state, the bufio.Writer is backed by
a bytes.Buffer instead of the connection, so commands issued in the meantime
are kept in memory.
56 . 1
Reconnection logic
if nc.Opts.AllowReconnect && nc.status == CONNECTED {
// Set our new status
nc.status = RECONNECTING
if nc.ptmr != nil {
nc.ptmr.Stop()
}
if nc.conn != nil {
nc.bw.Flush()
nc.conn.Close()
nc.conn = nil
}
// Create a new pending buffer to underpin the bufio Writer while
// we are reconnecting.
nc.pending = &bytes.Buffer{}
nc.bw = bufio.NewWriterSize(nc.pending, nc.Opts.ReconnectBufSize)
go nc.doReconnect()
nc.mu.Unlock()
return
}
57 . 1
Reconnection logic
Then, on reconnect, the pending buffer is flushed once, after the subscriptions
have been sent again.
CONNECT...\r\nPING\r\nSUB foo 1\r\nSUB bar 2\r\nPUB hello 5\r\nworld\r\n
func (nc *Conn) doReconnect() { // note: a lot was elided here
for len(nc.srvPool) > 0 {
server, _ := nc.selectNextServer()
nc.createConn()
nc.resendSubscriptions()
if nc.pending != nil {
if nc.pending.Len() > 0 {
nc.bw.Write(nc.pending.Bytes())
}
}
// Flush the buffer and retry next server on failure
nc.err = nc.bw.Flush()
if nc.err != nil {
nc.status = RECONNECTING
continue
}
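The buffer swap can be tried out in isolation. In this sketch the conn type and its two methods are hypothetical simplifications: while "reconnecting", the bufio.Writer writes into a bytes.Buffer, and once a new socket is available the subscriptions are replayed first, followed by whatever was published in the meantime.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

type conn struct {
	bw      *bufio.Writer
	pending *bytes.Buffer
}

// startReconnecting points the writer at an in-memory buffer so
// that publishes keep working while there is no socket.
func (c *conn) startReconnecting(bufSize int) {
	c.pending = &bytes.Buffer{}
	c.bw = bufio.NewWriterSize(c.pending, bufSize)
}

// finishReconnecting switches to the new socket, resends the
// subscriptions and then replays the pending data.
func (c *conn) finishReconnecting(sock io.Writer, subs []string) error {
	// Move anything still held by the bufio.Writer into pending.
	if err := c.bw.Flush(); err != nil {
		return err
	}
	held := c.pending.Bytes()
	c.pending = nil
	c.bw = bufio.NewWriter(sock)
	for _, s := range subs {
		c.bw.WriteString(s)
	}
	c.bw.Write(held)
	return c.bw.Flush()
}

// replayDemo publishes while disconnected and returns what the
// new socket receives after the reconnect.
func replayDemo() (string, error) {
	var sock bytes.Buffer // stands in for the new connection
	c := &conn{}
	c.startReconnecting(8 * 1024)
	c.bw.WriteString("PUB hello 5\r\nworld\r\n")
	err := c.finishReconnecting(&sock, []string{"SUB foo 1\r\n", "SUB bar 2\r\n"})
	return sock.String(), err
}

func main() {
	out, err := replayDemo()
	fmt.Printf("%q %v\n", out, err)
}
```

Sending the subscriptions before the pending publishes matters: a message the client published to one of its own subjects while disconnected would otherwise be processed by the server before the interest is registered again.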
58 . 1
Disconnection and Reconnection Callbacks
We can set event callbacks via connect options which could be helpful for
debugging or adding custom logic.
nc, err = nats.Connect("nats://demo.nats.io:4222",
	nats.MaxReconnects(5),
	nats.ReconnectWait(2*time.Second),
	nats.DisconnectHandler(func(nc *nats.Conn) {
		log.Printf("Got disconnected!\n")
	}),
	nats.ReconnectHandler(func(nc *nats.Conn) {
		log.Printf("Got reconnected to %v!\n", nc.ConnectedUrl())
	}),
	nats.ClosedHandler(func(nc *nats.Conn) {
		log.Printf("Connection closed. Reason: %q\n", nc.LastError())
	}),
	nats.ErrorHandler(func(nc *nats.Conn, sub *nats.Subscription, err error) {
		log.Printf("Error: %s\n", err)
	}),
)
59 . 1
Disconnection and Reconnection Callbacks
60 . 1
Async Error & Event Callbacks
In order to ensure ordering of event callbacks, a channel is used…
// Create the async callback channel.
nc.ach = make(chan asyncCB, 32)
// Slow consumer error in case our buffered channel blocked in a Subscription.
if nc.Opts.AsyncErrorCB != nil {
nc.ach <- func() { nc.Opts.AsyncErrorCB(nc, sub, ErrSlowConsumer) }
}
// On disconnection events
if nc.Opts.DisconnectedCB != nil && nc.conn != nil {
nc.ach <- func() { nc.Opts.DisconnectedCB(nc) }
}
if nc.Opts.ClosedCB != nil {
nc.ach <- func() { nc.Opts.ClosedCB(nc) }
}
// On reconnect event
if nc.Opts.ReconnectedCB != nil {
nc.ach <- func() { nc.Opts.ReconnectedCB(nc) }
}
61 . 1
Async Error & Event Callbacks
…then an asyncDispatch goroutine is responsible for dispatching them in
sequence until the channel is closed.
// Connect will attempt to connect to a NATS server with multiple options.
func (o Options) Connect() (*Conn, error) {
nc := &Conn{Opts: o}
// ...
// Spin up the async cb dispatcher on success
go nc.asyncDispatch()
// asyncDispatch is responsible for calling any async callbacks
func (nc *Conn) asyncDispatch() {
// ...
// Loop on the channel and process async callbacks.
for {
if f, ok := <-ach; !ok {
return
} else {
f()
}
}
}
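A stand-alone sketch of this ordering guarantee: closures are queued on a buffered channel and a single goroutine runs them one at a time, so callbacks fire in the order in which the events were queued. The conn type and the dispatchDemo helper are hypothetical and only mirror the idea, not the client's code.

```go
package main

import (
	"fmt"
	"sync"
)

type asyncCB func()

type conn struct {
	ach chan asyncCB
	wg  sync.WaitGroup
}

func newConn() *conn {
	c := &conn{ach: make(chan asyncCB, 32)}
	c.wg.Add(1)
	go c.asyncDispatch()
	return c
}

// asyncDispatch runs queued callbacks sequentially until the
// channel is closed.
func (c *conn) asyncDispatch() {
	defer c.wg.Done()
	for f := range c.ach {
		f()
	}
}

// close stops the dispatcher after the queued callbacks have run.
func (c *conn) close() {
	close(c.ach)
	c.wg.Wait()
}

// dispatchDemo queues three event callbacks and returns the order
// in which they were executed.
func dispatchDemo() []string {
	var events []string
	c := newConn()
	for _, name := range []string{"disconnected", "reconnected", "closed"} {
		name := name // capture a per-iteration copy
		c.ach <- func() { events = append(events, name) }
	}
	c.close()
	return events
}

func main() {
	fmt.Println(dispatchDemo())
}
```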
62 . 1
Intro to the NATS Project
The NATS Protocol
NATS Client Deep Dive
IO Engine
Connecting & Graceful Reconnection
Sync & Async Subscriptions
Channels and Callback based APIs
Request / Response APIs
Performance
63 . 1
Subscription Types
There are multiple APIs for processing the messages received from server.
// Async Subscriber
nc.Subscribe("foo", func(m *nats.Msg) {
	log.Printf("Received a message: %s\n", string(m.Data))
})
// Sync Subscriber which blocks
sub, err := nc.SubscribeSync("foo")
m, err := sub.NextMsg(timeout)
// Channel Subscriber
ch := make(chan *nats.Msg, 64)
sub, err := nc.ChanSubscribe("foo", ch)
msg := <-ch
64 . 1
Subscription Types
Each type of subscription has a QueueSubscribe variation too.
// Async Queue Subscriber
nc.QueueSubscribe("foo", "group", func(m *nats.Msg) {
	log.Printf("Received a message: %s\n", string(m.Data))
})
// Sync Queue Subscriber which blocks
sub, err := nc.QueueSubscribeSync("foo", "group")
m, err := sub.NextMsg(timeout)
// Channel Queue Subscriber
ch := make(chan *nats.Msg, 64)
sub, err := nc.ChanQueueSubscribe("foo", "group", ch)
msg := <-ch
65 . 1
Async Subscribers
Subscribe takes a subject and callback and then dispatches each of the
messages to the callback, processing them sequentially.
nc.Subscribe("foo", func(m *nats.Msg) {
	log.Printf("Received a message: %s\n", string(m.Data))
})
66 . 1
Async Subscribers
We need to keep in mind that when multiple messages are received on the
same subscription, there is potential for head-of-line blocking.
nc.Subscribe("foo", func(m *nats.Msg) {
	log.Printf("Received a message: %s\n", string(m.Data))
	// Next message would not be processed until
	// this sleep is done.
	time.Sleep(5 * time.Second)
})
67 . 1
Async Subscribers
If that is an issue, users can opt to parallelize the processing by spinning up
a goroutine when receiving a message.
Determining whether to use a goroutine and implementing backpressure is up to
the user.
nc.Subscribe("foo", func(m *nats.Msg) {
	log.Printf("Received a message: %s\n", string(m.Data))
	go func() {
		// The next message is not blocked by this sleep.
		time.Sleep(5 * time.Second)
	}()
})
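One possible way to add that backpressure is a counting semaphore built from a buffered channel. This is a sketch of one option rather than something the client provides: boundedHandler and boundedDemo are hypothetical names, and the work function stands in for the body of a subscription callback. When the limit is reached, handle blocks, which in a real callback would pause delivery on that subscription until a worker finishes.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// boundedHandler runs work in goroutines, at most cap(sem) at once.
type boundedHandler struct {
	sem chan struct{}
	wg  sync.WaitGroup
}

// handle blocks while the limit is reached, then runs work in a
// new goroutine.
func (b *boundedHandler) handle(work func()) {
	b.sem <- struct{}{} // acquire a slot
	b.wg.Add(1)
	go func() {
		defer func() {
			<-b.sem // release the slot
			b.wg.Done()
		}()
		work()
	}()
}

// boundedDemo processes n items with the given limit and returns
// how many were processed and the peak number running at once.
func boundedDemo(n, limit int) (processed, peak int64) {
	b := &boundedHandler{sem: make(chan struct{}, limit)}
	var inFlight, maxSeen, done int64
	for i := 0; i < n; i++ {
		b.handle(func() {
			cur := atomic.AddInt64(&inFlight, 1)
			// Record the highest concurrency observed so far.
			for {
				p := atomic.LoadInt64(&maxSeen)
				if cur <= p || atomic.CompareAndSwapInt64(&maxSeen, p, cur) {
					break
				}
			}
			atomic.AddInt64(&done, 1)
			atomic.AddInt64(&inFlight, -1)
		})
	}
	b.wg.Wait()
	return atomic.LoadInt64(&done), atomic.LoadInt64(&maxSeen)
}

func main() {
	processed, peak := boundedDemo(50, 4)
	fmt.Println("processed:", processed, "peak concurrency:", peak)
}
```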
68 . 1
Async Subscribers
Each Async Subscription has its own goroutine for processing so other
subscriptions can still be processing messages (they do not block each other).
69 . 1
Async Subscribers
Async subscribers are implemented via a linked list of messages in the order they
were received, in combination with a condition variable (no channels).
One main benefit of this approach is that the list grows only as needed when
it is not known how many messages will be received.
This way, when having many subscriptions, we do not have to allocate a large
channel for each of them just to avoid dropping messages.
type Subscription struct {
// ...
pHead *Msg
pTail *Msg
pCond *sync.Cond
// ...
70 . 1
Async Subscribers
When subscribing, a goroutine is spun up for the subscription, waiting for the
parser to feed it new messages.
func (nc *Conn) subscribe(...) {
// ...
// If we have an async callback, start up a sub specific
// goroutine to deliver the messages.
if cb != nil {
sub.typ = AsyncSubscription
sub.pCond = sync.NewCond(&sub.mu)
go nc.waitForMsgs(sub)
71 . 1
Async Subscribers
Then we wait on the condition variable…
func (nc *Conn) waitForMsgs(s *Subscription) {
var closed bool
var delivered, max uint64
for {
s.mu.Lock()
if s.pHead == nil && !s.closed {
// Parser will signal us when a message for
// this subscription is received.
s.pCond.Wait()
}
// ...
mcb := s.mcb
// ...
// Deliver the message.
if m != nil && (max == 0 || delivered <= max) {
mcb(m)
}
72 . 1
Async Subscribers
…until signaled by the parser when getting a message.
func (nc *Conn) processMsg(data []byte) {
// ...
sub := nc.subs[nc.ps.ma.sid]
// ...
if sub.pHead == nil {
sub.pHead = m
sub.pTail = m
sub.pCond.Signal() //
} else {
sub.pTail.next = m
sub.pTail = m
}
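The two halves shown on these slides (a waiter and a signaling producer) can be put together in a small runnable sketch. Names are simplified and hypothetical, and two details are assumptions of the sketch rather than the client's behavior: Wait is called in a for loop, which is the usual way to use sync.Cond, and on close the remaining messages are drained before the goroutine exits.

```go
package main

import (
	"fmt"
	"sync"
)

type msg struct {
	data string
	next *msg
}

type subscription struct {
	mu     sync.Mutex
	pCond  *sync.Cond
	pHead  *msg
	pTail  *msg
	closed bool
}

func newSubscription() *subscription {
	s := &subscription{}
	s.pCond = sync.NewCond(&s.mu)
	return s
}

// enqueue appends a message and wakes the waiter if the list was empty.
func (s *subscription) enqueue(data string) {
	m := &msg{data: data}
	s.mu.Lock()
	if s.pHead == nil {
		s.pHead, s.pTail = m, m
		s.pCond.Signal()
	} else {
		s.pTail.next = m
		s.pTail = m
	}
	s.mu.Unlock()
}

func (s *subscription) close() {
	s.mu.Lock()
	s.closed = true
	s.pCond.Signal()
	s.mu.Unlock()
}

// waitForMsgs delivers messages in order until the subscription is
// closed and the list is empty.
func (s *subscription) waitForMsgs(cb func(string)) {
	for {
		s.mu.Lock()
		for s.pHead == nil && !s.closed {
			s.pCond.Wait()
		}
		m := s.pHead
		if m != nil {
			s.pHead = m.next
			if s.pHead == nil {
				s.pTail = nil
			}
		}
		s.mu.Unlock()
		if m == nil {
			return // closed and drained
		}
		cb(m.data) // run the callback outside the lock
	}
}

// deliverDemo enqueues n messages and returns them as delivered.
func deliverDemo(n int) []string {
	s := newSubscription()
	var got []string
	done := make(chan struct{})
	go func() {
		s.waitForMsgs(func(d string) { got = append(got, d) })
		close(done)
	}()
	for i := 0; i < n; i++ {
		s.enqueue(fmt.Sprintf("msg-%d", i))
	}
	s.close()
	<-done
	return got
}

func main() {
	got := deliverDemo(5)
	fmt.Println(len(got), got)
}
```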
73 . 1
Sync & Channel Subscribers
Internally, they are fairly similar since both are channel based.
// Sync Subscriber which blocks until receiving next message.
sub, err := nc.SubscribeSync("foo")
m, err := sub.NextMsg(timeout)
// Channel Subscriber
ch := make(chan *nats.Msg, 64)
sub, err := nc.ChanSubscribe("foo", ch)
msg := <- ch
74 . 1
Sync & Channel Subscribers
Once parser yields a message, it is sent to the channel.
The client needs to be ready to receive: if the send on the channel would block,
the message is dropped and a slow consumer error is reported via the async error
callback.
func (nc *Conn) processMsg(data []byte) {
// ...
sub := nc.subs[nc.ps.ma.sid]
// ...
// We have two modes of delivery. One is the channel, used by channel
// subscribers and syncSubscribers, the other is a linked list for async.
if sub.mch != nil {
select {
case sub.mch <- m:
default:
goto slowConsumer
}
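The non-blocking send is the whole trick: select with a default branch either places the message in the channel or gives up immediately. A minimal sketch, with a hypothetical subscription type that counts drops instead of invoking an error callback:

```go
package main

import "fmt"

type subscription struct {
	mch     chan string
	dropped int
}

// deliver tries to hand the message over without blocking and
// reports false when the channel is full (slow consumer).
func (s *subscription) deliver(m string) bool {
	select {
	case s.mch <- m:
		return true
	default:
		s.dropped++
		return false
	}
}

func main() {
	s := &subscription{mch: make(chan string, 2)}
	for _, m := range []string{"a", "b", "c"} {
		if !s.deliver(m) {
			fmt.Println("slow consumer, dropped:", m)
		}
	}
	fmt.Println("queued:", len(s.mch), "dropped:", s.dropped)
}
```

The capacity of the channel therefore decides how far a consumer can fall behind before messages are lost.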
75 . 1
Intro to the NATS Project
The NATS Protocol
NATS Client Deep Dive
IO Engine
Connecting & Graceful Reconnection
Sync & Async Subscriptions
Request/Response
Unique inboxes generation
Context support for cancellation
Performance
76 . 1
Request / Response
Building on top of PubSub mechanisms available, the client has support for 1:1
style of communication.
func main() {
nc, err := nats.Connect("nats://127.0.0.1:4222")
if err != nil {
log.Fatalf("Error: %s", err)
}
nc.Subscribe("help", func(m *nats.Msg) {
log.Printf("[Received] %+v", m)
nc.Publish(m.Reply, []byte("I can help"))
})
response, err := nc.Request("help", []byte("please"), 2*time.Second)
if err != nil {
log.Fatalf("Error: %s", err)
}
log.Printf("[Response] %+v", response)
}
77 . 1
Request / Response
Protocol-wise, this is achieved by using ephemeral subscriptions.
Requestor
Responder
SUB _INBOX.95jM7MgZoxZt94fhLpewjg 2
UNSUB 2 1
PUB help _INBOX.95jM7MgZoxZt94fhLpewjg 6
please
MSG _INBOX.95jM7MgZoxZt94fhLpewjg 2 10
I can help
SUB help 1
MSG help 1 _INBOX.95jM7MgZoxZt94fhLpewjg 6
please
PUB _INBOX.95jM7MgZoxZt94fhLpewjg 10
I can help
78 . 1
Unique Inboxes Generation
The unique inbox subjects are generated with the NUID library, which can generate
identifiers at a rate of ~16 M per second.
https://github.com/nats-io/nuid
79 . 1
Request / Response
func (nc *Conn) Request(subj string, data []byte, timeout time.Duration) (
*Msg,
error,
) {
inbox := NewInbox()
ch := make(chan *Msg, RequestChanLen)
s, err := nc.subscribe(inbox, _EMPTY_, nil, ch)
if err != nil {
return nil, err
}
s.AutoUnsubscribe(1) // UNSUB after receiving 1
defer s.Unsubscribe()
// PUB help _INBOX.nwOOaWSe...
err = nc.PublishRequest(subj, inbox, data)
if err != nil {
return nil, err
}
// Block until MSG is received or timeout.
return s.NextMsg(timeout)
}
80 . 1
Request / Response (Context variation)
func (nc *Conn) RequestWithContext(ctx context.Context, subj string, data []byte) (
*Msg,
error,
) {
inbox := NewInbox()
ch := make(chan *Msg, RequestChanLen)
s, err := nc.subscribe(inbox, _EMPTY_, nil, ch)
if err != nil {
return nil, err
}
s.AutoUnsubscribe(1) // UNSUB after receiving 1
defer s.Unsubscribe()
// PUB help _INBOX.nwOOaWSe...
err = nc.PublishRequest(subj, inbox, data)
if err != nil {
return nil, err
}
// Block until MSG is received or context canceled.
return s.NextMsgWithContext(ctx)
}
81 . 1
Request / Response (+v1.3.0)
In the next release of the client, a less chatty version of the request/response mechanism is available.
Requestor
Responder
SUB _INBOX.nwOOaWSeWrt0ok20pKFfNz.* 2
PUB help _INBOX.nwOOaWSeWrt0ok20pKFfNz.nwOOaWSeWrt0ok20pKFfPo 6
please
MSG _INBOX.nwOOaWSeWrt0ok20pKFfNz.nwOOaWSeWrt0ok20pKFfPo 2 10
I can help
SUB help 1
MSG help 1 _INBOX.nwOOaWSeWrt0ok20pKFfNz.nwOOaWSeWrt0ok20pKFfPo 6
please
PUB _INBOX.nwOOaWSeWrt0ok20pKFfNz.nwOOaWSeWrt0ok20pKFfPo 10
I can help
82 . 1
Request / Response (new version)
func (nc *Conn) Request(subj string, data []byte, timeout time.Duration) (*Msg, error) {
// ... (note: not actual implementation)
nc.mu.Lock()
if nc.respMap == nil {
// Wildcard subject for all requests, e.g. INBOX.abcd.*
nc.respSub = fmt.Sprintf("%s.*", NewInbox())
nc.respMap = make(map[string]chan *Msg)
}
mch := make(chan *Msg, RequestChanLen)
respInbox := nc.newRespInbox() // Await for reply on subject: _INBOX.abcd.xyz
token := respToken(respInbox) // Last token identifying this request: xyz
nc.respMap[token] = mch
// ...
nc.mu.Unlock()
// Broadcast request, then wait for reply or timeout.
if err := nc.PublishRequest(subj, respInbox, data); err != nil {
return nil, err
}
// (Wait for message or timer timeout)
select {
case msg, ok := <-mch:
// Message received or Connection Closed
case <-time.After(timeout):
// Discard request and yield error
return nil, ErrTimeout
}
83 . 1
Request / Response (new version+context)
func (nc *Conn) RequestWithContext(ctx context.Context, subj string, data []byte) (*Msg,
// ... (note: not actual implementation)
nc.mu.Lock()
if nc.respMap == nil {
// Wildcard subject for all requests, e.g. INBOX.abcd.*
nc.respSub = fmt.Sprintf("%s.*", NewInbox())
nc.respMap = make(map[string]chan *Msg)
}
mch := make(chan *Msg, RequestChanLen)
respInbox := nc.newRespInbox() // Await for reply on subject: _INBOX.abcd.xyz
token := respToken(respInbox) // Last token identifying this request: xyz
nc.respMap[token] = mch
// ...
nc.mu.Unlock()
// Broadcast request, then wait for reply or timeout.
if err := nc.PublishRequest(subj, respInbox, data); err != nil {
return nil, err
}
// (Wait for message or timer timeout)
select {
case msg, ok := <-mch:
// Message received or Connection Closed
case <-ctx.Done():
// Discard request and yield error
return nil, ErrTimeout
}
84 . 1
Request / Response (new version)
// In Request(...)
//
// Make sure scoped subscription is setup only once via sync.Once
var err error
nc.respSetup.Do(func() {
var s *Subscription
// Create global subscription: INBOX.abcd.*
s, err = nc.Subscribe(ginbox, func(m *Msg){
// Get token: xyz ← INBOX.abcd.xyz
rt := respToken(m.Subject)
nc.mu.Lock()
// Signal channel that we received the message.
mch := nc.respMap[rt]
delete(nc.respMap, rt)
nc.mu.Unlock()
select {
case mch <- m:
default:
return
}
})
nc.mu.Lock()
nc.respMux = s
nc.mu.Unlock()
})
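The multiplexing idea can be isolated from the client: one wildcard subscription receives every reply, and the last token of the reply subject selects the channel of the request that is waiting for it. The sketch below uses hypothetical names (mux, newRequest, handle) and a simple counter for tokens where the client would use a NUID.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type msg struct {
	subject string
	data    string
}

// mux routes replies to waiting requests by the last subject token.
type mux struct {
	mu      sync.Mutex
	prefix  string // e.g. "_INBOX.abcd."
	next    int
	respMap map[string]chan *msg
}

func newMux(prefix string) *mux {
	return &mux{prefix: prefix, respMap: make(map[string]chan *msg)}
}

// newRequest registers a waiter and returns the reply subject to
// publish with, plus the channel on which the reply will arrive.
func (m *mux) newRequest() (string, chan *msg) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.next++
	token := fmt.Sprintf("r%d", m.next) // counter instead of a NUID
	ch := make(chan *msg, 1)
	m.respMap[token] = ch
	return m.prefix + token, ch
}

// respToken returns the last dot-separated token of a subject.
func respToken(subject string) string {
	return subject[strings.LastIndex(subject, ".")+1:]
}

// handle is what the wildcard subscription callback would do.
// It reports whether a waiting request received the reply.
func (m *mux) handle(reply *msg) bool {
	token := respToken(reply.subject)
	m.mu.Lock()
	ch, ok := m.respMap[token]
	delete(m.respMap, token)
	m.mu.Unlock()
	if !ok {
		return false // unknown or already answered request
	}
	select {
	case ch <- reply:
		return true
	default:
		return false
	}
}

func main() {
	m := newMux("_INBOX.abcd.")
	subj1, ch1 := m.newRequest()
	subj2, ch2 := m.newRequest()

	// Replies may arrive in any order.
	m.handle(&msg{subject: subj2, data: "second"})
	m.handle(&msg{subject: subj1, data: "first"})
	fmt.Println((<-ch1).data, (<-ch2).data)
}
```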
85 . 1
Intro to the NATS Project
The NATS Protocol
NATS Client Deep Dive
IO Engine
Connecting & Graceful Reconnection
Synchronous & Asynchronous Subscriptions
Request / Response APIs
Performance
86 . 1
nats-bench
The current implementation of the client/server shows a performance of ~10M
messages per second (1B payload)
87 . 1
Let's try escape analysis on a simple snippet
10 | func main() {
11 | nc, err := nats.Connect("nats://127.0.0.1:4222")
12 | if err != nil {
13 | os.Exit(1)
14 | }
15 |
16 | nc.Subscribe("help", func(m *nats.Msg) {
17 | println("[Received]", m)
18 | })
19 | nc.Flush()
20 |
21 | nc.Request("help", []byte("hello world"), 1*time.Second)
22 |
23 | nc.Publish("help", []byte("please"))
24 | nc.Flush()
25 |
26 | time.Sleep(1 * time.Second)
27 |}
88 . 1
go run -gcflags "-m" example.go
# ...
./example.go:16: can inline main.func1
./example.go:16: func literal escapes to heap
./example.go:16: func literal escapes to heap
./example.go:21: ([]byte)("hello world") escapes to heap # ❗
./example.go:23: ([]byte)("please") escapes to heap # ❗
./example.go:16: main.func1 m does not escape
[Received] 0xc420100050
[Received] 0xc4200de050
89 . 1
90 . 1
Turns out that all the messages being published escape and end up in the heap,
why??
21 | nc.Request("help", []byte("hello world"), 1*time.Second)
22 |
23 | nc.Publish("help", []byte("please"))
91 . 1
When publishing, we are just doing bufio.Writer writes after all…
func (nc *Conn) publish(subj, reply string, data []byte) error {
// ...
_, err := nc.bw.Write(msgh)
if err == nil {
_, err = nc.bw.Write(data)
}
if err == nil {
_, err = nc.bw.WriteString(_CRLF_)
}
https://github.com/nats-io/go-nats/blob/master/nats.go#L1956-L1965
92 . 1
https://github.com/golang/go/issues/5492
93 . 1
// Write writes the contents of p into the buffer.
// It returns the number of bytes written.
// If nn < len(p), it also returns an error explaining
// why the write is short.
func (b *Writer) Write(p []byte) (nn int, err error) {
for len(p) > b.Available() && b.err == nil {
var n int
if b.Buffered() == 0 {
// Large write, empty buffer.
// Write directly from p to avoid copy.
n, b.err = b.wr.Write(p) // <---- Interface causes alloc!
} else {
n = copy(b.buf[b.n:], p)
b.n += n
b.Flush()
}
nn += n
p = p[n:]
}
https://github.com/golang/go/blob/ae238688d2813e83f16050408487ea34ba1c2fff/src/bufio/bufio.go#L600-L602
94 . 1
We could swap out the bufio.Writer implementation with a custom one, use
concrete types, etc. then prevent the escaping…
+ var bwr *bufioWriter
+ if conn, ok := w.(*net.TCPConn); ok {
+ bwr = &bufioWriter{
+ buf: make([]byte, size),
+ wr: w,
+ conn: conn,
+ }
+ } else if buffer, ok := w.(*bytes.Buffer); ok {
+ bwr = &bufioWriter{
+ buf: make([]byte, size),
+ wr: w,
+ pending: buffer,
+ }
+ } else if sconn, ok := w.(*tls.Conn); ok {
+ bwr = &bufioWriter{
+ buf: make([]byte, size),
+ wr: w,
+ sconn: sconn,
+ }
+ }
95 . 1
Performance matters, but is it worth the complexity?
96 . 1
For converting the payload size to bytes, however, an approach that does not use
strconv.AppendInt was taken, since it is much more performant.
// msgh = strconv.AppendInt(msgh, int64(len(data)), 10)
var b [12]byte
var i = len(b)
if len(data) > 0 {
for l := len(data); l > 0; l /= 10 {
i -= 1
b[i] = digits[l%10]
}
} else {
i -= 1
b[i] = digits[0]
}
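The excerpt above can be completed into a small function and checked against strconv.AppendInt. This is a sketch: appendSize and matchesStrconv are hypothetical names, the digits constant is assumed to be the string "0123456789", and the 12-byte scratch array limits the input to 12 decimal digits, which is plenty for payload sizes.

```go
package main

import (
	"fmt"
	"strconv"
)

const digits = "0123456789"

// appendSize appends the decimal representation of n (n >= 0, at
// most 12 digits) to dst, filling a fixed array from the right.
func appendSize(dst []byte, n int) []byte {
	var b [12]byte
	i := len(b)
	if n > 0 {
		for l := n; l > 0; l /= 10 {
			i--
			b[i] = digits[l%10]
		}
	} else {
		i--
		b[i] = digits[0]
	}
	return append(dst, b[i:]...)
}

// matchesStrconv reports whether appendSize agrees with
// strconv.AppendInt for the given value.
func matchesStrconv(n int) bool {
	want := strconv.AppendInt(nil, int64(n), 10)
	return string(appendSize(nil, n)) == string(want)
}

func main() {
	msgh := []byte("PUB foo ")
	msgh = appendSize(msgh, len("Hello World"))
	fmt.Printf("%q\n", msgh)
	fmt.Println(matchesStrconv(0), matchesStrconv(1048576))
}
```

Whether the hand-written loop is actually faster on a given Go version is best confirmed with a benchmark.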
97 . 1
Wrapping up
98 . 1
✅
Techniques used and trade-offs in the Go client:
Backpressure of I/O and writes coalescing
Fast protocol parsing
Graceful reconnection on server failover
Usage & Non-usage of channels for communicating
Closures, callbacks & channels for events/custom logic
99 . 1
Conclusions
NATS approach to simplicity fits really well with what Go provides and has
helped the project greatly.
Go is a very flexible systems programming language, allowing us to choose our
own trade-offs.
There is great tooling in Go which makes it possible to analyze when to make
them.
100 . 1
Thanks!
github.com/nats-io / @nats_io
Twitter: @wallyqs
101 . 1

Using RabbitMQ and Netty library to implement RPC protocolUsing RabbitMQ and Netty library to implement RPC protocol
Using RabbitMQ and Netty library to implement RPC protocol
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Building Netty Servers
Building Netty ServersBuilding Netty Servers
Building Netty Servers
 

Semelhante a GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementation of the NATS Client

Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm NATS
 
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and SwarmSimple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and SwarmApcera
 
TCP Sockets Tutor maXbox starter26
TCP Sockets Tutor maXbox starter26TCP Sockets Tutor maXbox starter26
TCP Sockets Tutor maXbox starter26Max Kleiner
 
Simple, Secure, Scalable Messaging for the Cloud Native Era - AllThingsOpen 2...
Simple, Secure, Scalable Messaging for the Cloud Native Era - AllThingsOpen 2...Simple, Secure, Scalable Messaging for the Cloud Native Era - AllThingsOpen 2...
Simple, Secure, Scalable Messaging for the Cloud Native Era - AllThingsOpen 2...NATS
 
NATS: Simple, Secure, and Scalable Messaging for the Cloud Native Era
NATS: Simple, Secure, and Scalable Messaging for the Cloud Native EraNATS: Simple, Secure, and Scalable Messaging for the Cloud Native Era
NATS: Simple, Secure, and Scalable Messaging for the Cloud Native EraAll Things Open
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to KafkaDucas Francis
 
Remote Procedure Call
Remote Procedure CallRemote Procedure Call
Remote Procedure CallNadia Nahar
 
Deep Dive into the Pulsar Binary Protocol - Pulsar Virtual Summit Europe 2021
Deep Dive into the Pulsar Binary Protocol - Pulsar Virtual Summit Europe 2021Deep Dive into the Pulsar Binary Protocol - Pulsar Virtual Summit Europe 2021
Deep Dive into the Pulsar Binary Protocol - Pulsar Virtual Summit Europe 2021StreamNative
 
Network-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQNetwork-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQICS
 
Hunting for APT in network logs workshop presentation
Hunting for APT in network logs workshop presentationHunting for APT in network logs workshop presentation
Hunting for APT in network logs workshop presentationOlehLevytskyi1
 
Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"Kasper Nissen
 
Splunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the messageSplunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the messageDamien Dallimore
 

Semelhante a GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementation of the NATS Client (20)

Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
 
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and SwarmSimple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
 
TCP Sockets Tutor maXbox starter26
TCP Sockets Tutor maXbox starter26TCP Sockets Tutor maXbox starter26
TCP Sockets Tutor maXbox starter26
 
Simple, Secure, Scalable Messaging for the Cloud Native Era - AllThingsOpen 2...
Simple, Secure, Scalable Messaging for the Cloud Native Era - AllThingsOpen 2...Simple, Secure, Scalable Messaging for the Cloud Native Era - AllThingsOpen 2...
Simple, Secure, Scalable Messaging for the Cloud Native Era - AllThingsOpen 2...
 
NATS: Simple, Secure, and Scalable Messaging for the Cloud Native Era
NATS: Simple, Secure, and Scalable Messaging for the Cloud Native EraNATS: Simple, Secure, and Scalable Messaging for the Cloud Native Era
NATS: Simple, Secure, and Scalable Messaging for the Cloud Native Era
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 
Remote Procedure Call
Remote Procedure CallRemote Procedure Call
Remote Procedure Call
 
NP-lab-manual (1).pdf
NP-lab-manual (1).pdfNP-lab-manual (1).pdf
NP-lab-manual (1).pdf
 
NP-lab-manual.pdf
NP-lab-manual.pdfNP-lab-manual.pdf
NP-lab-manual.pdf
 
NP-lab-manual.docx
NP-lab-manual.docxNP-lab-manual.docx
NP-lab-manual.docx
 
Deep Dive into the Pulsar Binary Protocol - Pulsar Virtual Summit Europe 2021
Deep Dive into the Pulsar Binary Protocol - Pulsar Virtual Summit Europe 2021Deep Dive into the Pulsar Binary Protocol - Pulsar Virtual Summit Europe 2021
Deep Dive into the Pulsar Binary Protocol - Pulsar Virtual Summit Europe 2021
 
Network-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQNetwork-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQ
 
PyNet
PyNetPyNet
PyNet
 
R bernardino hand_in_assignment_week_1
R bernardino hand_in_assignment_week_1R bernardino hand_in_assignment_week_1
R bernardino hand_in_assignment_week_1
 
Hunting for APT in network logs workshop presentation
Hunting for APT in network logs workshop presentationHunting for APT in network logs workshop presentation
Hunting for APT in network logs workshop presentation
 
Introduction to ns3
Introduction to ns3Introduction to ns3
Introduction to ns3
 
thesis
thesisthesis
thesis
 
Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"
 
KrakenD API Gateway
KrakenD API GatewayKrakenD API Gateway
KrakenD API Gateway
 
Splunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the messageSplunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the message
 

Mais de wallyqs

GoSF: Decoupling Services from IP networks with NATS
GoSF: Decoupling Services from IP networks with NATSGoSF: Decoupling Services from IP networks with NATS
GoSF: Decoupling Services from IP networks with NATSwallyqs
 
OSCON: Building Cloud Native Apps with NATS
OSCON:  Building Cloud Native Apps with NATSOSCON:  Building Cloud Native Apps with NATS
OSCON: Building Cloud Native Apps with NATSwallyqs
 
SF Python Meetup - Introduction to NATS Messaging with Python3
SF Python Meetup - Introduction to NATS Messaging with Python3SF Python Meetup - Introduction to NATS Messaging with Python3
SF Python Meetup - Introduction to NATS Messaging with Python3wallyqs
 
Connect Everything with NATS - Cloud Expo Europe
Connect Everything with NATS - Cloud Expo EuropeConnect Everything with NATS - Cloud Expo Europe
Connect Everything with NATS - Cloud Expo Europewallyqs
 
KubeCon NA 2018 - NATS Deep Dive: The Evolution of NATS
KubeCon NA 2018 - NATS Deep Dive: The Evolution of NATSKubeCon NA 2018 - NATS Deep Dive: The Evolution of NATS
KubeCon NA 2018 - NATS Deep Dive: The Evolution of NATSwallyqs
 
NATS for Rubyists - Tokyo Rubyist Meetup
NATS for Rubyists - Tokyo Rubyist MeetupNATS for Rubyists - Tokyo Rubyist Meetup
NATS for Rubyists - Tokyo Rubyist Meetupwallyqs
 
KubeConEU - NATS Deep Dive
KubeConEU - NATS Deep DiveKubeConEU - NATS Deep Dive
KubeConEU - NATS Deep Divewallyqs
 
RustなNATSのClientを作ってみた
RustなNATSのClientを作ってみたRustなNATSのClientを作ってみた
RustなNATSのClientを作ってみたwallyqs
 
サルでもわかるMesos schedulerの作り方
サルでもわかるMesos schedulerの作り方サルでもわかるMesos schedulerの作り方
サルでもわかるMesos schedulerの作り方wallyqs
 

Mais de wallyqs (9)

GoSF: Decoupling Services from IP networks with NATS
GoSF: Decoupling Services from IP networks with NATSGoSF: Decoupling Services from IP networks with NATS
GoSF: Decoupling Services from IP networks with NATS
 
OSCON: Building Cloud Native Apps with NATS
OSCON:  Building Cloud Native Apps with NATSOSCON:  Building Cloud Native Apps with NATS
OSCON: Building Cloud Native Apps with NATS
 
SF Python Meetup - Introduction to NATS Messaging with Python3
SF Python Meetup - Introduction to NATS Messaging with Python3SF Python Meetup - Introduction to NATS Messaging with Python3
SF Python Meetup - Introduction to NATS Messaging with Python3
 
Connect Everything with NATS - Cloud Expo Europe
Connect Everything with NATS - Cloud Expo EuropeConnect Everything with NATS - Cloud Expo Europe
Connect Everything with NATS - Cloud Expo Europe
 
KubeCon NA 2018 - NATS Deep Dive: The Evolution of NATS
KubeCon NA 2018 - NATS Deep Dive: The Evolution of NATSKubeCon NA 2018 - NATS Deep Dive: The Evolution of NATS
KubeCon NA 2018 - NATS Deep Dive: The Evolution of NATS
 
NATS for Rubyists - Tokyo Rubyist Meetup
NATS for Rubyists - Tokyo Rubyist MeetupNATS for Rubyists - Tokyo Rubyist Meetup
NATS for Rubyists - Tokyo Rubyist Meetup
 
KubeConEU - NATS Deep Dive
KubeConEU - NATS Deep DiveKubeConEU - NATS Deep Dive
KubeConEU - NATS Deep Dive
 
RustなNATSのClientを作ってみた
RustなNATSのClientを作ってみたRustなNATSのClientを作ってみた
RustなNATSのClientを作ってみた
 
サルでもわかるMesos schedulerの作り方
サルでもわかるMesos schedulerの作り方サルでもわかるMesos schedulerの作り方
サルでもわかるMesos schedulerの作り方
 

Último

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Último (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementation of the NATS Client

  • 1. Writing Networking Clients in Go: The Design and Implementation of the NATS Client. Waldemar Quevedo / @wallyqs, GopherCon 2017
  • 2. About me: Waldemar Quevedo / @wallyqs. Software Developer at Apcera; development of the Apcera Platform; NATS clients maintainer; using NATS-based systems since 2012; favorite Go feature: Performance!
  • 3. About this talk: Brief intro to the NATS project and its protocol; deep dive into how the NATS Go client is implemented.
  • 4. We'll cover… Techniques used and trade-offs in the Go client: backpressure of I/O and write coalescing; fast protocol parsing; graceful reconnection on server failover; usage and non-usage of channels for communicating; closures, callbacks and channels for events/custom logic.
  • 5. Agenda: Intro to the NATS Project; The NATS Protocol; NATS Client Deep Dive; IO Engine; Connecting & Graceful Reconnection; Sync & Async Subscriptions; Request/Response; Performance.
  • 7. Brief Intro to the NATS Project: High-performance messaging system; open source, MIT License; created by Derek Collison. Github: https://github.com/nats-io Website: http://nats.io/
  • 8. Used in production by thousands of users for… building microservices control planes; internal communication among components; service discovery; low-latency request/response RPC; fire-and-forget PubSub.
  • 9. Simple & Lightweight Design: TCP/IP based; plain-text protocol with few commands; easy-to-use API; small binary of ~7MB; little config; just fire and forget, no built-in persistence; at-most-once delivery guarantees.
  • 10. On Delivery Guarantees… End-To-End Arguments In System Design (J.H. Saltzer, D.P. Reed and D.D. Clark): "...a lower-level subsystem that supports a distributed application may be wasting its effort providing a function that must by nature be implemented at the application level anyway can be applied to a variety of functions in addition to reliable data transmission." "Perhaps the oldest and most widely known form of the argument concerns acknowledgement of delivery. A data communication network can easily return an acknowledgement to the sender for every message delivered to a recipient."
  • 11. End-To-End Arguments In System Design (J.H. Saltzer, D.P. Reed and D.D. Clark): "Although this acknowledgement may be useful within the network as a form of congestion control (...) it was never found to be very helpful to applications (...). The reason is that knowing for sure that the message was delivered to the target host is not very important. What the application wants to know is whether or not the target host acted on the message..."
  • 13. End-To-End Arguments In System Design (J.H. Saltzer, D.P. Reed and D.D. Clark). More info: https://en.wikipedia.org/wiki/End-to-end_principle and http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.txt Recommended talk ⭐⭐⭐⭐⭐: "Design Philosophy in Networked Systems" by Justine Sherry (PWLConf'16), https://www.youtube.com/watch?v=aR_UOSGEizE
  • 14. NATS Streaming: For at-least-once delivery, check NATS Streaming (also OSS, MIT License). It is a layer on top of NATS core which enhances it with message redelivery features and persistence. https://github.com/nats-io/nats-streaming-server
  • 15. Agenda: Intro to the NATS Project; The NATS Protocol; NATS Client Deep Dive; IO Engine; Connecting & Graceful Reconnection; Sync & Async Subscriptions; Request/Response; Performance.
  • 16. The NATS Protocol. Client -> Server: PUB, SUB, UNSUB, CONNECT. Client <- Server: INFO, MSG, -ERR, +OK. Client <-> Server: PING, PONG.
  • 17. As soon as the client connects…
  • 18. …it receives an INFO string from the server.
  • 19. Clients can customize the connection via CONNECT.
  • 20. We can send PING to the server…
  • 21. …to which we will get a PONG back.
  • 22. The server periodically sends PING to clients too…
  • 23. …to which clients have to PONG back.
  • 24. Otherwise, the server eventually terminates the connection.
  • 25. To announce interest in a subject, we use SUB…
  • 26. …and PUB to send a command to connected clients.
  • 27. Then, a MSG will be delivered to interested clients…
  • 28. Interest in a subject can be limited via UNSUB…
  • 29. …then if many messages are sent, we only get one.
  • 30. At any time, the server can send us an error…
  • 31. …and disconnect us after sending the error.
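The commands walked through above can be sketched as plain byte framing. The wire layouts used here (`PUB <subject> [reply-to] <#bytes>\r\n<payload>\r\n`, `SUB <subject> <sid>\r\n`, `UNSUB <sid> [max-msgs]\r\n`) follow the public NATS protocol documentation rather than the slides, queue groups are left out, and the helper names `pubCmd`, `subCmd` and `unsubCmd` are hypothetical; this is an illustration, not the client's own code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pubCmd frames a PUB command: PUB <subject> [reply-to] <#bytes>\r\n<payload>\r\n
func pubCmd(subject, reply string, payload []byte) []byte {
	var b strings.Builder
	b.WriteString("PUB " + subject + " ")
	if reply != "" {
		b.WriteString(reply + " ")
	}
	b.WriteString(strconv.Itoa(len(payload)) + "\r\n")
	b.Write(payload)
	b.WriteString("\r\n")
	return []byte(b.String())
}

// subCmd frames a SUB command without a queue group: SUB <subject> <sid>\r\n
func subCmd(subject string, sid int) []byte {
	return []byte(fmt.Sprintf("SUB %s %d\r\n", subject, sid))
}

// unsubCmd frames UNSUB <sid> [max-msgs]\r\n. A max of 0 omits the limit,
// which removes interest immediately.
func unsubCmd(sid, max int) []byte {
	if max > 0 {
		return []byte(fmt.Sprintf("UNSUB %d %d\r\n", sid, max))
	}
	return []byte(fmt.Sprintf("UNSUB %d\r\n", sid))
}

func main() {
	fmt.Printf("%q\n", subCmd("foo", 1))
	fmt.Printf("%q\n", unsubCmd(1, 1)) // only one more message will be delivered
	fmt.Printf("%q\n", pubCmd("foo", "", []byte("Hello World")))
}
```

Because the protocol is plain text, these byte slices are exactly what could be typed into a telnet session against a NATS server.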
  • 33. Agenda: Intro to the NATS Project; The NATS Protocol; NATS Client Deep Dive; IO Engine; Optimized Flushing; Fast Protocol Parsing Engine; Connecting & Graceful Reconnection; Sync & Async Subscriptions; Request / Response APIs; Performance.
  • 34. NATS Client Engine: The client is built around a couple of goroutines which cooperate to read and write messages asynchronously, as fast as possible.
  • 35. NATS Client Engine

      // spinUpGoRoutines will launch the goroutines responsible for
      // reading and writing to the socket. This will be launched via a
      // go routine itself to release any locks that may be held.
      // We also use a WaitGroup to make sure we only start them on a
      // reconnect when the previous ones have exited.
      func (nc *Conn) spinUpGoRoutines() {
          // Make sure everything has exited.
          nc.waitForExits(nc.wg)

          // Create a new waitGroup instance for this run.
          nc.wg = &sync.WaitGroup{}
          // We will wait on both.
          nc.wg.Add(2)

          // Spin up the readLoop and the socket flusher.
          go nc.readLoop(nc.wg)
          go nc.flusher(nc.wg)
          // ...
      }
  • 37. Flushing mechanism: Implemented via a buffered channel which is used to signal a flush of any pending data in the bufio.Writer.

      nc.fch = make(chan struct{}, 1024)

      // flusher is a separate goroutine that will process flush requests for the write
      // bufio. This allows coalescing of writes to the underlying socket.
      func (nc *Conn) flusher(wg *sync.WaitGroup) {
          defer wg.Done()
          // ...
          for {
              if _, ok := <-fch; !ok {
                  return
              }
              // ...
              if bw.Buffered() > 0 {
                  bw.Flush()
              }
              // ...
          }
      }
  • 38. Flushing mechanism: Each time a command is sent to the server, we schedule a flush of any pending data, unless a flush is already pending.

      func (nc *Conn) publish(subj, reply string, data []byte) error {
          // ...
          _, err := nc.bw.Write(msgh)
          if err != nil {
              return err
          }

          // Schedule pending flush in case there are none.
          if len(nc.fch) == 0 {
              select {
              case nc.fch <- struct{}{}:
              default:
              }
          }
          return nil
      }
  • 39. Flushing mechanism: This adds backpressure to the client and coalesces writes, so less time is spent writing to the socket.
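The flushing mechanism described on these slides can be reproduced in a small, self-contained sketch. Everything here is an assumption made for illustration: `coalescer`, `countingWriter`, `send` and `close` are hypothetical names, the signal channel has capacity 1 instead of the 1024 shown on the slide, and an in-memory writer stands in for the socket so the number of underlying writes can be observed.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"sync"
)

// countingWriter stands in for the socket and counts Write calls.
type countingWriter struct {
	buf    bytes.Buffer
	writes int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.writes++
	return w.buf.Write(p)
}

// coalescer is a hypothetical, stripped-down write path: callers append to a
// bufio.Writer and a single flusher goroutine pushes the data out.
type coalescer struct {
	mu  sync.Mutex
	bw  *bufio.Writer
	fch chan struct{}
	wg  sync.WaitGroup
}

func newCoalescer(w io.Writer) *coalescer {
	c := &coalescer{bw: bufio.NewWriterSize(w, 32768), fch: make(chan struct{}, 1)}
	c.wg.Add(1)
	go c.flusher()
	return c
}

// flusher waits for flush signals and flushes whatever has accumulated.
func (c *coalescer) flusher() {
	defer c.wg.Done()
	for range c.fch {
		c.mu.Lock()
		if c.bw.Buffered() > 0 {
			c.bw.Flush()
		}
		c.mu.Unlock()
	}
}

// send buffers p and schedules a flush unless one is already pending.
func (c *coalescer) send(p []byte) error {
	c.mu.Lock()
	_, err := c.bw.Write(p)
	c.mu.Unlock()
	if err != nil {
		return err
	}
	select {
	case c.fch <- struct{}{}:
	default: // a flush is already scheduled; this write rides along with it
	}
	return nil
}

// close stops the flusher and flushes any remaining bytes.
// send must not be called after close.
func (c *coalescer) close() error {
	close(c.fch)
	c.wg.Wait()
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.bw.Flush()
}

func main() {
	cw := &countingWriter{}
	c := newCoalescer(cw)
	for i := 0; i < 1000; i++ {
		c.send([]byte("PING\r\n"))
	}
	c.close()
	fmt.Printf("1000 sends, %d bytes, %d socket writes\n", cw.buf.Len(), cw.writes)
}
```

The exact number of socket writes depends on goroutine scheduling, but it is typically far below the number of `send` calls, which is the coalescing effect the slide refers to.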
  • 40. Reader loop: The client uses a zero-allocation parser driven by a read loop that reads into a stack-based buffer.

      // readLoop() will sit on the socket reading and processing the
      // protocol from the server. It will dispatch appropriately based
      // on the op type.
      func (nc *Conn) readLoop(wg *sync.WaitGroup) {
          // ...
          // Stack based buffer
          b := make([]byte, 32768)

          for {
              // ...
              n, err := conn.Read(b)
              if err != nil {
                  nc.processOpErr(err)
                  break
              }
              if err := nc.parse(b[:n]); err != nil {
                  nc.processOpErr(err)
                  break
              }
              // ...
          }
          // ...
      }
  • 41. Fast Protocol Parsing Engine

      func (nc *Conn) parse(buf []byte) error {
          var i int
          var b byte

          for i = 0; i < len(buf); i++ {
              b = buf[i]
              switch nc.ps.state {
              case OP_START:
                  switch b {
                  case 'M', 'm':
                      nc.ps.state = OP_M
                  case 'P', 'p':
                      nc.ps.state = OP_P
                  case 'I', 'i':
                      nc.ps.state = OP_I
                  // ...
              case MSG_PAYLOAD:
                  if nc.ps.msgBuf != nil {
                      if len(nc.ps.msgBuf) >= nc.ps.ma.size {
                          nc.processMsg(nc.ps.msgBuf)
                          nc.ps.argBuf, nc.ps.msgBuf, nc.ps.state = nil, n
                  // ...
  • 42. Fast Protocol Parsing Engine

      const (
          OP_START = iota
          OP_PLUS          // +
          OP_PLUS_O        // +O
          OP_PLUS_OK       // +OK
          OP_MINUS         // -
          OP_MINUS_E       // -E
          OP_MINUS_ER      // -ER
          OP_MINUS_ERR     // -ERR
          OP_MINUS_ERR_SPC // -ERR
          MINUS_ERR_ARG    // -ERR '
          OP_M             // M
          OP_MS            // MS
          OP_MSG           // MSG
          OP_MSG_SPC       // MSG
          MSG_ARG          // MSG
          MSG_PAYLOAD      // MSG foo bar 1 \r\n
          MSG_END          // MSG foo bar 1 \r\n
          OP_P             // P
          OP_PI            // PI
          OP_PIN           // PIN
          OP_PING          // PING
          OP_PO            // PO
          OP_PON           // PON
          OP_PONG          // PONG
          OP_I             // I
          OP_IN            // IN
          OP_INF           // INF
          OP_INFO          // INFO
          OP_INFO_SPC      // INFO
          INFO_ARG         // INFO
      )
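To show why the parser keeps one state per byte of each command, here is a reduced sketch that only recognizes PING and PONG. It is not the client's parser: the `parser` type, its counters and the lower-case state names are hypothetical. The point it illustrates is that the state survives between calls, so a command split across two socket reads is still recognized without copying or allocating.

```go
package main

import (
	"errors"
	"fmt"
)

type state int

// One state per byte consumed, mirroring the OP_P ... OP_PONG constants above.
const (
	opStart state = iota
	opP
	opPI
	opPIN
	opPING
	opPO
	opPON
	opPONG
)

var errParse = errors.New("parse error")

// parser keeps its state between calls to parse.
type parser struct {
	state        state
	pings, pongs int
}

// expect moves to next when b equals want in upper or lower case.
func (p *parser) expect(b, want byte, next state) error {
	if b != want && b != want+32 {
		p.state = opStart
		return errParse
	}
	p.state = next
	return nil
}

// parse consumes buf byte by byte without allocating.
func (p *parser) parse(buf []byte) error {
	var err error
	for _, b := range buf {
		switch p.state {
		case opStart:
			err = p.expect(b, 'P', opP)
		case opP:
			switch b {
			case 'I', 'i':
				p.state = opPI
			case 'O', 'o':
				p.state = opPO
			default:
				p.state = opStart
				err = errParse
			}
		case opPI:
			err = p.expect(b, 'N', opPIN)
		case opPIN:
			err = p.expect(b, 'G', opPING)
		case opPO:
			err = p.expect(b, 'N', opPON)
		case opPON:
			err = p.expect(b, 'G', opPONG)
		case opPING, opPONG:
			// Skip '\r' and finish the command on '\n'.
			if b == '\n' {
				if p.state == opPING {
					p.pings++
				} else {
					p.pongs++
				}
				p.state = opStart
			}
		}
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	p := &parser{}
	// The first PING is split across two reads.
	p.parse([]byte("PI"))
	p.parse([]byte("NG\r\nPONG\r\n"))
	fmt.Println(p.pings, p.pongs)
}
```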
  • 43. Commands are sent asynchronously! Publishing and subscribing do not block; instead we rely on the client's internal engine to eventually flush and send to the server.

      nc1, _ := nats.Connect("nats://demo.nats.io:4222")
      nc2, _ := nats.Connect("nats://demo.nats.io:4222")

      nc1.Subscribe("foo", func(m *nats.Msg){
          log.Printf("[Received] %+v", m)
      })

      nc2.Publish("foo", []byte("Hello World"))

    In the example above, there is no guarantee that the foo subscription will receive the message (the SUB may not have been flushed yet).
  • 44. FAQ: If commands are sent asynchronously, how can we ensure that a command we sent has been processed by the server?
  • 45. Ensure ordering via PING / PONG: Ordering is guaranteed per connection, so sending a PING immediately after SUB guarantees that we receive the PONG before any matching MSG on the subscription.
  • 46. NATS Flush: The client provides a Flush API which both flushes the bufio.Writer and does the server roundtrip for us.

      nc1, _ := nats.Connect("nats://demo.nats.io:4222")
      nc2, _ := nats.Connect("nats://demo.nats.io:4222")

      nc1.Subscribe("foo", func(m *nats.Msg){
          log.Printf("[Received] %+v", m)
      })

      // Sends PING, flushes buffer, and blocks until receiving PONG reply.
      nc1.Flush()

      // Would be received by subscription
      nc2.Publish("foo", []byte("Hello World"))
  • 47. NATS Flush We create an unbuffered channel for each pending PONG reply and append it to a slice of channels that represents the pending replies. nc.pongs = make([]chan struct{}, 0, 8) func (nc *Conn) FlushTimeout(timeout time.Duration) (err error) { // ... ch := make(chan struct{}) nc.pongs = append(nc.pongs, ch) nc.bw.WriteString("PING\r\n") nc.bw.Flush() // ... select { case <-ch: close(ch) case <-time.After(timeout): return ErrTimeout }
  • 48. NATS Flush Then when we receive the PONG back, we signal the client that we have received it so that it stops blocking. // processPong is used to process responses to the client's ping // messages. We use pings for the flush mechanism as well. func (nc *Conn) processPong() { var ch chan struct{} nc.mu.Lock() if len(nc.pongs) > 0 { ch = nc.pongs[0] nc.pongs = nc.pongs[1:] } nc.pout = 0 nc.mu.Unlock() if ch != nil { // Signal that PONG was received. ch <- struct{}{} } } 48 . 1
  • 49. Intro to the NATS Project The NATS Protocol NATS Client Deep Dive IO Engine Connecting & Graceful Reconnection Synchronous & Asynchronous Subscriptions Request/Response Performance 49 . 1
  • 50. Connecting Options For connecting to NATS, we use the functional options pattern to be able to extend the customizations. // Connect will attempt to connect to the NATS system. // The url can contain username/password semantics. e.g. nats://derek:pass@localhost:422 // Comma separated arrays are also supported, e.g. urlA, urlB. // Options start with the defaults but can be overridden. func Connect(url string, options ...Option) (*Conn, error) { opts := DefaultOptions opts.Servers = processUrlString(url) for _, opt := range options { if err := opt(&opts); err != nil { return nil, err } } return opts.Connect() } https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis 50 . 1
  • 51. Connecting Options For example, to customize the set of credentials sent on CONNECT… CONNECT {"user":"secret", "pass":"password"}\r\n // UserInfo is an Option to set the username and password to // use when not included directly in the URLs. func UserInfo(user, password string) Option { return func(o *Options) error { o.User = user o.Password = password return nil } } // Usage: // nats.Connect("nats://127.0.0.1:4222", nats.UserInfo("secret", "password"))
  • 52. Connecting Options Or to customize TLS: nc, err = nats.Connect(secureURL, nats.RootCAs("./ca.pem"), nats.ClientCert("./client-cert.pem", "./client-key.pem")) if err != nil { t.Fatalf("Failed to create (TLS) connection: %v", err) } defer nc.Close() 52 . 1
  • 55. Reconnecting Options By default, the client has reconnecting enabled but can be customized too. This means that if there is ever an issue with the server that we were connected to, then it will try to connect to another one in the pool. // ReconnectWait is an Option to set the wait time between reconnect attempts. func ReconnectWait(t time.Duration) Option { return func(o *Options) error { o.ReconnectWait = t return nil } } // MaxReconnects is an Option to set the maximum number of reconnect attempts. func MaxReconnects(max int) Option { return func(o *Options) error { o.MaxReconnect = max return nil } } 55 . 1
  • 56. Reconnection logic While the client is in the RECONNECTING state, the underlying bufio.Writer is re-created on top of a bytes.Buffer instead of the connection, so that writes made in the meantime are kept in memory.
  • 57. Reconnection logic if nc.Opts.AllowReconnect && nc.status == CONNECTED { // Set our new status nc.status = RECONNECTING if nc.ptmr != nil { nc.ptmr.Stop() } if nc.conn != nil { nc.bw.Flush() nc.conn.Close() nc.conn = nil } // Create a new pending buffer to underpin the bufio Writer while // we are reconnecting. nc.pending = &bytes.Buffer{} nc.bw = bufio.NewWriterSize(nc.pending, nc.Opts.ReconnectBufSize) go nc.doReconnect() nc.mu.Unlock() return } 57 . 1
  • 58. Reconnection logic Then, on reconnect, the pending buffer is flushed once, right after the subscriptions have been queued to be sent again. CONNECT...\r\nPING\r\nSUB foo 1\r\nSUB bar 2\r\nPUB hello 5\r\nworld\r\n func (nc *Conn) doReconnect() { // note: a lot was elided here for len(nc.srvPool) > 0 { server, _ := nc.selectNextServer() nc.createConn() nc.resendSubscriptions() if nc.pending != nil { if nc.pending.Len() > 0 { nc.bw.Write(nc.pending.Bytes()) } } // Flush the buffer and retry next server on failure nc.err = nc.bw.Flush() if nc.err != nil { nc.status = RECONNECTING continue }
  • 59. Disconnection and Reconnection Callbacks We can set event callbacks via connect options, which can be helpful for debugging or adding custom logic. nc, err = nats.Connect("nats://demo.nats.io:4222", nats.MaxReconnects(5), nats.ReconnectWait(2 * time.Second), nats.DisconnectHandler(func(nc *nats.Conn) { log.Printf("Got disconnected!\n") }), nats.ReconnectHandler(func(nc *nats.Conn) { log.Printf("Got reconnected to %v!\n", nc.ConnectedUrl()) }), nats.ClosedHandler(func(nc *nats.Conn) { log.Printf("Connection closed. Reason: %q\n", nc.LastError()) }), nats.ErrorHandler(func(nc *nats.Conn, sub *nats.Subscription, err error) { log.Printf("Error: %s\n", err) }), )
  • 60. Disconnection and Reconnection Callbacks 60 . 1
  • 61. Async Error & Event Callbacks In order to ensure ordering of event callbacks, a channel is used… // Create the async callback channel. nc.ach = make(chan asyncCB, 32) // Slow consumer error in case our buffered channel blocked in a Subscription. if nc.Opts.AsyncErrorCB != nil { nc.ach <- func() { nc.Opts.AsyncErrorCB(nc, sub, ErrSlowConsumer) } } // On disconnection events if nc.Opts.DisconnectedCB != nil && nc.conn != nil { nc.ach <- func() { nc.Opts.DisconnectedCB(nc) } } if nc.Opts.ClosedCB != nil { nc.ach <- func() { nc.Opts.ClosedCB(nc) } } // On reconnect event if nc.Opts.ReconnectedCB != nil { nc.ach <- func() { nc.Opts.ReconnectedCB(nc) } } 61 . 1
  • 62. Async Error & Event Callbacks …then an asyncDispatch goroutine is responsible for dispatching them in sequence until the channel is closed. // Connect will attempt to connect to a NATS server with multiple options. func (o Options) Connect() (*Conn, error) { nc := &Conn{Opts: o} // ... // Spin up the async cb dispatcher on success go nc.asyncDispatch() // asyncDispatch is responsible for calling any async callbacks func (nc *Conn) asyncDispatch() { // ... // Loop on the channel and process async callbacks. for { if f, ok := <-ach; !ok { return } else { f() } } }
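A standalone sketch of the same idea: closures are queued on a buffered channel and a single goroutine runs them, so callbacks fire in the order the events were queued. The names `asyncCB` and `asyncDispatch` mirror the slides, but the `done` channel is an addition made here so the example can wait for the dispatcher to finish.

```go
package main

import "fmt"

// asyncCB is a queued callback invocation.
type asyncCB func()

// asyncDispatch runs callbacks one at a time, in the order they were queued,
// until the channel is closed.
func asyncDispatch(ach <-chan asyncCB, done chan<- struct{}) {
	defer close(done)
	for f := range ach {
		f()
	}
}

func main() {
	ach := make(chan asyncCB, 32)
	done := make(chan struct{})
	go asyncDispatch(ach, done)

	for _, event := range []string{"disconnected", "reconnected", "closed"} {
		event := event // capture the loop variable for the closure
		ach <- func() { fmt.Println("event:", event) }
	}

	close(ach) // no more events
	<-done     // wait until every queued callback has run
}
```

A single consumer is what gives the ordering guarantee; running each callback in its own goroutine would lose it.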
  • 63. Intro to the NATS Project The NATS Protocol NATS Client Deep Dive IO Engine Connecting & Graceful Reconnection Sync & Async Subscriptions Channels and Callback based APIs Request / Response APIs Performance 63 . 1
  • 64. Subscription Types There are multiple APIs for processing the messages received from the server. // Async Subscriber nc.Subscribe("foo", func(m *nats.Msg) { log.Printf("Received a message: %s\n", string(m.Data)) }) // Sync Subscriber which blocks sub, err := nc.SubscribeSync("foo") m, err := sub.NextMsg(timeout) // Channel Subscriber ch := make(chan *nats.Msg, 64) sub, err := nc.ChanSubscribe("foo", ch) msg := <- ch
  • 65. Subscription Types Each type of subscription has a QueueSubscribe variation too. // Async Queue Subscriber nc.QueueSubscribe("foo", "group", func(m *nats.Msg) { log.Printf("Received a message: %s\n", string(m.Data)) }) // Sync Queue Subscriber which blocks sub, err := nc.QueueSubscribeSync("foo", "group") m, err := sub.NextMsg(timeout) // Channel Queue Subscriber ch := make(chan *nats.Msg, 64) sub, err := nc.ChanQueueSubscribe("foo", "group", ch) msg := <- ch
  • 66. Async Subscribers Subscribe takes a subject and a callback, then dispatches each of the messages to the callback, processing them sequentially. nc.Subscribe("foo", func(m *nats.Msg) { log.Printf("Received a message: %s\n", string(m.Data)) })
  • 67. Async Subscribers We need to keep in mind that when multiple messages are received on the same subscription, there is potential for head-of-line blocking. nc.Subscribe("foo", func(m *nats.Msg) { log.Printf("Received a message: %s\n", string(m.Data)) // Next message would not be processed until // this sleep is done. time.Sleep(5 * time.Second) })
  • 68. Async Subscribers If that is an issue, the user can opt into parallelizing the processing by spinning up goroutines when receiving a message. Determining whether to use a goroutine and implementing backpressure is up to the user. nc.Subscribe("foo", func(m *nats.Msg) { log.Printf("Received a message: %s\n", string(m.Data)) go func(){ // Next message would not block because // of this sleep. time.Sleep(5 * time.Second) }() })
  • 69. Async Subscribers Each Async Subscription has its own goroutine for processing so other subscriptions can still be processing messages (they do not block each other). 69 . 1
  • 70. Async Subscribers Async subscribers are implemented via a linked list of messages, kept in the order they were received, in combination with condition variables (no channels). One main benefit of this approach is that the list grows only as needed when we are not sure how many messages will be received. This way, when there are many subscriptions, we do not have to allocate a large channel for each one just to avoid dropping messages. type Subscription struct { // ... pHead *Msg pTail *Msg pCond *sync.Cond // ...
  • 71. Async Subscribers When subscribing, a goroutine is spun up for the subscription, waiting for the parser to feed us new messages. func (nc *Conn) subscribe(...) { // ... // If we have an async callback, start up a sub specific // goroutine to deliver the messages. if cb != nil { sub.typ = AsyncSubscription sub.pCond = sync.NewCond(&sub.mu) go nc.waitForMsgs(sub)
  • 72. Async Subscribers Then we wait on the condition variable… func (nc *Conn) waitForMsgs(s *Subscription) { var closed bool var delivered, max uint64 for { s.mu.Lock() if s.pHead == nil && !s.closed { // Parser will signal us when a message for // this subscription is received. s.pCond.Wait() } // ... mcb := s.mcb // ... // Deliver the message. if m != nil && (max == 0 || delivered <= max) { mcb(m) }
  • 73. Async Subscribers …until signaled by the parser when getting a message. func (nc *Conn) processMsg(data []byte) { // ... sub := nc.subs[nc.ps.ma.sid] // ... if sub.pHead == nil { sub.pHead = m sub.pTail = m sub.pCond.Signal() } else { sub.pTail.next = m sub.pTail = m }
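Putting the three slides together, the linked list plus condition variable can be sketched as a small unbounded queue with one producer side (the parser) and one consumer goroutine (the subscription). The types and method names below are hypothetical, messages are strings, and `Wait` is placed in a `for` loop as the `sync.Cond` documentation recommends.

```go
package main

import (
	"fmt"
	"sync"
)

// msg is a node in the singly linked list of pending messages.
type msg struct {
	data string
	next *msg
}

// subscription holds the pending list, guarded by mu and signaled via cond.
type subscription struct {
	mu         sync.Mutex
	cond       *sync.Cond
	head, tail *msg
	closed     bool
}

func newSubscription() *subscription {
	s := &subscription{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// push is what the parser side would call for each message received.
func (s *subscription) push(data string) {
	m := &msg{data: data}
	s.mu.Lock()
	if s.head == nil {
		s.head, s.tail = m, m
		s.cond.Signal() // the consumer only sleeps when the list is empty
	} else {
		s.tail.next = m
		s.tail = m
	}
	s.mu.Unlock()
}

// close wakes the consumer so it can drain the list and exit.
func (s *subscription) close() {
	s.mu.Lock()
	s.closed = true
	s.cond.Signal()
	s.mu.Unlock()
}

// waitForMsgs delivers messages sequentially until closed and drained.
func (s *subscription) waitForMsgs(cb func(string)) {
	for {
		s.mu.Lock()
		for s.head == nil && !s.closed {
			s.cond.Wait()
		}
		m := s.head
		if m != nil {
			s.head = m.next
			if s.head == nil {
				s.tail = nil
			}
		}
		s.mu.Unlock()
		if m == nil {
			return // closed and nothing left to deliver
		}
		cb(m.data) // run the callback outside the lock
	}
}

func main() {
	s := newSubscription()
	done := make(chan struct{})
	go func() {
		s.waitForMsgs(func(d string) { fmt.Println("received:", d) })
		close(done)
	}()
	for _, d := range []string{"a", "b", "c"} {
		s.push(d)
	}
	s.close()
	<-done
}
```

Memory is only used for messages that are actually pending, which is the trade-off against a pre-sized channel described above; the cost is that nothing bounds the list unless the caller adds a limit.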
  • 74. Sync & Channel Subscribers Internally, they are fairly similar since both are channel based. // Sync Subscriber which blocks until receiving next message. sub, err := nc.SubscribeSync("foo") m, err := sub.NextMsg(timeout) // Channel Subscriber ch := make(chan *nats.Msg, 64) sub, err := nc.ChanSubscribe("foo", ch) msg := <- ch 74 . 1
  • 75. Sync & Channel Subscribers Once the parser yields a message, it is sent to the channel. The client needs to be ready to receive; if the send would block, the message is dropped and a slow consumer error is reported via the async error callback. func (nc *Conn) processMsg(data []byte) { // ... sub := nc.subs[nc.ps.ma.sid] // ... // We have two modes of delivery. One is the channel, used by channel // subscribers and syncSubscribers, the other is a linked list for async. if sub.mch != nil { select { case sub.mch <- m: default: goto slowConsumer }
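The `select` with a `default` branch is what keeps the parser from ever blocking on a slow subscriber. A minimal standalone sketch of that non-blocking send follows; `deliver` is a hypothetical helper and messages are strings.

```go
package main

import "fmt"

// deliver attempts a non-blocking send and reports whether the message was
// accepted. A false result corresponds to the slow consumer case.
func deliver(ch chan<- string, m string) bool {
	select {
	case ch <- m:
		return true
	default:
		return false // channel full: drop instead of blocking the reader
	}
}

func main() {
	ch := make(chan string, 2) // the subscriber's buffered channel
	dropped := 0
	for _, m := range []string{"one", "two", "three"} {
		if !deliver(ch, m) {
			dropped++
		}
	}
	fmt.Println("queued:", len(ch), "dropped:", dropped)
}
```

The buffer size chosen by the subscriber therefore determines how large a burst can be absorbed before messages are dropped.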
  • 76. Intro to the NATS Project The NATS Protocol NATS Client Deep Dive IO Engine Connecting & Graceful Reconnection Sync & Async Subscriptions Request/Response Unique inboxes generation Context support for cancellation Performance 76 . 1
  • 77. Request / Response Building on top of PubSub mechanisms available, the client has support for 1:1 style of communication. func main() { nc, err := nats.Connect("nats://127.0.0.1:4222") if err != nil { log.Fatalf("Error: %s", err) } nc.Subscribe("help", func(m *nats.Msg) { log.Printf("[Received] %+v", m) nc.Publish(m.Reply, []byte("I can help")) }) response, err := nc.Request("help", []byte("please"), 2*time.Second) if err != nil { log.Fatalf("Error: %s", err) } log.Printf("[Response] %+v", response) } 77 . 1
  • 78. Request / Response Protocol-wise, this is achieved by using ephemeral subscriptions. Requestor: SUB _INBOX.95jM7MgZoxZt94fhLpewjg 2, UNSUB 2 1, PUB help _INBOX.95jM7MgZoxZt94fhLpewjg 6 please, MSG _INBOX.95jM7MgZoxZt94fhLpewjg 2 10 I can help. Responder: SUB help 1, MSG help 1 _INBOX.95jM7MgZoxZt94fhLpewjg 6 please, PUB _INBOX.95jM7MgZoxZt94fhLpewjg 10 I can help.
  • 79. Unique Inboxes Generation The inbox identifiers are generated via NUID, which can generate identifiers at a rate of ~16 M per second. https://github.com/nats-io/nuid
  • 81. Request / Response func (nc *Conn) Request(subj string, data []byte, timeout time.Duration) ( *Msg, error, ) { inbox := NewInbox() ch := make(chan *Msg, RequestChanLen) s, err := nc.subscribe(inbox, _EMPTY_, nil, ch) if err != nil { return nil, err } s.AutoUnsubscribe(1) // UNSUB after receiving 1 defer s.Unsubscribe() // PUB help _INBOX.nwOOaWSe... err = nc.PublishRequest(subj, inbox, data) if err != nil { return nil, err } // Block until MSG is received or timeout. return s.NextMsg(timeout) } 80 . 1
  • 82. Request / Response (Context variation) func (nc *Conn) RequestWithContext(ctx context.Context, subj string, data []byte) ( *Msg, error, ) { inbox := NewInbox() ch := make(chan *Msg, RequestChanLen) s, err := nc.subscribe(inbox, _EMPTY_, nil, ch) if err != nil { return nil, err } s.AutoUnsubscribe(1) // UNSUB after receiving 1 defer s.Unsubscribe() // PUB help _INBOX.nwOOaWSe... err = nc.PublishRequest(subj, inbox, data) if err != nil { return nil, err } // Block until MSG is received or context canceled. return s.NextMsgWithContext(ctx) } 81 . 1
  • 83. Request / Response (+v1.3.0) In the next release of the client, a less chatty version of the request/response mechanism is available. Requestor: SUB _INBOX.nwOOaWSeWrt0ok20pKFfNz.* 2, PUB help _INBOX.nwOOaWSeWrt0ok20pKFfNz.nwOOaWSeWrt0ok20pKFfPo 6 please, MSG _INBOX.nwOOaWSeWrt0ok20pKFfNz.nwOOaWSeWrt0ok20pKFfPo 2 10 I can help. Responder: SUB help 1, MSG help 1 _INBOX.nwOOaWSeWrt0ok20pKFfNz.nwOOaWSeWrt0ok20pKFfPo 6 please, PUB _INBOX.nwOOaWSeWrt0ok20pKFfNz.nwOOaWSeWrt0ok20pKFfPo 10 I can help.
  • 84. Request / Response (new version) func (nc *Conn) Request(subj string, data []byte, timeout time.Duration) (*Msg, error) { // ... (note: not actual implementation) nc.mu.Lock() if nc.respMap == nil { // Wildcard subject for all requests, e.g. INBOX.abcd.* nc.respSub = fmt.Sprintf("%s.*", NewInbox()) nc.respMap = make(map[string]chan *Msg) } mch := make(chan *Msg, RequestChanLen) respInbox := nc.newRespInbox() // Await for reply on subject: _INBOX.abcd.xyz token := respToken(respInbox) // Last token identifying this request: xyz nc.respMap[token] = mch // ... nc.mu.Unlock() // Broadcast request, then wait for reply or timeout. if err := nc.PublishRequest(subj, respInbox, data); err != nil { return nil, err } // (Wait for message or timer timeout) select { case msg, ok := <-mch: // Message received or Connection Closed case <-time.After(timeout): // Discard request and yield error return nil, ErrTimeout } 83 . 1
  • 85. Request / Response (new version+context) func (nc *Conn) RequestWithContext(ctx context.Context, subj string, data []byte) (*Msg, error) { // ... (note: not actual implementation) nc.mu.Lock() if nc.respMap == nil { // Wildcard subject for all requests, e.g. INBOX.abcd.* nc.respSub = fmt.Sprintf("%s.*", NewInbox()) nc.respMap = make(map[string]chan *Msg) } mch := make(chan *Msg, RequestChanLen) respInbox := nc.newRespInbox() // Await for reply on subject: _INBOX.abcd.xyz token := respToken(respInbox) // Last token identifying this request: xyz nc.respMap[token] = mch // ... nc.mu.Unlock() // Broadcast request, then wait for reply or context cancellation. if err := nc.PublishRequest(subj, respInbox, data); err != nil { return nil, err } // (Wait for message or context cancellation) select { case msg, ok := <-mch: // Message received or Connection Closed case <-ctx.Done(): // Discard request and yield the context error return nil, ctx.Err() }
  • 86. Request / Response (new version) // In Request(...) // // Make sure scoped subscription is setup only once via sync.Once var err error nc.respSetup.Do(func() { var s *Subscription // Create global subscription: INBOX.abcd.* s, err = nc.Subscribe(ginbox, func(m *Msg){ // Get token: xyz ← INBOX.abcd.xyz rt := respToken(m.Subject) nc.mu.Lock() // Signal channel that we received the message. mch := nc.respMap[rt] delete(nc.respMap, rt) nc.mu.Unlock() select { case mch <- m: default: return } }) nc.mu.Lock() nc.respMux = s nc.mu.Unlock() }) 85 . 1
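The multiplexing step (one wildcard subscription, many in-flight requests keyed by the last subject token) can be sketched without a server. Everything below is hypothetical scaffolding: `mux`, `newRequest` and `handle` are invented names, payloads are strings, and a simple counter stands in for the NUID-generated token.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// mux routes replies arriving on "<prefix>.<token>" to the waiting request.
type mux struct {
	mu      sync.Mutex
	prefix  string // e.g. "_INBOX.abcd"; the wildcard subscription is prefix + ".*"
	respMap map[string]chan string
	next    int
}

// newRequest registers a reply channel and returns the reply subject to
// publish with the request.
func (m *mux) newRequest() (inbox string, ch chan string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.next++
	token := strconv.Itoa(m.next) // stand-in for a NUID token
	ch = make(chan string, 1)
	m.respMap[token] = ch
	return m.prefix + "." + token, ch
}

// respToken extracts the last token of a subject: "xyz" from "_INBOX.abcd.xyz".
func respToken(subject string) string {
	if i := strings.LastIndexByte(subject, '.'); i >= 0 {
		return subject[i+1:]
	}
	return ""
}

// handle is the single callback of the wildcard subscription.
func (m *mux) handle(subject, data string) {
	token := respToken(subject)
	m.mu.Lock()
	ch := m.respMap[token]
	delete(m.respMap, token)
	m.mu.Unlock()
	if ch == nil {
		return // unknown or already answered request
	}
	select {
	case ch <- data:
	default:
	}
}

func main() {
	m := &mux{prefix: "_INBOX.abcd", respMap: make(map[string]chan string)}
	inboxA, chA := m.newRequest()
	inboxB, chB := m.newRequest()

	// Replies may arrive in any order; each one finds its own request.
	m.handle(inboxB, "reply for B")
	m.handle(inboxA, "reply for A")

	fmt.Println(<-chA)
	fmt.Println(<-chB)
}
```

Compared with the earlier version, no SUB/UNSUB pair is sent per request; the only per-request state is a map entry on the client.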
  • 87. Intro to the NATS Project The NATS Protocol NATS Client Deep Dive IO Engine Connecting & Graceful Reconnection Synchronous & Asynchronous Subscriptions Request / Response APIs Performance 86 . 1
  • 88. nats-bench The current implementation of the client/server shows a performance of ~10M messages per second (1B payload).
  • 89. Let's try escape analysis on a simple snippet 10 | func main() { 11 | nc, err := nats.Connect("nats://127.0.0.1:4222") 12 | if err != nil { 13 | os.Exit(1) 14 | } 15 | 16 | nc.Subscribe("help", func(m *nats.Msg) { 17 | println("[Received]", m) 18 | }) 19 | nc.Flush() 20 | 21 | nc.Request("help", []byte("hello world"), 1*time.Second) 22 | 23 | nc.Publish("help", []byte("please")) 24 | nc.Flush() 25 | 26 | time.Sleep(1 * time.Second) 27 |} 88 . 1
  • 90. go run -gcflags "-m" example.go # ... ./example.go:16: can inline main.func1 ./example.go:16: func literal escapes to heap ./example.go:16: func literal escapes to heap ./example.go:21: ([]byte)("hello world") escapes to heap # ❗ ./example.go:23: ([]byte)("please") escapes to heap # ❗ ./example.go:16: main.func1 m does not escape [Received] 0xc420100050 [Received] 0xc4200de050 89 . 1
  • 92. Turns out that all the messages being published escape and end up in the heap, why?? 21 | nc.Request("help", []byte("hello world"), 1*time.Second) 22 | 23 | nc.Publish("help", []byte("please")) 91 . 1
  • 93. When publishing, we are just writing to a bufio.Writer after all… func (nc *Conn) publish(subj, reply string, data []byte) error { // ... _, err := nc.bw.Write(msgh) if err == nil { _, err = nc.bw.Write(data) } if err == nil { _, err = nc.bw.WriteString(_CRLF_) } https://github.com/nats-io/go-nats/blob/master/nats.go#L1956-L1965
  • 95. // Write writes the contents of p into the buffer. // It returns the number of bytes written. // If nn < len(p), it also returns an error explaining // why the write is short. func (b *Writer) Write(p []byte) (nn int, err error) { for len(p) > b.Available() && b.err == nil { var n int if b.Buffered() == 0 { // Large write, empty buffer. // Write directly from p to avoid copy. n, b.err = b.wr.Write(p) // <---- Interface causes alloc! } else { n = copy(b.buf[b.n:], p) b.n += n b.Flush() } nn += n p = p[n:] } https://github.com/golang/go/blob/ae238688d2813e83f16050408487ea34ba1c2fff/src/bufio/bufio.go#L600-L602 94 . 1
  • 96. We could swap out the bufio.Writer implementation with a custom one, use concrete types, etc. then prevent the escaping… + var bwr *bufioWriter + if conn, ok := w.(*net.TCPConn); ok { + bwr = &bufioWriter{ + buf: make([]byte, size), + wr: w, + conn: conn, + } + } else if buffer, ok := w.(*bytes.Buffer); ok { + bwr = &bufioWriter{ + buf: make([]byte, size), + wr: w, + pending: buffer, + } + } else if sconn, ok := w.(*tls.Conn); ok { + bwr = &bufioWriter{ + buf: make([]byte, size), + wr: w, + sconn: sconn, + } + } 95 . 1
  • 97. Performance matters, but is it worth the complexity?
  • 98. For converting a number to bytes, however, an approach that does not use strconv.AppendInt was taken, since it is much more performant. // msgh = strconv.AppendInt(msgh, int64(len(data)), 10) var b [12]byte var i = len(b) if len(data) > 0 { for l := len(data); l > 0; l /= 10 { i -= 1 b[i] = digits[l%10] } } else { i -= 1 b[i] = digits[0] }
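The digit loop from the slide can be wrapped into a small function to see it in isolation. `appendLen` is a hypothetical name; as in the slide, the scratch array holds 12 digits, which is ample for payload sizes but would not fit arbitrary 64-bit values.

```go
package main

import "fmt"

const digits = "0123456789"

// appendLen appends the decimal form of n (0 <= n < 10^12) to dst.
// Digits are written right to left into a fixed-size scratch array.
func appendLen(dst []byte, n int) []byte {
	var b [12]byte
	i := len(b)
	if n > 0 {
		for l := n; l > 0; l /= 10 {
			i--
			b[i] = digits[l%10]
		}
	} else {
		i--
		b[i] = digits[0]
	}
	return append(dst, b[i:]...)
}

func main() {
	hdr := []byte("PUB hello ")
	hdr = appendLen(hdr, len("world"))
	hdr = append(hdr, "\r\n"...)
	fmt.Printf("%q\n", hdr)
}
```

Whether this beats `strconv.AppendInt` on a given Go version is something to confirm with a benchmark rather than assume.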
  • 100. ✅ Techniques used and trade-offs in the Go client: Backpressure of I/O and writes coalescing Fast protocol parsing Graceful reconnection on server failover Usage & Non-usage of channels for communicating Closures, callbacks & channels for events/custom logic 99 . 1
  • 101. Conclusions The NATS approach to simplicity fits really well with what Go provides and has helped the project greatly. Go is a very flexible systems programming language, allowing us to choose our own trade-offs. There is great tooling in Go, which makes it possible to analyze when to make them.