SlideShare uma empresa Scribd logo
1 de 83
Baixar para ler offline
Microservices: What’s Missing?
Adrian Cockcroft @adrianco
Technology Fellow - Battery Ventures
April 2016
What does @adrianco do?
@adrianco
Technology Due
Diligence on Deals
Presentations at
Conferences
Presentations at
Companies
Technical
Advice for Portfolio
Companies
Program
Committee for
Conferences
Networking with
Interesting PeopleTinkering with
Technologies
Maintain
Relationship with
Cloud Vendors
Previously: Netflix, eBay, Sun Microsystems, CCL, TCU London BSc Applied Physics
http://www.infoq.com/presentations/Twitter-Timeline-Scalability
http://www.infoq.com/presentations/twitter-soa
http://www.infoq.com/presentations/Zipkin
http://www.infoq.com/presentations/scale-gilt
Go-Kit https://www.youtube.com/watch?v=aL6sd4d4hxk
http://www.infoq.com/presentations/circuit-breaking-distributed-systems
https://speakerdeck.com/mattheath/scaling-micro-services-in-go-highload-plus-plus-2014
State of the Art in Web Scale
Microservice Architectures
AWS Re:Invent : Asgard to Zuul https://www.youtube.com/watch?v=p7ysHhs5hl0
Resiliency at Massive Scale https://www.youtube.com/watch?v=ZfYJHtVL1_w
Microservice Architecture https://www.youtube.com/watch?v=CriDUYtfrjs
New projects for 2015 and Docker Packaging https://www.youtube.com/watch?v=hi7BDAtjfKY
Spinnaker deployment pipeline https://www.youtube.com/watch?v=dwdVwE52KkU
http://www.infoq.com/presentations/spring-cloud-2015
Microservice Architectures
ConfigurationTooling Discovery Routing Observability
Development: Languages and Container
Operational: Orchestration and Deployment Infrastructure
Datastores
Policy: Architectural and Security Compliance
Microservices
Edda
Archaius
Configuration
Spinnaker
SpringCloud
Tooling
Eureka
Prana
Discovery
Denominator
Zuul
Ribbon
Routing
Hystrix
Pytheus
Atlas
Observability
Development using Java, Groovy, Scala, Clojure, Python with AMI and Docker Containers
Orchestration with Autoscalers on AWS, Titus exploring Mesos & ECS for Docker
Ephemeral datastores using Dynomite, Memcached, Astyanax, Staash, Priam, Cassandra
Policy via the Simian Army - Chaos Monkey, Chaos Gorilla, Conformity Monkey, Security Monkey
@adrianco
Discussion Points
Failure injection testing
Versioning, routing
Protocols and interfaces
Timeouts and retries
Denormalized data models
Monitoring, tracing
Simplicity through symmetry
@adrianco
Failure Injection Testing
Netflix Chaos Monkey and Simian Army
http://techblog.netflix.com/2011/07/netflix-simian-army.html
http://techblog.netflix.com/2014/10/fit-failure-injection-testing.html
http://techblog.netflix.com/2016/01/automated-failure-testing.html
! Chaos Monkey - enforcing stateless business logic
! Chaos Gorilla - enforcing zone isolation/replication
! Chaos Kong - enforcing region isolation/replication
! Security Monkey - watching for insecure configuration settings
! Latency Monkey & FIT - inject errors to enforce robust dependencies
! See over 100 NetflixOSS projects at netflix.github.com
! Get “Technical Indigestion” reading techblog.netflix.com
Trust with Verification
@adrianco
Benefits of version aware routing
Immediately and safely introduce a new version
Canary test in production
Use feature flags n
Route clients to a version so they can’t get disrupted
Change client or dependencies but not both at once
Eventually remove old versions
Incremental or infrequent “break the build” garbage collection
@adrianco
Versioning, Routing
Version numbering: Interface.Feature.Bugfix
V1.2.3 to V1.2.4 - Canary test then remove old version
V1.2.x to V1.3.x - Canary test then remove or keep both
Route V1.3.x clients to new version to get new feature
Remove V1.2.x only after V1.3.x is found to work for V1.2.x clients
V1.x.x to V2.x.x - Route clients to specific versions
Remove old server version when all old clients are gone
@adrianco
Protocols
Measure serialization, transmission, deserialization costs
“Sending a megabyte of XML between microservices will
make you sad…”
Use Thrift, Protobuf/gRPC, Avro, SBE internally
Use JSON for external/public interfaces
https://github.com/real-logic/simple-binary-encoding
@adrianco
Interfaces
When you build a service, build a “driver” client for it
Reference implementation error handling and serialization
Release automation stress test using client
Validate that service interface is usable!
Minimize additional dependencies
Swagger - OpenAPI Specification
Datawire Quark adds behaviors to API spec
@adrianco
Interfaces
@adrianco
Interfaces
Client
Code
Object
Model
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Object
Model
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Object
Model
Cache
Code
Object
Model
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Service
Driver
Service
Handler
Object
Model
Cache
Code
Object
Model
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Service
Handler
Object
Model
Cache
Code
Cache
Handler
Object
Model
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Platform Platform
Service
Handler
Object
Model
Cache
Code
Platform
Cache
Handler
Object
Model
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Platform Platform
Service
Handler
Object
Model
Cache
Code
Platform
Cache
Handler
Object
Model
Versioned
dependency
interfacesDecoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Platform Platform
Service
Handler
Object
Model
Cache
Code
Platform
Cache
Handler
Object
Model
Versioned
dependency
interfaces
Versioned
platform
interface
Decoupled
object
models
@adrianco
Interfaces
Service
Code
Client
Code
Object
Model
Cache
Driver
Service
Driver
Platform Platform
Service
Handler
Object
Model
Cache
Code
Platform
Cache
Handler
Object
Model
Versioned
dependency
interfaces
Versioned
platform
interface
Decoupled
object
models
Versioned routing
@adrianco
Interface Version Pinning
Change one thing at a time!
Pin the version of everything else
Incremental build/test/deploy pipeline
Deploy existing app code with new platform
Deploy existing app code with new dependencies
Deploy new app code with pinned platform/dependencies
@adrianco
Timeouts and Retries
Connection timeout vs. request timeout confusion
Usually setup incorrectly, global defaults
Systems collapse with “retry storms”
Timeouts too long, too many retries
Services doing work that can never be used
@adrianco
Connections and Requests
TCP makes a connection, HTTP makes a request
HTTP hopefully reuses connections for several requests
Both have different timeout and retry needs!
TCP timeout is purely a property of one network latency hop
HTTP timeout depends on the service and its dependencies
connection path
request path
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Good
Service
Bad config: Every service defaults to 2 second timeout, two retries
Edge
Service not
responding
Overloaded
service not
responding
Failed
Service
If anything breaks, everything upstream stops responding
Retries add unproductive work
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Good
Service
Bad config: Every service defaults to 2 second timeout, two retries
Edge
Service not
responding
Overloaded
service not
responding
Failed
Service
If anything breaks, everything upstream stops responding
Retries add unproductive work
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Good
Service
Bad config: Every service defaults to 2 second timeout, two retries
Edge
Service not
responding
Overloaded
service not
responding
Failed
Service
If anything breaks, everything upstream stops responding
Retries add unproductive work
@adrianco
Timeouts and Retries
Bad config: Every service defaults to 2 second timeout, two retries
Edge
service
responds
slowly
Overloaded
service
Partially
failed
service
@adrianco
Timeouts and Retries
Bad config: Every service defaults to 2 second timeout, two retries
Edge
service
responds
slowly
Overloaded
service
Partially
failed
service
First request from Edge timed out so it ignores the successful
response and keeps retrying. Middle service load increases as
it’s doing work that isn’t being consumed
@adrianco
Timeouts and Retries
Bad config: Every service defaults to 2 second timeout, two retries
Edge
service
responds
slowly
Overloaded
service
Partially
failed
service
First request from Edge timed out so it ignores the successful
response and keeps retrying. Middle service load increases as
it’s doing work that isn’t being consumed
@adrianco
Timeout and Retry Fixes
Cascading timeout budget
Static settings that decrease from the edge
or dynamic budget passed with request
How often do retries actually succeed?
Don’t ask the same instance the same thing
Only retry on a different connection
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Budgeted timeout, one retry
Failed
Service
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Budgeted timeout, one retry
Failed
Service
3s
1s
1s
Fast fail
response
after 2s
Upstream timeout must always be longer than
total downstream timeout * retries delay
No unproductive work while fast failing
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Budgeted timeout, failover retry
Failed
Service
For replicated services with multiple instances
never retry against a failed instance
No extra retries or unproductive work
Good
Service
@adrianco
Timeouts and Retries
Edge
Service
Good
Service
Budgeted timeout, failover retry
Failed
Service
3s 1s
For replicated services with multiple instances
never retry against a failed instance
No extra retries or unproductive work
Good
Service
Successful
response
delayed 1s
@adrianco
Manage Inconsistency
ACM Paper: "The Network is Reliable"
Distributed systems are inconsistent by nature
Clients are inconsistent with servers
Most caches are inconsistent
Versions are inconsistent
Get over it and
Deal with it
@adrianco
Denormalized Data Models
Any non-trivial organization has many databases
Cross references exist, inconsistencies exist
Microservices work best with individual simple stores
Scale, operate, mutate, fail them independently
NoSQL allows flexible schema/object versions
@adrianco
Denormalized Data Models
Build custom cross-datasource check/repair processes
Ensure all cross references are up to date
Immutability Changes Everything
http://highscalability.com/blog/2015/1/26/paper-immutability-changes-everything-by-pat-helland.html
Memories, Guesses and Apologies
https://blogs.msdn.microsoft.com/pathelland/2007/05/15/memories-guesses-and-apologies/
Cloud Native
Monitoring and
Microservices
Cloud Native Microservices
! High rate of change
Code pushes can cause floods of new instances and metrics
Short baseline for alert threshold analysis – everything looks unusual
! Ephemeral Configurations
Short lifetimes make it hard to aggregate historical views
Hand tweaked monitoring tools take too much work to keep running
! Microservices with complex calling patterns
End-to-end request flow measurements are very important
Request flow visualizations get overwhelmed
Low Latency SaaS Based Monitors
https://www.datadoghq.com/ http://www.instana.com/ www.bigpanda.io www.vividcortex.com signalfx.com wavefront.com sysdig.com
See www.battery.com for a list of portfolio investments
Managing Scale
A Possible Hierarchy
Continents
Regions
Zones
Services
Versions
Containers
Instances
How Many?
3 to 5
2-4 per Continent
1-5 per Region
100’s per Zone
Many per Service
1000’s per Version
10,000’s
It’s much more challenging
than just a large number of
machines
Flow
Some tools can show
the request flow
across a few services
Interesting
architectures have a
lot of microservices!
Flow visualization is
a big challenge.
See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
Simulated Microservices
Model and visualize microservices
Simulate interesting architectures
Generate large scale configurations
Eventually stress test real tools
Code: github.com/adrianco/spigo
Simulate Protocol Interactions in Go
Visualize with D3
See for yourself: http://simianviz.surge.sh
Follow @simianviz for updates
ELB Load Balancer
Zuul
API Proxy
Karyon
Business Logic
Staash
Data Access Layer
Priam
Cassandra Datastore
Three
Availability
Zones
Denominator
DNS Endpoint
Spigo Nanoservice Structure
func Start(listener chan gotocol.Message) {
...
for {
select {
case msg := <-listener:
flow.Instrument(msg, name, hist)
switch msg.Imposition {
case gotocol.Hello: // get named by parent
...
case gotocol.NameDrop: // someone new to talk to
...
case gotocol.Put: // upstream request handler
...
outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(),
msg.Ctx.NewParent(), msg.Intention}
flow.AnnotateSend(outmsg, name)
outmsg.GoSend(replicas)
}
case <-eurekaTicker.C: // poll the service registry
...
}
}
}
Skeleton code for replicating a Put message
Instrument incoming requests
Instrument outgoing requests
update trace context
Flow Trace Records
riak2
us-east-1
zoneC
riak9
us-west-2
zoneA
Put s896
Replicate
riak3
us-east-1
zoneA
riak8
us-west-2
zoneC
riak4
us-east-1
zoneB
riak10
us-west-2
zoneB
us-east-1.zoneC.riak2 t98p895s896 Put
us-east-1.zoneA.riak3 t98p896s908 Replicate
us-east-1.zoneB.riak4 t98p896s909 Replicate
us-west-2.zoneA.riak9 t98p896s910 Replicate
us-west-2.zoneB.riak10 t98p910s912 Replicate
us-west-2.zoneC.riak8 t98p910s913 Replicate
staash
us-east-1
zoneC
s910 s908s913
s909s912
Replicate Put
Open Zipkin
A common format for trace annotations
A Java tool for visualizing traces
Standardization effort to fold in other formats
Driven by Adrian Cole (currently at Pivotal)
Extended to load Spigo generated trace files
Trace for one Spigo Flow
Definition of an architecture
{
"arch": "lamp",
"description":"Simple LAMP stack",
"version": "arch-0.0",
"victim": "webserver",
"services": [
{ "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] },
{ "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] },
{ "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] },
{ "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] },
{ "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] }
]
}
Header includes
chaos monkey victim
New tier
name
Tier
package
0 = non
Regional
Node
count
List of tier
dependencies
See for yourself: http://simianviz.surge.sh/lamp
Running Spigo
$ ./spigo -a lamp -j -d 2
2016/01/26 23:04:05 Loading architecture from json_arch/lamp_arch.json
2016/01/26 23:04:05 lamp.edda: starting
2016/01/26 23:04:05 Architecture: lamp Simple LAMP stack
2016/01/26 23:04:05 architecture: scaling to 100%
2016/01/26 23:04:05 lamp.us-east-1.zoneB.eureka01....eureka.eureka: starting
2016/01/26 23:04:05 lamp.us-east-1.zoneA.eureka00....eureka.eureka: starting
2016/01/26 23:04:05 lamp.us-east-1.zoneC.eureka02....eureka.eureka: starting
2016/01/26 23:04:05 Starting: {rds-mysql store 1 2 []}
2016/01/26 23:04:05 Starting: {memcache store 1 1 []}
2016/01/26 23:04:05 Starting: {webserver monolith 1 18 [memcache rds-mysql]}
2016/01/26 23:04:05 Starting: {webserver-elb elb 1 0 [webserver]}
2016/01/26 23:04:05 Starting: {www denominator 0 0 [webserver-elb]}
2016/01/26 23:04:05 lamp.*.*.www00....www.denominator activity rate 10ms
2016/01/26 23:04:06 chaosmonkey delete: lamp.us-east-1.zoneC.webserver02....webserver.monolith
2016/01/26 23:04:07 asgard: Shutdown
2016/01/26 23:04:07 lamp.us-east-1.zoneB.eureka01....eureka.eureka: closing
2016/01/26 23:04:07 lamp.us-east-1.zoneA.eureka00....eureka.eureka: closing
2016/01/26 23:04:07 lamp.us-east-1.zoneC.eureka02....eureka.eureka: closing
2016/01/26 23:04:07 spigo: complete
2016/01/26 23:04:07 lamp.edda: closing
-a architecture lamp
-j graph json/lamp.json
-d run for 2 seconds
Riak IoT Architecture
{
"arch": "riak",
"description":"Riak IoT ingestion example for the RICON 2015 presentation",
"version": "arch-0.0",
"victim": "",
"services": [
{ "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]},
{ "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]},
{ "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]},
{ "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]},
{ "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]},
{ "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]},
{ "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]},
{ "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]},
{ "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]},
{ "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]},
{ "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]},
{ "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]},
{ "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]},
{ "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]},
{ "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]}
]
}
New tier
name
Tier
package
Node
count
List of tier
dependencies
0 = non
Regional
Single Region Riak IoT
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Load Balancer
Load Balancer
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Load Balancer
Load Balancer
Stream Service
Analytics Service
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Load Balancer
Load Balancer
Stream Service
Analytics Service
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Ingest Message Queue
Load Balancer
Load Balancer
Stream Service
Analytics Service
See for yourself: http://simianviz.surge.sh/riak
Single Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
Load Balancer
Normalization Services
Enrich Message Queue
Riak KV
Enricher Services
Ingest Message Queue
Load Balancer
Load Balancer
Stream Service Riak TS
Analytics Service
Ingester Service
See for yourself: http://simianviz.surge.sh/riak
Two Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
East Region Ingestion
West Region Ingestion
Multi Region TS Analytics
See for yourself: http://simianviz.surge.sh/riak
Two Region Riak IoT
IoT Ingestion Endpoint
Stream Endpoint
Analytics Endpoint
East Region Ingestion
West Region Ingestion
Multi Region TS Analytics
What’s the response
time of the stream
endpoint?
See for yourself: http://simianviz.surge.sh/riak
Response Times
What’s the response time distribution of a very
simple storage backed web service?
memcached
mysql
disk volume
web
service
load
generator
memcached
See http://www.getguesstimate.com/models/1307
memcached hit %
memcached response mysql response
service cpu time
memcached hit mode
mysql cache hit mode
mysql disk access mode
Hit rates: memcached 40% mysql 70%
Hit rates: memcached 60% mysql 70%
Hit rates: memcached 20% mysql 90%
Measuring
Response Time With
Histograms
Changes made to codahale/hdrhistogram
Changes made to go-kit/kit/metrics
Implementation in adrianco/spigo/collect
What to measure?
Client Server
GetRequest
GetResponse
Client
Time
Client Send CS
Server Receive SR
Server Send SS
Client Receive CR
Server
Time
What to measure?
Client Server
GetRequest
GetResponse
Client
Time
Client Send CS
Server Receive SR
Server Send SS
Client Receive CR
Response
CR-CS
Service
SS-SR
Network
SR-CS
Network
CR-SS
Net Round Trip
(SR-CS) + (CR-SS)
(CR-CS) - (SS-SR)
Server
Time
Spigo Histogram Results
Collected with: % spigo -d 60 -j -a storage -c
name: storage.*.*..load00...load.denominator_serv
quantiles: [{50 47103} {99 139263}]
From To Count Prob Bar
20480 21503 2 0.0007 :
21504 22527 2 0.0007 |
23552 24575 1 0.0003 :
24576 25599 5 0.0017 |
25600 26623 5 0.0017 |
26624 27647 1 0.0003 |
27648 28671 3 0.0010 |
28672 29695 5 0.0017 |
29696 30719 127 0.0421 |####
30720 31743 126 0.0418 |####
31744 32767 74 0.0246 |##
32768 34815 281 0.0932 |#########
34816 36863 201 0.0667 |######
36864 38911 156 0.0518 |#####
38912 40959 185 0.0614 |######
40960 43007 147 0.0488 |####
43008 45055 161 0.0534 |#####
45056 47103 125 0.0415 |####
47104 49151 135 0.0448 |####
49152 51199 99 0.0328 |###
51200 53247 82 0.0272 |##
53248 55295 77 0.0255 |##
55296 57343 66 0.0219 |##
57344 59391 54 0.0179 |#
59392 61439 37 0.0123 |#
61440 63487 45 0.0149 |#
63488 65535 33 0.0109 |#
65536 69631 63 0.0209 |##
69632 73727 98 0.0325 |###
73728 77823 92 0.0305 |###
77824 81919 112 0.0372 |###
81920 86015 88 0.0292 |##
86016 90111 55 0.0182 |#
90112 94207 38 0.0126 |#
94208 98303 51 0.0169 |#
98304 102399 32 0.0106 |#
102400 106495 35 0.0116 |#
106496 110591 17 0.0056 |
110592 114687 19 0.0063 |
114688 118783 18 0.0060 |
118784 122879 6 0.0020 |
122880 126975 8 0.0027 |
Normalized probability
Response time distribution
measured in nanoseconds
using High Dynamic
Range Histogram
:# Zero counts skipped
|# Contiguous buckets
Median and 99th
percentile values
service time for
load generator
Cache hit Cache miss
Go-Kit Histogram Example
const (
maxHistObservable = 1000000
sampleCount = 500
)
func NewHist(name string) metrics.Histogram {
var h metrics.Histogram
if name != "" && archaius.Conf.Collect {
h = expvar.NewHistogram(name, 1000, maxHistObservable, 1, []int{50, 99}...)
if sampleMap == nil {
sampleMap = make(map[metrics.Histogram][]int64)
}
sampleMap[h] = make([]int64, 0, sampleCount)
return h
}
return nil
}
func Measure(h metrics.Histogram, d time.Duration) {
if h != nil && archaius.Conf.Collect {
if d > maxHistObservable {
h.Observe(int64(maxHistObservable))
} else {
h.Observe(int64(d))
}
s := sampleMap[h]
if s != nil && len(s) < sampleCount {
sampleMap[h] = append(s, int64(d))
}
}
}
Nanoseconds!
Median and 99%ile
Slice for first 500
values as samples for
export to Guesstimate
Golang Guesstimate Interface
https://github.com/adrianco/goguesstimate
{
"space": {
"name": "gotest",
"description": "Testing",
"is_private": "true",
"graph": {
"metrics": [
{"id": "AB", "readableId": "AB", "name": "memcached", "location": {"row": 2, "column":4}},
{"id": "AC", "readableId": "AC", "name": "memcached percent", "location": {"row": 2, "column":
3}},
{"id": "AD", "readableId": "AD", "name": "staash cpu", "location": {"row": 3, "column":3}},
{"id": "AE", "readableId": "AE", "name": "staash", "location": {"row": 3, "column":2}}
],
"guesstimates": [
{"metric": "AB", "input": null, "guesstimateType": "DATA", "data":
[119958,6066,13914,9595,6773,5867,2347,1333,9900,9404,13518,9021,7915,3733,10244,5461,12243,7931,9044,11706,
5706,22861,9022,48661,15158,28995,16885,9564,17915,6610,7080,7065,12992,35431,11910,11465,14455,25790,8339,9
991]},
{"metric": "AC", "input": "40", "guesstimateType": "POINT"},
{"metric": "AD", "input": "[1000,4000]", "guesstimateType": "LOGNORMAL"},
{"metric": "AE", "input": "=100+((randomInt(0,100)>AC)?AB:AD)", "guesstimateType": "FUNCTION"}
]
}
}
}
See http://www.getguesstimate.com
% cd json_metrics; sh guesstimate.sh storage
@adrianco
Simplicity through symmetry
Symmetry
Invariants
Stable assertions
No special cases
@adrianco
“We see the world as increasingly more complex and chaotic
because we use inadequate concepts to explain it. When we
understand something, we no longer see it as chaotic or complex.”
Jamshid Gharajedaghi - 2011
Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
Learn More…
Q&A
Adrian Cockcroft @adrianco
http://slideshare.com/adriancockcroft
Technology Fellow - Battery Ventures
See www.battery.com for a list of portfolio investments
Security
Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested.
Palo Alto Networks
Enterprise IT
Operations &
Management
Big DataCompute
Networking
Storage

Mais conteúdo relacionado

Mais procurados

Micro Service Architecture
Micro Service ArchitectureMicro Service Architecture
Micro Service Architecture
Eduards Sizovs
 

Mais procurados (20)

Epidemic Failures
Epidemic FailuresEpidemic Failures
Epidemic Failures
 
Delivering with Microservices - How to Iterate Towards Sophistication
Delivering with Microservices - How to Iterate Towards SophisticationDelivering with Microservices - How to Iterate Towards Sophistication
Delivering with Microservices - How to Iterate Towards Sophistication
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Openstack Silicon Valley - Vendor Lock In
Openstack Silicon Valley - Vendor Lock InOpenstack Silicon Valley - Vendor Lock In
Openstack Silicon Valley - Vendor Lock In
 
Microservices the Good Bad and the Ugly
Microservices the Good Bad and the UglyMicroservices the Good Bad and the Ugly
Microservices the Good Bad and the Ugly
 
GameDay - Achieving resilience through Chaos Engineering
GameDay - Achieving resilience through Chaos EngineeringGameDay - Achieving resilience through Chaos Engineering
GameDay - Achieving resilience through Chaos Engineering
 
From Monolith to Microservices – and Beyond!
From Monolith to Microservices – and Beyond!From Monolith to Microservices – and Beyond!
From Monolith to Microservices – and Beyond!
 
Microxchg Microservices
Microxchg MicroservicesMicroxchg Microservices
Microxchg Microservices
 
When Developers Operate and Operators Develop
When Developers Operate and Operators DevelopWhen Developers Operate and Operators Develop
When Developers Operate and Operators Develop
 
It’s Not Just Request/Response: Understanding Event-driven Microservices
It’s Not Just Request/Response: Understanding Event-driven MicroservicesIt’s Not Just Request/Response: Understanding Event-driven Microservices
It’s Not Just Request/Response: Understanding Event-driven Microservices
 
PagerDuty + Rundeck = Shorter Incidents, Fewer Escalations
PagerDuty + Rundeck = Shorter Incidents, Fewer EscalationsPagerDuty + Rundeck = Shorter Incidents, Fewer Escalations
PagerDuty + Rundeck = Shorter Incidents, Fewer Escalations
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Cloud Native Architectures for Devops
Cloud Native Architectures for DevopsCloud Native Architectures for Devops
Cloud Native Architectures for Devops
 
Cloud-native Data
Cloud-native DataCloud-native Data
Cloud-native Data
 
Goto Berlin - Migrating to Microservices (Fast Delivery)
Goto Berlin - Migrating to Microservices (Fast Delivery)Goto Berlin - Migrating to Microservices (Fast Delivery)
Goto Berlin - Migrating to Microservices (Fast Delivery)
 
The Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud SummitThe Cloud Revolution - Philippines Cloud Summit
The Cloud Revolution - Philippines Cloud Summit
 
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...
Managing Your Application Lifecycle on AWS: Continuous Integration and Deploy...
 
Micro Service Architecture
Micro Service ArchitectureMicro Service Architecture
Micro Service Architecture
 
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
 
Cloud Native Cost Optimization UCC
Cloud Native Cost Optimization UCCCloud Native Cost Optimization UCC
Cloud Native Cost Optimization UCC
 

Destaque

Principles of microservices velocity
Principles of microservices   velocityPrinciples of microservices   velocity
Principles of microservices velocity
Sam Newman
 

Destaque (7)

Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
 
DevOps Days Boston 2017: Developer first workflows for Kubernetes
DevOps Days Boston 2017: Developer first workflows for KubernetesDevOps Days Boston 2017: Developer first workflows for Kubernetes
DevOps Days Boston 2017: Developer first workflows for Kubernetes
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!
 
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
 
Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016
 
Principles of microservices velocity
Principles of microservices   velocityPrinciples of microservices   velocity
Principles of microservices velocity
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017
 

Semelhante a Microservices: What's Missing - O'Reilly Software Architecture New York

Semelhante a Microservices: What's Missing - O'Reilly Software Architecture New York (20)

What's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at CiscoWhat's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at Cisco
 
Architecting Microservices in .Net
Architecting Microservices in .NetArchitecting Microservices in .Net
Architecting Microservices in .Net
 
Microservices: State of the Union
Microservices: State of the UnionMicroservices: State of the Union
Microservices: State of the Union
 
Metrics driven dev ops 2017
Metrics driven dev ops 2017Metrics driven dev ops 2017
Metrics driven dev ops 2017
 
Top Performance Problems in Distributed Architectures
Top Performance Problems in Distributed ArchitecturesTop Performance Problems in Distributed Architectures
Top Performance Problems in Distributed Architectures
 
Supercharging Optimizely Performance by Moving Decisions to the Edge
Supercharging Optimizely Performance by Moving Decisions to the EdgeSupercharging Optimizely Performance by Moving Decisions to the Edge
Supercharging Optimizely Performance by Moving Decisions to the Edge
 
Managing microservices with Istio Service Mesh
Managing microservices with Istio Service MeshManaging microservices with Istio Service Mesh
Managing microservices with Istio Service Mesh
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
 
Service Virtualization - Next Gen Testing Conference Singapore 2013
Service Virtualization - Next Gen Testing Conference Singapore 2013Service Virtualization - Next Gen Testing Conference Singapore 2013
Service Virtualization - Next Gen Testing Conference Singapore 2013
 
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...“Shift-Left.” Performance And Architecture Validation with Continuous Integra...
“Shift-Left.” Performance And Architecture Validation with Continuous Integra...
 
"Shift-Left." Performance And Architecture Validation with Continuous Integra...
"Shift-Left." Performance And Architecture Validation with Continuous Integra..."Shift-Left." Performance And Architecture Validation with Continuous Integra...
"Shift-Left." Performance And Architecture Validation with Continuous Integra...
 
Microservices with Spring
Microservices with SpringMicroservices with Spring
Microservices with Spring
 
Apply best parts of microservices to serverless
Apply best parts of microservices to serverlessApply best parts of microservices to serverless
Apply best parts of microservices to serverless
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
 
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
Deploy Faster Without Failing Faster - Metrics-Driven - Dynatrace User Groups...
 
DevOps Pipelines and Metrics Driven Feedback Loops
DevOps Pipelines and Metrics Driven Feedback LoopsDevOps Pipelines and Metrics Driven Feedback Loops
DevOps Pipelines and Metrics Driven Feedback Loops
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
 
Debugging Microservices - QCON 2017
Debugging Microservices - QCON 2017Debugging Microservices - QCON 2017
Debugging Microservices - QCON 2017
 
Reactive Application Using METEOR
Reactive Application Using METEORReactive Application Using METEOR
Reactive Application Using METEOR
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 

Mais de Adrian Cockcroft

Mais de Adrian Cockcroft (13)

Gophercon 2016 Communicating Sequential Goroutines
Gophercon 2016 Communicating Sequential GoroutinesGophercon 2016 Communicating Sequential Goroutines
Gophercon 2016 Communicating Sequential Goroutines
 
Microservices Workshop - Craft Conference
Microservices Workshop - Craft ConferenceMicroservices Workshop - Craft Conference
Microservices Workshop - Craft Conference
 
In Search of Segmentation
In Search of SegmentationIn Search of Segmentation
In Search of Segmentation
 
Microxchg Analyzing Response Time Distributions for Microservices
Microxchg Analyzing Response Time Distributions for MicroservicesMicroxchg Analyzing Response Time Distributions for Microservices
Microxchg Analyzing Response Time Distributions for Microservices
 
Innovation and Architecture
Innovation and ArchitectureInnovation and Architecture
Innovation and Architecture
 
Cloud Trends Nov2015 Structure
Cloud Trends Nov2015 StructureCloud Trends Nov2015 Structure
Cloud Trends Nov2015 Structure
 
Dockercon 2015 - Faster Cheaper Safer
Dockercon 2015 - Faster Cheaper SaferDockercon 2015 - Faster Cheaper Safer
Dockercon 2015 - Faster Cheaper Safer
 
Gluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A ChallengeGluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A Challenge
 
Dockercon State of the Art in Microservices
Dockercon State of the Art in MicroservicesDockercon State of the Art in Microservices
Dockercon State of the Art in Microservices
 
Cloud Native Cost Optimization
Cloud Native Cost OptimizationCloud Native Cost Optimization
Cloud Native Cost Optimization
 
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
 
Disrupting the Storage Industry talk at SNIA Data Storage Innovation Conference
Disrupting the Storage Industry talk at SNIA Data Storage Innovation ConferenceDisrupting the Storage Industry talk at SNIA Data Storage Innovation Conference
Disrupting the Storage Industry talk at SNIA Data Storage Innovation Conference
 
Hack Kid Con - Learn to be a Data Scientist for $1
Hack Kid Con - Learn to be a Data Scientist for $1Hack Kid Con - Learn to be a Data Scientist for $1
Hack Kid Con - Learn to be a Data Scientist for $1
 

Último

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Último (20)

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 

Microservices: What's Missing - O'Reilly Software Architecture New York

  • 1. Microservices: What’s Missing? Adrian Cockcroft @adrianco Technology Fellow - Battery Ventures April 2016
  • 2. What does @adrianco do? @adrianco Technology Due Diligence on Deals Presentations at Conferences Presentations at Companies Technical Advice for Portfolio Companies Program Committee for Conferences Networking with Interesting PeopleTinkering with Technologies Maintain Relationship with Cloud Vendors Previously: Netflix, eBay, Sun Microsystems, CCL, TCU London BSc Applied Physics
  • 3. http://www.infoq.com/presentations/Twitter-Timeline-Scalability http://www.infoq.com/presentations/twitter-soa http://www.infoq.com/presentations/Zipkin http://www.infoq.com/presentations/scale-gilt Go-Kit https://www.youtube.com/watch?v=aL6sd4d4hxk http://www.infoq.com/presentations/circuit-breaking-distributed-systems https://speakerdeck.com/mattheath/scaling-micro-services-in-go-highload-plus-plus-2014 State of the Art in Web Scale Microservice Architectures AWS Re:Invent : Asgard to Zuul https://www.youtube.com/watch?v=p7ysHhs5hl0 Resiliency at Massive Scale https://www.youtube.com/watch?v=ZfYJHtVL1_w Microservice Architecture https://www.youtube.com/watch?v=CriDUYtfrjs New projects for 2015 and Docker Packaging https://www.youtube.com/watch?v=hi7BDAtjfKY Spinnaker deployment pipeline https://www.youtube.com/watch?v=dwdVwE52KkU http://www.infoq.com/presentations/spring-cloud-2015
  • 4. Microservice Architectures ConfigurationTooling Discovery Routing Observability Development: Languages and Container Operational: Orchestration and Deployment Infrastructure Datastores Policy: Architectural and Security Compliance
  • 5. Microservices Edda Archaius Configuration Spinnaker SpringCloud Tooling Eureka Prana Discovery Denominator Zuul Ribbon Routing Hystrix Pytheus Atlas Observability Development using Java, Groovy, Scala, Clojure, Python with AMI and Docker Containers Orchestration with Autoscalers on AWS, Titus exploring Mesos & ECS for Docker Ephemeral datastores using Dynomite, Memcached, Astyanax, Staash, Priam, Cassandra Policy via the Simian Army - Chaos Monkey, Chaos Gorilla, Conformity Monkey, Security Monkey
  • 6. @adrianco Discussion Points Failure injection testing Versioning, routing Protocols and interfaces Timeouts and retries Denormalized data models Monitoring, tracing Simplicity through symmetry
  • 7. @adrianco Failure Injection Testing Netflix Chaos Monkey and Simian Army http://techblog.netflix.com/2011/07/netflix-simian-army.html http://techblog.netflix.com/2014/10/fit-failure-injection-testing.html http://techblog.netflix.com/2016/01/automated-failure-testing.html
  • 8. ! Chaos Monkey - enforcing stateless business logic ! Chaos Gorilla - enforcing zone isolation/replication ! Chaos Kong - enforcing region isolation/replication ! Security Monkey - watching for insecure configuration settings ! Latency Monkey & FIT - inject errors to enforce robust dependencies ! See over 100 NetflixOSS projects at netflix.github.com ! Get “Technical Indigestion” reading techblog.netflix.com Trust with Verification
  • 9. @adrianco Benefits of version aware routing Immediately and safely introduce a new version Canary test in production Use feature flags n Route clients to a version so they can’t get disrupted Change client or dependencies but not both at once Eventually remove old versions Incremental or infrequent “break the build” garbage collection
  • 10. @adrianco Versioning, Routing Version numbering: Interface.Feature.Bugfix V1.2.3 to V1.2.4 - Canary test then remove old version V1.2.x to V1.3.x - Canary test then remove or keep both Route V1.3.x clients to new version to get new feature Remove V1.2.x only after V1.3.x is found to work for V1.2.x clients V1.x.x to V2.x.x - Route clients to specific versions Remove old server version when all old clients are gone
  • 11. @adrianco Protocols Measure serialization, transmission, deserialization costs “Sending a megabyte of XML between microservices will make you sad…” Use Thrift, Protobuf/gRPC, Avro, SBE internally Use JSON for external/public interfaces https://github.com/real-logic/simple-binary-encoding
  • 12. @adrianco Interfaces When you build a service, build a “driver” client for it Reference implementation error handling and serialization Release automation stress test using client Validate that service interface is usable! Minimize additional dependencies Swagger - OpenAPI Specification Datawire Quark adds behaviors to API spec
  • 23. @adrianco Interface Version Pinning Change one thing at a time! Pin the version of everything else Incremental build/test/deploy pipeline Deploy existing app code with new platform Deploy existing app code with new dependencies Deploy new app code with pinned platform/dependencies
  • 24. @adrianco Timeouts and Retries Connection timeout vs. request timeout confusion Usually setup incorrectly, global defaults Systems collapse with “retry storms” Timeouts too long, too many retries Services doing work that can never be used
  • 25. @adrianco Connections and Requests TCP makes a connection, HTTP makes a request HTTP hopefully reuses connections for several requests Both have different timeout and retry needs! TCP timeout is purely a property of one network latency hop HTTP timeout depends on the service and its dependencies connection path request path
  • 26. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  • 27. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  • 28. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  • 29. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service
  • 30. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service First request from Edge timed out so it ignores the successful response and keeps retrying. Middle service load increases as it’s doing work that isn’t being consumed
  • 31. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service First request from Edge timed out so it ignores the successful response and keeps retrying. Middle service load increases as it’s doing work that isn’t being consumed
  • 32. @adrianco Timeout and Retry Fixes Cascading timeout budget Static settings that decrease from the edge or dynamic budget passed with request How often do retries actually succeed? Don’t ask the same instance the same thing Only retry on a different connection
  • 34. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, one retry Failed Service 3s 1s 1s Fast fail response after 2s Upstream timeout must always be longer than total downstream timeout * retries delay No unproductive work while fast failing
  • 35. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, failover retry Failed Service For replicated services with multiple instances never retry against a failed instance No extra retries or unproductive work Good Service
  • 36. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, failover retry Failed Service 3s 1s For replicated services with multiple instances never retry against a failed instance No extra retries or unproductive work Good Service Successful response delayed 1s
  • 37. @adrianco Manage Inconsistency ACM Paper: "The Network is Reliable" Distributed systems are inconsistent by nature Clients are inconsistent with servers Most caches are inconsistent Versions are inconsistent Get over it and Deal with it
  • 38. @adrianco Denormalized Data Models Any non-trivial organization has many databases Cross references exist, inconsistencies exist Microservices work best with individual simple stores Scale, operate, mutate, fail them independently NoSQL allows flexible schema/object versions
  • 39. @adrianco Denormalized Data Models Build custom cross-datasource check/repair processes Ensure all cross references are up to date Immutability Changes Everything http://highscalability.com/blog/2015/1/26/paper-immutability-changes-everything-by-pat-helland.html Memories, Guesses and Apologies https://blogs.msdn.microsoft.com/pathelland/2007/05/15/memories-guesses-and-apologies/
  • 41. Cloud Native Microservices ! High rate of change Code pushes can cause floods of new instances and metrics Short baseline for alert threshold analysis – everything looks unusual ! Ephemeral Configurations Short lifetimes make it hard to aggregate historical views Hand tweaked monitoring tools take too much work to keep running ! Microservices with complex calling patterns End-to-end request flow measurements are very important Request flow visualizations get overwhelmed
  • 42. Low Latency SaaS Based Monitors https://www.datadoghq.com/ http://www.instana.com/ www.bigpanda.io www.vividcortex.com signalfx.com wavefront.com sysdig.com See www.battery.com for a list of portfolio investments
  • 44. A Possible Hierarchy Continents Regions Zones Services Versions Containers Instances How Many? 3 to 5 2-4 per Continent 1-5 per Region 100’s per Zone Many per Service 1000’s per Version 10,000’s It’s much more challenging than just a large number of machines
  • 45. Flow
  • 46. Some tools can show the request flow across a few services
  • 47. Interesting architectures have a lot of microservices! Flow visualization is a big challenge. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  • 48. Simulated Microservices Model and visualize microservices Simulate interesting architectures Generate large scale configurations Eventually stress test real tools Code: github.com/adrianco/spigo Simulate Protocol Interactions in Go Visualize with D3 See for yourself: http://simianviz.surge.sh Follow @simianviz for updates ELB Load Balancer Zuul API Proxy Karyon Business Logic Staash Data Access Layer Priam Cassandra Datastore Three Availability Zones Denominator DNS Endpoint
  • 49. Spigo Nanoservice Structure func Start(listener chan gotocol.Message) { ... for { select { case msg := <-listener: flow.Instrument(msg, name, hist) switch msg.Imposition { case gotocol.Hello: // get named by parent ... case gotocol.NameDrop: // someone new to talk to ... case gotocol.Put: // upstream request handler ... outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(), msg.Ctx.NewParent(), msg.Intention} flow.AnnotateSend(outmsg, name) outmsg.GoSend(replicas) } case <-eurekaTicker.C: // poll the service registry ... } } } Skeleton code for replicating a Put message Instrument incoming requests Instrument outgoing requests update trace context
  • 50. Flow Trace Records riak2 us-east-1 zoneC riak9 us-west-2 zoneA Put s896 Replicate riak3 us-east-1 zoneA riak8 us-west-2 zoneC riak4 us-east-1 zoneB riak10 us-west-2 zoneB us-east-1.zoneC.riak2 t98p895s896 Put us-east-1.zoneA.riak3 t98p896s908 Replicate us-east-1.zoneB.riak4 t98p896s909 Replicate us-west-2.zoneA.riak9 t98p896s910 Replicate us-west-2.zoneB.riak10 t98p910s912 Replicate us-west-2.zoneC.riak8 t98p910s913 Replicate staash us-east-1 zoneC s910 s908s913 s909s912 Replicate Put
  • 51. Open Zipkin A common format for trace annotations A Java tool for visualizing traces Standardization effort to fold in other formats Driven by Adrian Cole (currently at Pivotal) Extended to load Spigo generated trace files
  • 52. Trace for one Spigo Flow
  • 53. Definition of an architecture { "arch": "lamp", "description":"Simple LAMP stack", "version": "arch-0.0", "victim": "webserver", "services": [ { "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] }, { "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] }, { "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] }, { "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] }, { "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] } ] } Header includes chaos monkey victim New tier name Tier package 0 = non Regional Node count List of tier dependencies See for yourself: http://simianviz.surge.sh/lamp
  • 54. Running Spigo $ ./spigo -a lamp -j -d 2 2016/01/26 23:04:05 Loading architecture from json_arch/lamp_arch.json 2016/01/26 23:04:05 lamp.edda: starting 2016/01/26 23:04:05 Architecture: lamp Simple LAMP stack 2016/01/26 23:04:05 architecture: scaling to 100% 2016/01/26 23:04:05 lamp.us-east-1.zoneB.eureka01....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneA.eureka00....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneC.eureka02....eureka.eureka: starting 2016/01/26 23:04:05 Starting: {rds-mysql store 1 2 []} 2016/01/26 23:04:05 Starting: {memcache store 1 1 []} 2016/01/26 23:04:05 Starting: {webserver monolith 1 18 [memcache rds-mysql]} 2016/01/26 23:04:05 Starting: {webserver-elb elb 1 0 [webserver]} 2016/01/26 23:04:05 Starting: {www denominator 0 0 [webserver-elb]} 2016/01/26 23:04:05 lamp.*.*.www00....www.denominator activity rate 10ms 2016/01/26 23:04:06 chaosmonkey delete: lamp.us-east-1.zoneC.webserver02....webserver.monolith 2016/01/26 23:04:07 asgard: Shutdown 2016/01/26 23:04:07 lamp.us-east-1.zoneB.eureka01....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneA.eureka00....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneC.eureka02....eureka.eureka: closing 2016/01/26 23:04:07 spigo: complete 2016/01/26 23:04:07 lamp.edda: closing -a architecture lamp -j graph json/lamp.json -d run for 2 seconds
  • 55. Riak IoT Architecture { "arch": "riak", "description":"Riak IoT ingestion example for the RICON 2015 presentation", "version": "arch-0.0", "victim": "", "services": [ { "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]}, { "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]}, { "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]}, { "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]}, { "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]}, { "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]}, { "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]}, { "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]}, { "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]}, { "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]}, { "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]}, { "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]}, { "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]}, { "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]}, { "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]} ] } New tier name Tier package Node count List of tier dependencies 0 = non Regional
  • 56. Single Region Riak IoT See for yourself: http://simianviz.surge.sh/riak
  • 57. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint See for yourself: http://simianviz.surge.sh/riak
  • 58. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Load Balancer Load Balancer See for yourself: http://simianviz.surge.sh/riak
  • 59. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  • 60. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  • 61. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  • 62. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Riak TS Analytics Service Ingester Service See for yourself: http://simianviz.surge.sh/riak
  • 63. Two Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint East Region Ingestion West Region Ingestion Multi Region TS Analytics See for yourself: http://simianviz.surge.sh/riak
  • 64. Two Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint East Region Ingestion West Region Ingestion Multi Region TS Analytics What’s the response time of the stream endpoint? See for yourself: http://simianviz.surge.sh/riak
  • 66. What’s the response time distribution of a very simple storage backed web service? memcached mysql disk volume web service load generator memcached
  • 68. memcached hit % memcached response mysql response service cpu time memcached hit mode mysql cache hit mode mysql disk access mode Hit rates: memcached 40% mysql 70%
  • 69. Hit rates: memcached 60% mysql 70%
  • 70. Hit rates: memcached 20% mysql 90%
  • 72. Changes made to codahale/hdrhistogram Changes made to go-kit/kit/metrics Implementation in adrianco/spigo/collect
  • 73. What to measure? Client Server GetRequest GetResponse Client Time Client Send CS Server Receive SR Server Send SS Client Receive CR Server Time
  • 74. What to measure? Client Server GetRequest GetResponse Client Time Client Send CS Server Receive SR Server Send SS Client Receive CR Response CR-CS Service SS-SR Network SR-CS Network CR-SS Net Round Trip (SR-CS) + (CR-SS) (CR-CS) - (SS-SR) Server Time
  • 75. Spigo Histogram Results Collected with: % spigo -d 60 -j -a storage -c name: storage.*.*..load00...load.denominator_serv quantiles: [{50 47103} {99 139263}] From To Count Prob Bar 20480 21503 2 0.0007 : 21504 22527 2 0.0007 | 23552 24575 1 0.0003 : 24576 25599 5 0.0017 | 25600 26623 5 0.0017 | 26624 27647 1 0.0003 | 27648 28671 3 0.0010 | 28672 29695 5 0.0017 | 29696 30719 127 0.0421 |#### 30720 31743 126 0.0418 |#### 31744 32767 74 0.0246 |## 32768 34815 281 0.0932 |######### 34816 36863 201 0.0667 |###### 36864 38911 156 0.0518 |##### 38912 40959 185 0.0614 |###### 40960 43007 147 0.0488 |#### 43008 45055 161 0.0534 |##### 45056 47103 125 0.0415 |#### 47104 49151 135 0.0448 |#### 49152 51199 99 0.0328 |### 51200 53247 82 0.0272 |## 53248 55295 77 0.0255 |## 55296 57343 66 0.0219 |## 57344 59391 54 0.0179 |# 59392 61439 37 0.0123 |# 61440 63487 45 0.0149 |# 63488 65535 33 0.0109 |# 65536 69631 63 0.0209 |## 69632 73727 98 0.0325 |### 73728 77823 92 0.0305 |### 77824 81919 112 0.0372 |### 81920 86015 88 0.0292 |## 86016 90111 55 0.0182 |# 90112 94207 38 0.0126 |# 94208 98303 51 0.0169 |# 98304 102399 32 0.0106 |# 102400 106495 35 0.0116 |# 106496 110591 17 0.0056 | 110592 114687 19 0.0063 | 114688 118783 18 0.0060 | 118784 122879 6 0.0020 | 122880 126975 8 0.0027 | Normalized probability Response time distribution measured in nanoseconds using High Dynamic Range Histogram :# Zero counts skipped |# Contiguous buckets Median and 99th percentile values service time for load generator Cache hit Cache miss
  • 76. Go-Kit Histogram Example const ( maxHistObservable = 1000000 sampleCount = 500 ) func NewHist(name string) metrics.Histogram { var h metrics.Histogram if name != "" && archaius.Conf.Collect { h = expvar.NewHistogram(name, 1000, maxHistObservable, 1, []int{50, 99}...) if sampleMap == nil { sampleMap = make(map[metrics.Histogram][]int64) } sampleMap[h] = make([]int64, 0, sampleCount) return h } return nil } func Measure(h metrics.Histogram, d time.Duration) { if h != nil && archaius.Conf.Collect { if d > maxHistObservable { h.Observe(int64(maxHistObservable)) } else { h.Observe(int64(d)) } s := sampleMap[h] if s != nil && len(s) < sampleCount { sampleMap[h] = append(s, int64(d)) } } } Nanoseconds! Median and 99%ile Slice for first 500 values as samples for export to Guesstimate
  • 77. Golang Guesstimate Interface https://github.com/adrianco/goguesstimate { "space": { "name": "gotest", "description": "Testing", "is_private": "true", "graph": { "metrics": [ {"id": "AB", "readableId": "AB", "name": "memcached", "location": {"row": 2, "column":4}}, {"id": "AC", "readableId": "AC", "name": "memcached percent", "location": {"row": 2, "column": 3}}, {"id": "AD", "readableId": "AD", "name": "staash cpu", "location": {"row": 3, "column":3}}, {"id": "AE", "readableId": "AE", "name": "staash", "location": {"row": 3, "column":2}} ], "guesstimates": [ {"metric": "AB", "input": null, "guesstimateType": "DATA", "data": [119958,6066,13914,9595,6773,5867,2347,1333,9900,9404,13518,9021,7915,3733,10244,5461,12243,7931,9044,11706, 5706,22861,9022,48661,15158,28995,16885,9564,17915,6610,7080,7065,12992,35431,11910,11465,14455,25790,8339,9 991]}, {"metric": "AC", "input": "40", "guesstimateType": "POINT"}, {"metric": "AD", "input": "[1000,4000]", "guesstimateType": "LOGNORMAL"}, {"metric": "AE", "input": "=100+((randomInt(0,100)>AC)?AB:AD)", "guesstimateType": "FUNCTION"} ] } } }
  • 78. See http://www.getguesstimate.com % cd json_metrics; sh guesstimate.sh storage
  • 80. @adrianco “We see the world as increasingly more complex and chaotic because we use inadequate concepts to explain it. When we understand something, we no longer see it as chaotic or complex.” Jamshid Gharajedaghi - 2011 Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
  • 82. Q&A Adrian Cockcroft @adrianco http://slideshare.com/adriancockcroft Technology Fellow - Battery Ventures See www.battery.com for a list of portfolio investments
  • 83. Security Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested. Palo Alto Networks Enterprise IT Operations & Management Big DataCompute Networking Storage