O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Microservices: State of the Union

664 visualizações

Publicada em

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/29ZQmIx.

Adrian Cockcroft discusses success/failure stories of adopting microservices, overviews what’s next with microservices and presents some of the techniques that have led to successful deployments. Filmed at qconnewyork.com.

Adrian Cockcroft works at Battery where he advises the firm and its portfolio companies about technology issues and also assists with deal sourcing and due diligence. He was a founding member of eBay Research Labs, developing advanced mobile applications and even building his own homebrew phone, years before iPhone and Android launched.

Publicada em: Tecnologia
  • Seja o primeiro a comentar

Microservices: State of the Union

  1. 1. Microservices: State of the Union Adrian Cockcroft @adrianco Technology Fellow - Battery Ventures June 2016
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ microservices-review
  3. 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. What does @adrianco do? @adrianco Technology Due Diligence on Deals Presentations at Conferences Presentations at Companies Technical Advice for Portfolio Companies Program Committee for Conferences Networking with Interesting PeopleTinkering with Technologies Maintain Relationship with Cloud Vendors Previously: Netflix, eBay, Sun Microsystems, CCL, TCU London BSc Applied Physics
  5. 5. Developer responsibilities: Faster, cheaper, safer
  6. 6. What Happened? Rate of change increased Cost and size and risk of change reduced
  7. 7. Disruptor: Continuous Delivery with Containerized Microservices
  8. 8. Microservices
  9. 9. A Microservice Definition Loosely coupled service oriented architecture with bounded contexts
  10. 10. A Microservice Definition Loosely coupled service oriented architecture with bounded contexts If every service has to be updated at the same time it’s not loosely coupled
  11. 11. A Microservice Definition Loosely coupled service oriented architecture with bounded contexts If every service has to be updated at the same time it’s not loosely coupled If you have to know too much about surrounding services you don’t have a bounded context. See the Domain Driven Design book by Eric Evans.
  12. 12. Speeding Up The Platform Datacenter Snowflakes • Deploy in months • Live for years
  13. 13. Speeding Up The Platform Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks
  14. 14. Speeding Up The Platform Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours
  15. 15. Speeding Up The Platform Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours Lambda Deployments • Deploy in milliseconds • Live for seconds
  16. 16. Speeding Up The Platform AWS Lambda is leading exploration of serverless architectures in 2016 Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours Lambda Deployments • Deploy in milliseconds • Live for seconds
  17. 17. http://www.infoq.com/presentations/Twitter-Timeline-Scalability http://www.infoq.com/presentations/twitter-soa http://www.infoq.com/presentations/Zipkin http://www.infoq.com/presentations/scale-gilt Go-Kit https://www.youtube.com/watch?v=aL6sd4d4hxk http://www.infoq.com/presentations/circuit-breaking-distributed-systems https://speakerdeck.com/mattheath/scaling-micro-services-in-go-highload-plus-plus-2014 State of the Art in Web Scale Microservice Architectures AWS Re:Invent : Asgard to Zuul https://www.youtube.com/watch?v=p7ysHhs5hl0 Resiliency at Massive Scale https://www.youtube.com/watch?v=ZfYJHtVL1_w Microservice Architecture https://www.youtube.com/watch?v=CriDUYtfrjs New projects for 2015 and Docker Packaging https://www.youtube.com/watch?v=hi7BDAtjfKY Spinnaker deployment pipeline https://www.youtube.com/watch?v=dwdVwE52KkU http://www.infoq.com/presentations/spring-cloud-2015
  18. 18. Microservice Architectures ConfigurationTooling Discovery Routing Observability Development: Languages and Container Operational: Orchestration and Deployment Infrastructure Datastores Policy: Architectural and Security Compliance
  19. 19. Next Generation Applications Fill in the gaps, rapidly evolving ecosystem choices Archaius LaunchDarkly Habitat Configuration Lambda Docker Spinnaker Tooling Etcd Eureka Consul Discovery Compose Linkerd Weave Routing Zipkin Prometheus Hystrix Observability Development: components interfaces languages e.g. Docker Hub, Artifactory, Datawire Quark, Go, Rust Operational: Mesos, Kubernetes, Swarm, Nomad for private clouds. ECS, Mesos, GKS for public Datastores: Orchestrated, Distributed Ephemeral e.g. Cassandra, or DBaaS e.g. DynamoDB Policy: Security compliance e.g. Docker Content Trust. Architecture compliance e.g. Cloud Foundry
  20. 20. @adrianco In Search of Segmentation Ops Dev Datacenters AD/LDAP Roles VLAN Networks Hypervisor IPtables Docker Links AWS Accounts IAM Roles VPC Security Groups Calico Policy Docker Net/Weave
  21. 21. @adrianco Hierarchical Segmentation B CA B C E FD E F Homepage Team Security Group Reports Team Security Group VPC Z - Manage a small number of network spaces D An AWS oriented example… AWS Account - Manage across multiple accounts containers and links
  22. 22. @adrianco What’s Often Missing? Failure injection testing Versioning, routing Binary protocols and interfaces Timeouts and retries Denormalized data models Monitoring, tracing Simplicity through symmetry
  23. 23. @adrianco Failure Injection Testing Netflix Chaos Monkey, Simian Army, FIT and Gremlin http://techblog.netflix.com/2011/07/netflix-simian-army.html http://techblog.netflix.com/2014/10/fit-failure-injection-testing.html http://techblog.netflix.com/2016/01/automated-failure-testing.html https://www.infoq.com/presentations/failure-test-research-netflix
  24. 24. ! Chaos Monkey - enforcing stateless business logic ! Chaos Gorilla - enforcing zone isolation/replication ! Chaos Kong - enforcing region isolation/replication ! Security Monkey - watching for insecure configuration settings ! FIT & Gremlin - inject errors to enforce robust dependencies ! See over 100 NetflixOSS projects at netflix.github.com ! Get “Technical Indigestion” reading techblog.netflix.com Trust with Verification
  25. 25. @adrianco Benefits of version aware routing Immediately and safely introduce a new version Canary test in production Use DIY feature flags or . Route clients to a version so they can’t get disrupted Change client or dependencies but not both at once Eventually remove old versions Incremental or infrequent “break the build” garbage collection
  26. 26. @adrianco Versioning, Routing Version numbering: Interface.Feature.Bugfix V1.2.3 to V1.2.4 - Canary test then remove old version V1.2.x to V1.3.x - Canary test then remove or keep both Route V1.3.x clients to new version to get new feature Remove V1.2.x only after V1.3.x is found to work for V1.2.x clients V1.x.x to V2.x.x - Route clients to specific versions Remove old server version when all old clients are gone
  27. 27. @adrianco Protocols Measure serialization, transmission, deserialization costs Sending a megabyte of XML between microservices will make you sad, but not as sad as 10yrs ago with SOAP Use Thrift, Protobuf/gRPC, Avro, SBE internally Use JSON for external/public interfaces https://github.com/real-logic/simple-binary-encoding
  28. 28. @adrianco Interfaces When you build a service, build a “driver” client for it Reference implementation error handling and serialization Release automation stress test using client Validate that service interface is usable! Minimize additional dependencies Swagger - OpenAPI Specification Datawire Quark adds behaviors to API spec
  29. 29. @adrianco Interface Version Pinning Change one thing at a time! Pin the version of everything else Incremental build/test/deploy pipeline Deploy existing app code with new platform Deploy existing app code with new dependencies Deploy new app code with pinned platform/dependencies
  30. 30. @adrianco Interfaces between teams
  31. 31. @adrianco Interfaces between teams Client Code Minimal Object Model
  32. 32. @adrianco Interfaces between teams Service Code Client Code Minimal Object Model Full Object Model
  33. 33. @adrianco Interfaces between teams Service Code Client Code Minimal Object Model Full Object Model Cache Code Common Object Model Decoupled object models
  34. 34. @adrianco Interfaces between teams Service Code Client Code Minimal Object Model Service Driver Service Handler Full Object Model Cache Code Common Object Model Decoupled object models
  35. 35. @adrianco Interfaces between teams Service Code Client Code Minimal Object Model Cache Driver Service Driver Service Handler Full Object Model Cache Code Cache Handler Common Object Model Decoupled object models
  36. 36. @adrianco Interfaces between teams Service Code Client Code Minimal Object Model Cache Driver Service Driver Platform Platform Service Handler Full Object Model Cache Code Platform Cache Handler Common Object Model Decoupled object models
  37. 37. @adrianco Interfaces between teams Service Code Client Code Minimal Object Model Cache Driver Service Driver Platform Platform Service Handler Full Object Model Cache Code Platform Cache Handler Common Object Model Versioned dependency interfacesDecoupled object models
  38. 38. @adrianco Interfaces between teams Service Code Client Code Minimal Object Model Cache Driver Service Driver Platform Platform Service Handler Full Object Model Cache Code Platform Cache Handler Common Object Model Versioned dependency interfaces Versioned platform interface Decoupled object models
  39. 39. @adrianco Interfaces between teams Service Code Client Code Minimal Object Model Cache Driver Service Driver Platform Platform Service Handler Full Object Model Cache Code Platform Cache Handler Common Object Model Versioned dependency interfaces Versioned platform interface Decoupled object models Versioned routing
  40. 40. @adrianco Timeouts and Retries Connection timeout vs. request timeout confusion Usually setup incorrectly, global defaults Systems collapse with “retry storms” Timeouts too long, too many retries Services doing work that can never be used
  41. 41. @adrianco Connections and Requests TCP makes a connection, HTTP makes a request HTTP hopefully reuses connections for several requests Both have different timeout and retry needs! TCP timeout is purely a property of one network latency hop HTTP timeout depends on the service and its dependencies connection path request path
  42. 42. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  43. 43. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  44. 44. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  45. 45. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service
  46. 46. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service First request from Edge timed out so it ignores the successful response and keeps retrying. Middle service load increases as it’s doing work that isn’t being consumed
  47. 47. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service First request from Edge timed out so it ignores the successful response and keeps retrying. Middle service load increases as it’s doing work that isn’t being consumed
  48. 48. @adrianco Timeout and Retry Fixes Cascading timeout budget Static settings that decrease from the edge or dynamic budget passed with request How often do retries actually succeed? Don’t ask the same instance the same thing Only retry on a different connection
  49. 49. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, one retry Failed Service
  50. 50. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, one retry Failed Service 3s 1s 1s Fast fail response after 2s Upstream timeout must always be longer than total downstream timeout * retries delay No unproductive work while fast failing
  51. 51. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, failover retry Failed Service For replicated services with multiple instances never retry against a failed instance No extra retries or unproductive work Good Service
  52. 52. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, failover retry Failed Service 3s 1s For replicated services with multiple instances never retry against a failed instance No extra retries or unproductive work Good Service Successful response delayed 1s
  53. 53. @adrianco Manage Inconsistency ACM Paper: "The Network is Reliable" Distributed systems are inconsistent by nature Clients are inconsistent with servers Most caches are inconsistent Versions are inconsistent Get over it and Deal with it See http://queue.acm.org/detail.cfm?id=2655736
  54. 54. @adrianco Denormalized Data Models Any non-trivial organization has many databases Cross references exist, inconsistencies exist Microservices work best with individual simple stores Scale, operate, mutate, fail them independently NoSQL allows flexible schema/object versions
  55. 55. @adrianco Denormalized Data Models Build custom cross-datasource check/repair processes Ensure all cross references are up to date Read these Pat Helland papers Immutability Changes Everything http://highscalability.com/blog/2015/1/26/paper-immutability-changes-everything-by-pat-helland.html Memories, Guesses and Apologies https://blogs.msdn.microsoft.com/pathelland/2007/05/15/memories-guesses-and-apologies/ Standing on the Distributed Shoulders of Giants http://queue.acm.org/detail.cfm?id=2953944
  56. 56. Cloud Native Monitoring and Microservices
  57. 57. Low Latency SaaS Based Monitors https://www.datadoghq.com/ http://www.instana.com/ www.bigpanda.io www.vividcortex.com signalfx.com wavefront.com sysdig.com See www.battery.com for a list of portfolio investments
  58. 58. A Tragic Quadrant Ability to scale Ability to handle rapidly changing microservices In-house tools at web scale companies Most current monitoring & APM tools Next generation APM Next generation Monitoring Datacenter Cloud Containers 100s 1,000s 10,000s 100,000s Lambda
  59. 59. A Tragic Quadrant Ability to scale Ability to handle rapidly changing microservices In-house tools at web scale companies Most current monitoring & APM tools Next generation APM Next generation Monitoring Datacenter Cloud Containers 100s 1,000s 10,000s 100,000s Lambda YMMV: Opinionated approximate positioning only
  60. 60. Interesting architectures have a lot of microservices! Flow visualization is a big challenge. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  61. 61. Simulated Microservices Model and visualize microservices Simulate interesting architectures Generate large scale configurations Eventually stress test real tools Code: github.com/adrianco/spigo Simulate Protocol Interactions in Go Visualize with D3 See for yourself: http://simianviz.surge.sh Follow @simianviz for updates ELB Load Balancer Zuul API Proxy Karyon Business Logic Staash Data Access Layer Priam Cassandra Datastore Three Availability Zones Denominator DNS Endpoint
  62. 62. @adrianco Simplicity through symmetry Symmetry Invariants Stable assertions No special cases Single purpose components
  63. 63. Serverless
  64. 64. Serverless Architectures AWS Lambda getting some early wins Google Cloud Functions, Azure Functions alpha launched IBM OpenWhisk - open sourced Startup activity: iron.io , serverless.com, apex.run toolkit
  65. 65. @adrianco Serverless Architecture API Gateway Kinesis S3DynamoDB
  66. 66. @adrianco Serverless Architecture API Gateway Kinesis S3DynamoDB
  67. 67. @adrianco Serverless Architecture API Gateway Kinesis S3DynamoDB
  68. 68. AWS Lambda Reference Archhttp://www.allthingsdistributed.com/2016/05/aws-lambda-serverless-reference-architectures.html
  69. 69. Serverless Programming Model Event driven functions Role based permissions Whitelisted API based security Good for simple single threaded code
  70. 70. Serverless Cost Efficiencies 100% useful work, no agents, overheads 100% utilization, no charge between requests No need to size capacity for peak traffic Anecdotal costs ~1% of conventional system Ideal for low traffic, Corp IT, spiky workloads
  71. 71. Serverless Work in Progress Tooling for ease of use Multi-region HA/DR patterns Debugging and testing frameworks Monitoring, end to end tracing
  72. 72. DIY Serverless Operating Challenges Startup latency Execution overhead Charging model Capacity planning
  73. 73. Learn More…
  74. 74. @adrianco “We see the world as increasingly more complex and chaotic because we use inadequate concepts to explain it. When we understand something, we no longer see it as chaotic or complex.” Jamshid Gharajedaghi - 2011 Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
  75. 75. Q&A Adrian Cockcroft @adrianco http://slideshare.com/adriancockcroft Technology Fellow - Battery Ventures See www.battery.com for a list of portfolio investments
  76. 76. Security Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested. Palo Alto Networks Enterprise IT Operations & Management Big DataCompute Networking Storage
  77. 77. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ microservices-review

×