Services are at the core of modern software architecture. Deploying a set of small, modular (micro)services rather than one big monolith gives developers the flexibility to use different languages, technologies, and release cadences across the system, resulting in higher productivity and velocity, especially for larger teams.
With the adoption of microservices, however, new problems emerge due to the sheer number of services that exist in a larger system. Problems that had to be solved once for a monolith, such as security, load balancing, monitoring, and rate limiting, need to be handled for each service.
Istio, announced at GlueCon 2017, addresses these problems in a fundamental way through a service mesh framework. With Istio, developers can implement the core logic for the microservices and let the framework take care of the rest: traffic management, discovery, service identity and security, and policy enforcement. Better yet, this can also be done for existing microservices without rewriting or recompiling any of their parts. Istio uses Envoy as its runtime proxy component and provides an extensible intermediation layer which allows global cross-cutting policy enforcement and telemetry collection.
6. DoIT International confidential │ Do not distribute
Agenda
● Kubernetes Overview with focus on Networking
● Flannel. What. Why. How.
● What’s still missing?
● Istio Key Concepts & Architecture
● More about Istio (features, roadmap, production readiness)
Kubernetes Overview
Is Docker by itself sufficient when running containers in production?
● What if the application is composed of multiple containers?
● What if you feel that putting all your containers on the same host sucks?
● What about deploying a new version of your application without service interruption?
● What about container failure management?
Kubernetes Overview
Cluster control plane (AKA master)
● API Server
● Cluster state store
● Controller-Manager Server
● Scheduler
The Kubernetes Node
● Kubelet
● Container runtime
● Kube Proxy
Add-ons and other dependencies
● DNS
● Ingress controller
● Heapster (resource monitoring)
● Dashboard (GUI)
Federation
Kubernetes == container cluster management tool
Kubernetes Networking
Little reminder on the Kubernetes networking concepts:
● All containers can communicate with all other containers without NAT
● All nodes can communicate with all containers (and vice-versa) without NAT
● The IP that a container sees itself as is the same IP that others see it as
You can’t just take two computers running Docker and expect Kubernetes
to work...
Kubernetes Networking
There are many different networking options that offer these capabilities for
Kubernetes:
● Contiv
● Flannel
● Nuage Networks
● OpenVSwitch
● OVN
● Project Calico
● Romana
● Weave Net
What's the problem, dude?
● Well, now we have a mesh of services that can talk to each other without
any control…
● The way that microservices interact with each other at runtime needs to be:
○ Monitored
○ Managed
○ Controlled
Isti WHAT?
Istio is “an open platform to connect, manage, and secure microservices”
● An easy way to create a network of deployed services with:
○ Load balancing
○ Service-to-service authentication
○ Monitoring
○ and more
● No changes to service code are required.
Isti WHAT?
Kubernetes → Greek for "helmsman of a ship"
Istio → Greek word for 'sail'
Istio's Key Capabilities
● Traffic Management:
○ Control the flow of traffic and API calls between services
● Observability:
○ Dependencies between services
○ Nature and flow of traffic between services
Istio's Key Capabilities
● Policy Enforcement:
○ Apply organizational policy to the interaction between services
○ Ensure access policies are enforced
○ Ensure resources are fairly distributed among consumers
○ Policy changes made by configuring the mesh, not by changing
application code
● Service Identity and Security:
○ Provide services in the mesh with a verifiable identity
○ Protect service traffic
Istio's Key Capabilities
● Platform Support:
○ Designed to run in a variety of environments, including ones that span:
■ Cloud
■ On-premises
■ Kubernetes
■ Mesos
■ etc.
● Integration and Customization:
○ Integrate with existing solutions for ACLs, logging, monitoring, quotas,
auditing and more
Istio’s Architecture
An Istio service mesh is logically split into a data plane and a control plane
● Data plane: Set of intelligent proxies (Envoy)
● Control plane: Managing and configuring proxies to route traffic, as well as
enforcing policies at runtime
Istio’s Architecture - Envoy
● Extended version of the Envoy proxy
● A high-performance proxy developed in C++
● Mediate all inbound and outbound traffic
● Deployed as a sidecar to the relevant service
● Adds Istio capabilities to an existing deployment with no need to
rearchitect or rewrite code
Istio’s Architecture - Envoy
Istio leverages Envoy’s many built-in features such as:
● Dynamic service discovery
● Load balancing
● TLS termination
● HTTP/2
● gRPC proxying
● Circuit breakers
● Health checks
● Staged rollouts with %-based traffic split
● Fault injection
● Rich metrics
Istio’s Architecture - Mixer
● A generic intermediation layer between application code and
infrastructure backends
● Moves policy decisions out of the app layer and into configuration
● The app code does a fairly simple integration with Mixer
Istio’s Architecture - Mixer
● Responsible for:
○ Enforcing access control and usage policies
○ Collecting telemetry data
● Extracts request level attributes
● Includes a flexible plugin model to interface with a variety of host
environments and infrastructure backends
Istio’s Architecture - Pilot
● The core component used for traffic management in Istio is Pilot
● Specify rules to route traffic between Envoy proxies
● Specify failure recovery features such as timeouts, retries, and circuit
breakers
● Maintains a canonical model of all the services in the mesh
Istio’s Architecture - Pilot
● Collecting and validating configuration and propagating it to the various Istio
components
● Abstracts environment-specific implementation details from Mixer and
Envoy
● Traffic management rules (i.e. generic layer-4 rules and layer-7 HTTP/gRPC
routing rules) can be programmed at runtime via Pilot
Istio’s Architecture - Istio-Auth
● Provides strong service-to-service and end-user authentication using mutual
TLS
● Can be used to upgrade unencrypted traffic in the service mesh
● Provides the ability to enforce policy based on service identity rather than
network controls
● Future releases of Istio will add:
○ Fine-grained access control
○ Auditing to control and monitor who accesses a service, API, or resource
Benefits of Istio
Fleet-wide Visibility:
● Produces detailed monitoring data about application and network behaviors
● Rendered using Prometheus & Grafana
● Can be easily extended to send metrics and logs to any collection,
aggregation and querying system
● Enables analysis of performance hotspots and diagnosis of distributed
failure modes with Zipkin tracing
Benefits of Istio
Resiliency and efficiency:
● Operators need to assume that the network will be unreliable
● Operators can use retries, load balancing, flow-control (HTTP/2), and circuit-
breaking to compensate
● Istio provides a uniform approach to configuring these features, making it
easier to operate a highly resilient service mesh
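Circuit breaking, as named on this slide, can be sketched in a few lines. This is an illustrative Python model only, not Envoy's or Istio's implementation; the class name, the `max_failures` and `reset_after` parameters, and the half-open behavior are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Trips open after `max_failures` consecutive failures, then rejects
    calls until `reset_after` seconds have passed (hypothetical parameters)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: request rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        return result
```

Failing fast while the circuit is open keeps callers from piling requests onto an upstream that is already struggling.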
Benefits of Istio
Developer productivity:
● Developers can focus on building service features in their language of choice,
while Istio handles resiliency and networking challenges
● Developers are freed from having to bake solutions to distributed systems
problems into their code
● Improves productivity by providing common functionality supporting A/B
testing, canarying, and fault injection
Benefits of Istio
Policy Driven Ops:
● Decouples cluster operators from the feature development cycle
● Allows improvements to security, monitoring, scaling, and service topology
to be rolled out without code changes
● Operators can route a precise subset of production traffic to qualify a new
service release
Benefits of Istio
Policy Driven Ops:
● Can inject failures or delays into traffic to test the resilience of the service
mesh
● Set up rate limits to prevent services from being overloaded
● Can be used to enforce compliance rules, defining ACLs between services
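To make the rate-limit bullet concrete, here is a minimal token-bucket sketch in Python. The slides do not say which algorithm is used for rate limiting, so the token bucket, the class name, and the `rate` and `capacity` parameters are all assumptions chosen for illustration.

```python
import time


class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A proxy would call `allow()` once per request and reject the request (for example with HTTP 429) when it returns `False`.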
Benefits of Istio
Secure by default:
It is a common fallacy of distributed computing that the network is secure
● Enables operators to authenticate and secure all communication between
services using a mutual TLS connection
● Aligned with the emerging SPIFFE specification
● Based on similar systems that have been tested extensively inside Google
Benefits of Istio
Incremental Adoption:
● Designed to be completely transparent to the services running in the mesh
● Allows teams to incrementally adopt features of Istio over time
Key Concepts in Istio - Traffic Management
Each Envoy instance maintains:
● Load balancing information based on the information it gets from Pilot
● Periodic health-checks of other instances
Key Concepts in Istio - Traffic Management
Communication between services:
● Clients of a service have no knowledge of
different versions of the service
● Envoy determines its actual choice of
service version dynamically based on the
routing rules specified by the operator
using Pilot
Key Concepts in Istio - Traffic Management
Ingress and Egress Envoys:
● Istio assumes that all traffic entering and leaving the service mesh transits
through Envoy proxies.
● For user-facing services operators can:
○ Conduct A/B testing
○ Deploy canary services
○ Etc.
● By routing traffic to external web services via Envoy, operators can add
failure recovery features such as circuit breakers, impose rate limits via
Mixer, and provide authentication using Istio-Auth
Key Concepts in Istio - Traffic Management
Handling Failures:
Provides a set of out-of-the-box opt-in failure recovery features
Features include:
● Timeouts
● Retries with timeout budgets and variable jitter between retries
● Limits on number of concurrent connections and requests to upstream services
● Active health checks on each member of the load balancing pool
● Fine-grained circuit breakers (passive health checks) – applied per instance in the load balancing
pool
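The "retries with timeout budgets and variable jitter" item above can be sketched as follows. This is a hypothetical Python helper, not Envoy's retry logic; the function name, the exponential backoff, and every parameter are assumptions made for the example.

```python
import random
import time


def call_with_retries(func, attempts=3, budget=2.0, base_delay=0.1,
                      sleep=time.sleep, clock=time.monotonic, rng=random.random):
    """Retry `func` until it succeeds, attempts run out, or the overall
    time budget (seconds) is spent. All parameter names are illustrative."""
    deadline = clock() + budget
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
        # Exponential backoff with jitter: wait a random fraction of the step.
        delay = base_delay * (2 ** attempt) * rng()
        # Stop if this was the last attempt or the wait would exceed the budget.
        if attempt == attempts - 1 or clock() + delay > deadline:
            break
        sleep(delay)
    raise last_error
```

The jitter spreads retries from many clients over time so they do not hit a recovering upstream in lockstep, and the budget caps the total time a caller can spend retrying.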
Key Concepts in Istio - Traffic Management
Fault Injection:
● Protocol-specific fault injection into the network
● Faults can be injected into requests that match specific criteria
● Restriction of the % of requests that should be subjected to faults
● Two types of faults can be injected:
○ Delays: Timing failures, mimicking increased network latency, or an overloaded upstream
service
○ Aborts: Crash failures that mimic failures in upstream services. Usually manifest in the form
of HTTP error codes, or TCP connection failures.
Key Concepts in Istio - Traffic Management
Rules Configuration:
● Simple Domain-specific language (DSL)
● The DSL allows the operator to configure service-level properties such as:
○ Routes
○ Circuit breakers
○ Timeouts
○ Retries
○ Injecting faults in the request path
● Set up common continuous deployment tasks such as:
○ Canary rollouts
○ A/B testing
○ Staged rollouts with %-based traffic splits
○ Etc.
Demo Time!
Vadim Solovey
Thank you!
Vadim Solovey, Yoram Ben Yaacov
Istio vs Linkerd
Feature                                      | Istio                     | Linkerd
Maturity                                     | < 1 year                  | > 2 years
Deployment                                   | Transparent proxy sidecar | Standalone RPC routing proxy
Programming languages                        | C++                       | Scala/JVM
Memory and CPU                               | Low                       | Significantly higher than Envoy’s
Configuration language                       | Extensive                 | Minimalist
Host-to-Host authentication using Kubernetes | Supported                 | Not supported
API-driven routing                           | Supported                 | Not supported
Hot reloads                                  | Supported                 | Not supported explicitly
External registries (e.g. Consul)            | Not supported             | Supported
Tracing sprinkles                            | Not supported             | Supported
Thank you (again…)!
Vadim Solovey, Yoram Ben Yaacov
Editor's Notes
Flannel allows inter-pod communication between different hosts by providing an overlay software-defined network (SDN). This solves the main issue we had with the Docker networking model. As I said before, when using Docker, each container has an IP address that allows it to communicate with other containers on the same host. When pods are placed on different hosts, they rely on their host IP address, so communication between them is only possible through port mapping. This is fine at the container level, but applications running inside these containers can have a hard time if they need to advertise their external IP and port to everyone else.
Flannel helps by giving each host a different IP subnet range. The Docker daemon then assigns IPs from this range to containers, and containers can talk to each other using these unique IP addresses by means of packet encapsulation. Imagine that you have two containers, Container A and Container B. Container A is placed on Host Machine A, and Container B is placed on Host Machine B. When Container A wants to talk to Container B, it uses Container B’s IP address as the destination address of its packet. Host Machine A then encapsulates this packet in an outer UDP packet that has Host Machine B’s IP address as its destination address. Once the packet arrives at Host Machine B, the encapsulation is removed and the packet is routed to the container using the inner IP address. The flannel configuration for the container/host machine mapping is stored in etcd. The routing is done by a flannel daemon called flanneld.
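The encapsulation step described above can be modeled in a few lines of Python. This is a toy sketch rather than flannel's code: the `Packet` type, the subnet-to-host table, and all IP addresses are made up for illustration, and the in-memory dictionary stands in for the mapping that the notes say is kept in etcd.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str
    payload: object


# Hypothetical mapping of container subnets to host IPs; per the notes, the
# real mapping is kept in etcd and the addresses here are invented.
SUBNET_TO_HOST = {"10.1.1.": "192.168.0.10", "10.1.2.": "192.168.0.20"}


def host_for(container_ip):
    """Find the host whose subnet contains the given container IP."""
    for prefix, host_ip in SUBNET_TO_HOST.items():
        if container_ip.startswith(prefix):
            return host_ip
    raise KeyError(container_ip)


def encapsulate(inner):
    """Wrap the container-to-container packet in a host-to-host packet."""
    return Packet(src=host_for(inner.src), dst=host_for(inner.dst), payload=inner)


def decapsulate(outer):
    """On the receiving host, strip the outer packet to recover the inner one."""
    return outer.payload
```

For example, a packet from `10.1.1.5` to `10.1.2.7` travels between the hosts as an outer packet from `192.168.0.10` to `192.168.0.20`, and the receiving host delivers the untouched inner packet to the container.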
Managing this runtime behavior is important because, while we have tools like Docker and Kubernetes to manage deployment and execution of service code, that’s not enough to make applications resilient and manageable. The way that microservices interact with each other at runtime—how traffic load flows through the system—needs to be monitored, managed, and controlled.
Istio is described as:
“an open platform to connect, manage, and secure microservices. Istio provides an easy way to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, without requiring any changes in service code. You add Istio support to services by deploying a special sidecar proxy throughout your environment that intercepts all network communication between microservices, configured and managed using Istio’s control plane functionality”
Mixer is a highly modular and extensible component. One of its key functions is to abstract away the details of different policy and telemetry backend systems, allowing Envoy and Istio-based services to be agnostic of those backends, which keeps them portable.
Mixer’s flexibility in dealing with different infrastructure backends is achieved by having a general-purpose plug-in model. Individual plug-ins are known as adapters and they allow Mixer to interface to different infrastructure backends that deliver core functionality, such as logging, monitoring, quotas, ACL checking, and more. Adapters enable Mixer to expose a single consistent API, independent of the backends in use. The exact set of adapters used at runtime is determined through configuration and can easily be extended to target new or custom infrastructure backends.
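The adapter idea can be illustrated with a small Python sketch. None of this is Mixer's real API: the `Adapter` interface, the two example adapters, and the `MiniMixer` class are hypothetical names that only show how one consistent call can fan out to whichever backends are configured.

```python
class Adapter:
    """Hypothetical adapter interface; not Mixer's real API."""

    def handle(self, attributes):
        raise NotImplementedError


class LogAdapter(Adapter):
    """Stands in for a logging backend."""

    def __init__(self):
        self.lines = []

    def handle(self, attributes):
        self.lines.append(f"{attributes['source']} -> {attributes['destination']}")


class CounterAdapter(Adapter):
    """Stands in for a metrics backend that counts requests per destination."""

    def __init__(self):
        self.counts = {}

    def handle(self, attributes):
        dest = attributes["destination"]
        self.counts[dest] = self.counts.get(dest, 0) + 1


class MiniMixer:
    """Exposes one report() call regardless of which adapters are plugged in."""

    def __init__(self, adapters):
        self.adapters = list(adapters)  # in practice chosen by configuration

    def report(self, attributes):
        for adapter in self.adapters:
            adapter.handle(attributes)
```

Swapping a backend then means changing the adapter list, while the caller keeps sending the same request attributes.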
Traffic Management:
The core component used for traffic management in Istio is Pilot, which manages and configures all the Envoy proxy instances deployed in a particular Istio service mesh. It lets you specify what rules you want to use to route traffic between Envoy proxies and configure failure recovery features such as timeouts, retries, and circuit breakers. It also maintains a canonical model of all the services in the mesh and uses this to let Envoys know about the other instances in the mesh via its discovery service.
Each Envoy instance maintains load balancing information based on the information it gets from Pilot and periodic health-checks of other instances in its load-balancing pool, allowing it to intelligently distribute traffic between destination instances while following its specified routing rules.
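A rough Python model of a load-balancing pool that combines an instance list with health-check results might look like this. The notes do not say which balancing algorithm Envoy applies here, so round-robin is an assumption, as are the class and method names.

```python
import itertools


class LoadBalancingPool:
    """Round-robin over instances that passed their last health check."""

    def __init__(self, instances):
        self.instances = list(instances)  # e.g. pushed by the control plane
        self.healthy = set(self.instances)
        self._cursor = itertools.count()

    def record_health_check(self, instance, ok):
        """Update the healthy set from a periodic health-check result."""
        if ok:
            self.healthy.add(instance)
        else:
            self.healthy.discard(instance)

    def pick(self):
        """Return the next healthy instance, skipping unhealthy ones."""
        candidates = [i for i in self.instances if i in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy instances")
        return candidates[next(self._cursor) % len(candidates)]
```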
Pilot is responsible for collecting and validating configuration and propagating it to the various Istio components. It abstracts environment-specific implementation details from Mixer and Envoy, providing them with an abstract representation of the user’s services that is independent of the underlying platform. In addition, traffic management rules (i.e. generic layer-4 rules and layer-7 HTTP/gRPC routing rules) can be programmed at runtime via Pilot.
Failures happen, and operators need tools to stay on top of the health of clusters and their graphs of microservices.
Using Istio’s traffic management model essentially decouples traffic flow and infrastructure scaling, letting operators specify via Pilot what rules they want traffic to follow rather than which specific pods/VMs should receive traffic - Pilot and intelligent Envoy proxies look after the rest. So, for example, you can specify via Pilot that you want 5% of traffic for a particular service to go to a canary version irrespective of the size of the canary deployment, or send traffic to a particular version depending on the content of the request.
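The two examples in this paragraph, a 5% canary split and routing on request content, can be sketched as a single Python function. This is not Pilot's rule format or API; `choose_version`, the `weights` mapping, and the `(header, value, version)` override tuples are hypothetical shapes picked for the illustration.

```python
import random


def choose_version(headers, weights, overrides=(), rng=random.random):
    """Pick a service version for one request.

    `overrides` is a sequence of (header, value, version) content-based rules
    checked first; `weights` maps version -> percentage and must sum to 100.
    """
    for header, value, version in overrides:
        if headers.get(header) == value:
            return version
    # Weighted choice: walk the cumulative percentages until the roll fits.
    roll = rng() * 100
    cumulative = 0
    for version, weight in weights.items():
        cumulative += weight
        if roll < cumulative:
            return version
    raise ValueError("weights must sum to 100")
```

With `weights = {"v1": 95, "canary": 5}`, roughly one request in twenty goes to the canary no matter how many canary instances are running, which is the decoupling of traffic flow from infrastructure scaling that the paragraph describes.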
While Envoy sidecar/proxy provides a host of failure recovery mechanisms to services running on Istio, it is still imperative to test the end-to-end failure recovery capability of the application as a whole. Misconfigured failure recovery policies (e.g., incompatible/restrictive timeouts across service calls) could result in continued unavailability of critical services in the application, resulting in poor user experience.
Istio enables protocol-specific fault injection into the network, instead of killing pods or delaying or corrupting packets at the TCP layer. The rationale is that the failures observed by the application layer are the same regardless of network-level failures, and that more meaningful failures can be injected at the application layer (e.g., HTTP error codes) to exercise the resilience of an application.
Operators can configure faults to be injected into requests that match specific criteria. Operators can further restrict the percentage of requests that should be subjected to faults. Two types of faults can be injected: delays and aborts. Delays are timing failures, mimicking increased network latency, or an overloaded upstream service. Aborts are crash failures that mimic failures in upstream services. Aborts usually manifest in the form of HTTP error codes, or TCP connection failures.
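The delay and abort faults described above can be modeled with a small wrapper around a request handler. This is a sketch under stated assumptions, not Istio's configuration or Envoy's filter: requests are plain dicts, responses are `(status, body)` tuples, and `inject_faults` with its parameters is a hypothetical helper.

```python
import random
import time


def inject_faults(handler, match, percent, delay=None, abort_status=None,
                  rng=random.random, sleep=time.sleep):
    """Wrap `handler` so that `percent`% of matching requests get a fault.

    Requests are plain dicts and responses are (status, body) tuples; both
    shapes are assumptions made for this sketch.
    """
    def wrapped(request):
        if match(request) and rng() * 100 < percent:
            if delay is not None:
                sleep(delay)  # delay fault: mimics added latency
            if abort_status is not None:
                return abort_status, "fault injected"  # abort fault
        return handler(request)
    return wrapped
```

For instance, `inject_faults(handler, lambda r: r.get("user") == "tester", percent=100, abort_status=503)` returns a 503 for every request from the test user and leaves all other traffic untouched.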