A discussion of misconceptions, problems, and industry trends that hinder adoption of cloud technology, with an emphasis on scenarios that appear to work but fail at critical moments.
Be sure to read the notes!
4. Resiliency failures
• When a non-critical dependency failed, it
brought down the main service (lack of circuit
breaker)
• Running a critical workload on a single VM
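As a rough illustration of the circuit breaker mentioned above (a minimal sketch, not code from this talk): after a few consecutive failures the breaker stops calling the dependency for a while, so the main service can fall back instead of failing along with it. The names CircuitBreaker, CircuitOpenError, get_recommendations, and the threshold and timeout values are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value, tune per dependency
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the dependency.
                raise CircuitOpenError("dependency is being skipped")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()       # open (or re-open) the circuit
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def get_recommendations(breaker, fetch):
    """Hypothetical caller: a non-critical feature degrades to an empty result."""
    try:
        return breaker.call(fetch)
    except Exception:
        return []
```

The design choice worth noting is the fallback in get_recommendations: the breaker only protects the main service if the caller has something sensible to do when the dependency is unavailable.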
6. The physics has changed
• The architectures and methods
that used to bring success don’t
always work in the cloud.
• Even worse, they will appear to
work until some critical event.
8. Fallacies of Cloud Computing
1. Everything is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. Security is inherited
5. Topology doesn't matter.
9. The Myths of Lift & Shift
If you just migrate your workload to the cloud…
• It will perform better
• It will scale out easily
• It will be more reliable
• It will cost less
19. “My First Law of Distributed
Object Design: Don't
distribute your objects”
- Martin Fowler
20. final
thoughts
• Don’t assume that you can do what you’ve done
before.
• Invest in learning; innovation is happening
quickly.
• Look for ways to manage and reduce complexity.
• There is help:
https://docs.microsoft.com/azure/guidance/
patterns & practices is part of the Azure Customer Advisory Team aka AzureCAT.
AzureCAT engages directly with customers in order to better understand how they are really using the platform.
Our team is in a position where we see dozens of cloud-based solutions fail.
We are seeing many people struggle with cloud fundamentals.
They are not dumb.
They are smart and experienced professionals.
What’s especially troublesome is that no problem was apparent until something critical happened.
This means that we are building solutions in the cloud that seem to work, but fail at critical moments.
In our experience these “critical moments” come in two types:
The need to scale quickly
Recovering from a fault
Source: https://www.flickr.com/photos/proimos/4199675334/
These are problems that load testing would have uncovered.
Source: https://www.flickr.com/photos/zeze57/5018254780/
These are harder to detect because they are not tied to system usage (i.e., testing from the perspective of a client won’t reveal the problem)
Source: https://www.flickr.com/photos/menschmaschine/17254386256/
It’s like we’re trying to fly a jet to the moon.
Everything appears to be going well, but once we reach a certain limit we run out of air.
Source: https://www.flickr.com/photos/nathaninsandiego/5373028008/
Performance testing is critical.
Keep in mind that performance testing of a high-scale solution is about traffic flow and not necessarily the individual processes.
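To make the "traffic flow" point concrete, here is a minimal sketch of a load generator that reports throughput and latency percentiles across many concurrent requests rather than timing a single call. It is only an illustration: run_load, percentile, and the handler argument are hypothetical names, and a real test would call the deployed service instead of a local function.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct is an integer 1..100)."""
    rank = max(1, (len(sorted_values) * pct + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def run_load(handler, total_requests, concurrency):
    """Call handler(i) total_requests times using `concurrency` worker threads."""

    def timed_call(i):
        start = time.perf_counter()
        handler(i)
        return time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    return {
        "requests": len(latencies),
        "throughput_rps": len(latencies) / elapsed,
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
    }


if __name__ == "__main__":
    # Stand-in handler that sleeps briefly; replace with a real request in practice.
    print(run_load(lambda i: time.sleep(0.01), total_requests=200, concurrency=20))
```

Watching how the p95 latency and throughput move as concurrency rises says more about the system as a whole than the speed of any one request.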
Image Source: https://flic.kr/p/sdMBLq
In the 1990s, the “Fallacies of Distributed Computing” originated with L. Peter Deutsch et al. at Sun Microsystems. This list of fallacies is inspired by the original and, for some points, copies it exactly.
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
Source: https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
Image Source: https://www.flickr.com/photos/sea-turtle/3049443478/
Naively migrating an on-premises workload that wasn’t designed for the realities of the cloud will lead to failure.
I’m talking about the idea of making minimal changes to migrate to the cloud.
It’s likely to perform worse because of higher latency, lower bandwidth, and transient failures.
Both scaling out and resiliency require design. Scaling out needs careful planning around shared state, data partitioning, and so on. Resiliency needs mitigations for transient failures.
Comparing costs is difficult. First, there is the CapEx/OpEx tradeoff. Long-term savings are definitely a possibility, but you have to address the other problems first.
L&S still has a role; it is usually a step to the next thing.
In addition, you need to understand the realities of L&S.
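One common mitigation for the transient failures mentioned above is to retry with exponential backoff and jitter. The sketch below is an assumption-laden illustration rather than a recommendation from the talk: TransientError is a hypothetical marker exception, and the attempt count and delays are placeholder values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def retry_with_backoff(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep, rng=random.random):
    """Run operation(), retrying on TransientError with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # The delay ceiling doubles each attempt: 0.5s, 1s, 2s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: sleep a random fraction of the ceiling so clients don't retry in lockstep.
            sleep(ceiling * rng())
```

Only errors known to be transient are retried here; retrying every exception would hide real bugs and add load to a dependency that is already struggling.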
Image Source: https://www.flickr.com/photos/giopuo/345913721/
It’s important to be aware of the “components” that make up a cloud computing infrastructure.
Public cloud vendors (Azure, AWS) describe their services in terms of Compute, Storage, and Networking.
(There are other categories, but these are the primary ones.)
Each category of services presents Solution Architects with choices.
Each choice comes with its own pros and cons.
In surveys, customers are not thinking in these terms.
However, to design a solution to take advantage of the cloud and to avoid some of the failures we’ve been discussing, understanding these categories is important.
For the last couple of decades, for many applications, we didn’t have to think about the underlying infrastructure.
Hardware was cheap.
This was okay when scope was bounded.
More change is on the way.
The trend is towards less management of, and even awareness of, the infrastructure.
Right now, IaaS is dominant. However, everyone wants to get to PaaS.
PaaS itself is a spectrum. In Azure, we have Cloud Services, App Service, and Service Fabric, which all offer different trade-offs.
In addition, PaaS platforms like Cloud Foundry are increasingly popular.
The gap will widen between “what we have now” and “where we want to go”
There’s some urgency to stay competitive
I’m personally speculating that there may be a “leapfrog” moment coming, analogous to the way some developing countries skipped building out telephone networks and jumped directly to cellular networks.
It’s not just about learning the new technologies.
There are new ways of thinking about solution development.
“Microservices” is a great example.
It’s hard for people to explain because it is a philosophy of application design.
It’s often described as an “architectural approach” or style.
Likewise, “DevOps” is really about
All of these new ideas are just the evolution of “how do we do more, faster”?
Another type of problem is that knowledge doesn’t transfer.
Source: https://www.flickr.com/photos/30996111@N05/4335659000/
The learning curve is steep.
There are lots of pitfalls (anti-patterns).
It requires a new way of thinking.
The #1 challenge for survey respondents is a lack of resources and expertise.
And the problem seems to be getting worse.
Source: http://www.rightscale.com/blog/cloud-industry-insights/cloud-computing-trends-2016-state-cloud-survey
Published in August 2003; that’s 13 years ago. It’s been a popular approach to software design ever since.
The important point is that it’s about “tackling complexity”.
The implication is that software is inherently complicated.
It’s become more complex in the last 13 years.
Considering everything that we’ve discussed so far,
The need for high scale systems
The need for high availability
We can predict that things are getting even more complicated.
What is happening to address this?
The DevOps movement is continuing to grow.
There’s a lot to be said about DevOps; I’m not going to go into details here,
But automation is a key aspect.
As systems grow more complex, they become more difficult to deploy and to maintain.
The philosophy underlying DevOps is really about reducing that complexity.
(Though admittedly it introduces a new type of complexity.)
In 2015, Chef, Puppet, & Docker were the top 3 DevOps tools.
Docker is the fastest growing DevOps tool, with adoption more than doubling year-over-year from 13 percent in 2015 to 27 percent in 2016.
Docker is a way to package applications; reducing complexity through consistency.
Puppet and Chef are configuration management tools; reducing complexity through automation
Source: http://www.rightscale.com/blog/cloud-industry-insights/new-devops-trends-2016-state-cloud-survey
Another recent but significant trend is “microservices”.
Companies like Netflix, Uber, Amazon, and Spotify have claimed a lot of success with microservices.
According to an IDG survey from March 2016,
½ have plans to move to microservices
Over ¼ plan to do so in less than a year
I would personally suggest that only
1/10 actually know what microservices are…
Much like DevOps, microservices is another way of managing complexity.
Each team only needs to understand its own service.
Source: http://www.idgconnect.com/view_abstract/34891/reach-clouds-enhanced-application-service-innovation-needs-flexible-dynamic-cloud-architecture-support
Look for opportunities to simplify
Pay down technical debt
Design deliberately; choose simplicity.
When the solution starts getting complicated, take a step back and make sure you’re solving the right problem.
Of course, things will still be complex. My point is that we need to deliberately fight the complexity.
Source: http://martinfowler.com/bliki/FirstLaw.html
Image Source: https://flic.kr/p/6yWYbN
These are the sources of data that were referenced in this presentation.
When choosing where to host your application in Azure, you need to know which questions to ask.
Do I need to migrate quickly?
Do I need to minimize changes to an existing solution?
What is my operations team familiar with?
Do I care about portability? Between clouds (public and private)?
Do I need to learn a new programming model?
How much will my workload cost if hosted on X?
How is scaling supported on option X?
Are my workloads CPU or I/O bound?
Are my workloads short-lived or long-lived?
Do I care about density or isolation?
How do they relate to your data needs?
Image source: https://www.flickr.com/photos/derekbruff/5583561290/
Many of the compute questions are also appropriate for storage options.
Understand the features and trade-offs of different categories of storage
Relational, KeyValue, Document, Column-oriented, Graph
How do the different categories support scaling and partitioning?
What are the consistency guarantees?
What are the reliability guarantees?
Which is more important?
Managed services are easier, but you might need to go IaaS for nuanced control
Don’t reinvent wheels (e.g., use Azure Search for indexing)
How is the data accessed?
Is it mostly read or mostly write?
Are reads sequential or random access?
Do you know the structure of your data ahead of time?
Is the schema likely to change?
Don’t be afraid to mix and match different storage solutions. Use the best fit for the problem.
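The partitioning question above often comes down to how a key is mapped to a partition. As a minimal sketch under stated assumptions (partition_for and the key format are hypothetical, and real stores usually provide their own partitioning scheme), a stable hash can spread keys across a fixed number of partitions:

```python
import hashlib
from collections import Counter


def partition_for(key, partition_count):
    """Map a string key to a partition index in [0, partition_count)."""
    # A cryptographic hash is used only because it is stable across processes;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    # Show how 10,000 hypothetical customer keys spread over 8 partitions.
    spread = Counter(partition_for(f"customer-{i}", 8) for i in range(10_000))
    for partition, count in sorted(spread.items()):
        print(partition, count)
```

A simple modulo scheme like this moves most keys when the partition count changes, which is one of the trade-offs to ask about when comparing storage options.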
Image source: https://www.flickr.com/photos/derekbruff/5583561290/