Bringing down an application is easy. All it takes is the failure of a single service and the entire set of services that make up the application can come crashing down like a house of cards. Just one minor error from a non-critical service can be disastrous to the entire application. There are, of course, many ways to prevent dependent services from failing. However, adding extra resiliency in non-critical services also adds complexity and cost, and sometimes it is not needed.
Application availability is best served by focusing your energies and processes on your most critical systems while working to minimize the impact of non-critical systems. Service Tiers are a way to accomplish this.
In this talk, we will learn what service tiers are and how they can be applied to service based applications. Then we will show how to utilize service tiers to keep your application available and functioning as designed. We will use example service definitions to illustrate how service tiers can help you keep your application working.
[Slide 42 chart: Problem Severity (Low Severity to High Severity) versus Service Tier of the service with the problem (Tier 1 to Tier 4), with two example points, Issue #1 and Issue #2. Critical region: immediate response needed. Noncritical region: perhaps next-day follow-up acceptable. Caption: Service Tier Helps Determine Required Responsiveness.]
Hello,
We all want a modern application.
What is a modern application?
A modern application is an application that:
Is responsive to customer needs
Scales dynamically (cost optimization)
Is highly available
This is what it takes to make a modern application. We must use modern tools and techniques, processes and systems to keep them running and modern.
Dynamic Infrastructure, Flexible Architecture, DevOps Culture, Solid Instrumentation.
{c}For this presentation, we are going to focus on Flexible Architecture.
Modern applications are built of services. Services building on capabilities of other services. Interconnected services. All working together toward a common goal.
But what happens if the goal is not met? What happens if a service, deep in the bowels of the application, stops performing?
{c}Often, that means a cascade of problems. A failure in Service F causes Service D to fail. {c}Service D, in turn, causes Service C to fail.
{c}But, what about Service A? What if Service D is really not critical to the running of Service A? Can Service A operate even if Service D is down? Can we stop the cascade from occurring?
Inter-service failure is a common cause of availability issues in modern applications. Resolving these failures can be difficult. The easiest way is to add resiliency everywhere. But that’s expensive, complex, and inefficient.
{c}Another option is to add resiliency where needed, but ignore it where it isn’t needed. But mistakes here can be disastrous.
{c}Do we waste time and money…and opportunities…or make risky decisions?
This is where Service Tiers come into play.
Service Tiers are simply labels applied to each service.
{c}The label sorts the services from most critical…{c}to least critical.
They specify how critical a service is to the running of your application.
Let’s describe this better by giving some definitions. Let’s define what each service tier label means. First, let’s look at the most critical type of service. Those are Tier 1 services.
A Tier 1 Service…
Here are some example tier 1 services.
These are the most critical services to your application. If any one of these services is down, it is a major impact to you, your customers, and your company.
Tier 2 is next down the criticality list. These are still critical services, but less critical than Tier 1.
The definition of Tier 2 Service that we use is this…
Here are some example services…
Now we are into the realm of less critical services. Tier 3 services are less critical, but still important.
Look closely at this definition…
Minimal or difficult to notice impact…
Not zero impact, not devastating impact…but impact…
What are some examples?
Recommendations service – if recommendations aren’t visible, customers may certainly notice, but it won’t dramatically change their experience.
Now let’s look at non-critical services. These are labeled Tier 4.
A Tier 4 service….
No or minor impact…
Example Tier 4 services…
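The four labels can be written down in code as a small enumeration. This is only a minimal sketch and not something shown in the talk: the names `ServiceTier` and `is_more_critical` are hypothetical, and the one-line descriptions are paraphrased from the notes above.

```python
from enum import IntEnum


class ServiceTier(IntEnum):
    """Hypothetical tier labels; a lower number means a more critical service."""

    TIER_1 = 1  # most critical: an outage is a major impact on customers and the company
    TIER_2 = 2  # still critical, but less critical than Tier 1
    TIER_3 = 3  # minimal or difficult-to-notice impact
    TIER_4 = 4  # no or minor impact


def is_more_critical(a: ServiceTier, b: ServiceTier) -> bool:
    """Return True when tier `a` is more critical than tier `b`."""
    return a < b
```

Using an `IntEnum` keeps the labels sortable, so `sorted()` over a list of tiers puts the most critical first.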
Ok, let’s take a look at this in practice. Let’s look at a real life use case and an example application architecture that goes along with it.
The example we are going to show is an online T-Shirt store. {c} At the start, we have several services that manage the front end customer experience. This includes the website, checkout, and order viewing capabilities. {c}Next, we need a catalog database to store all of our inventory.{c}Then, we need a series of services that work with the catalog in order to support the customer front end. This includes a catalog viewing service, search, and price calculator.{c}Next, the back end merchandisers need to be able to add and edit catalog entries, and update prices in the catalog. So far so good. {c}We also need a service to manage orders. {c}Once we have orders, we need to be able to fulfill them, so we need services for processing and shipping orders, and support the backend fulfillment center agents.
{c}Lastly, management will need some sort of report regularly to show our order status. This all is obviously very simplified.
But, now that we have this service architecture set up, let’s assign Service Tiers to each service in the application.
Here again is the full service architecture for the store.
Now, some of our services are used to allow customers to see and shop for products. This includes the website frontend, catalog view, and catalog database services. Since the customer can’t do anything with our site without these services, our company can’t operate without them. This is, by definition, a Tier 1 capability.
Back on the service diagram. {c}Let’s mark those services as Tier 1 services. We’ll use the red 1 to indicate this.
Moving on, our customers also want the ability to search our catalog, and we have a catalog search service to facilitate this. This service is ***useful*** for shopping, but isn’t absolutely ***necessary***. You *can* shop without searching, but it’s not as good a shopping experience. Being useful for customers but not absolutely necessary makes this a Tier 2 service.
On the service diagram, {c}let’s mark that with a blue “2”.
Now, the merchandisers within our company need to be able to add and update product offerings, and introduce sales and such. These are important capabilities, but if they are not available for a short period of time, it is not immediately impactful to our customers. We need them fixed in order for our business to be successful long term, but if they are down for a short period of time, they won’t have a significant impact on our customers or our business.
This is Tier 3.
On the service diagram, {c}we’ll use a light blue “3”.
Now, besides browsing for products, our customers want to buy things. We want our customers to buy things. In fact, if they can’t buy anything from us, they’ll go somewhere else where they can buy things. As such, the company cannot operate at all if order processing doesn’t function. This makes these order processing services Tier 1 services.
On the architecture diagram, {c}we’ll add these as Tier 1 services.
Now, our fulfillment center agents need to be able to process and ship customer orders. This obviously is important, because our customers will be mad if we charge them for a product but don’t ship it to them. However, some delays in processing an order…perhaps an hour or two…really won’t have much if any impact on our customers. So, these can be Tier 3 services. Some delays will have no impact, or minimal impact at most.
So on the architecture diagram, {c}we’ll mark these as Tier 3 services.
Now, customers often want to view the status of their orders to see if they’ve shipped yet or not or find out when they will be delivered. Monitoring their orders is important to our customers, but if they can’t do it for a short period of time, it’s not critical. We won’t lose money or lose customers if they can’t see the status of an order for a short period of time. But, it is something that might annoy customers, so this makes it a Tier 2. It’s not Tier 1 because it’s not essential. But it’s also not Tier 3, because it is visible and potentially annoying to customers.
On the architecture diagram, {c}we’ll add that service as a Tier 2 service. As you can see, we are almost done now.
Last, but not least, our management needs to generate reports in order to see how our business is performing. These are not visible to our customers, and if the service that generates the report is down for a period of time, no money is lost, no orders are lost, and no customers are impacted. This is the definition of a Tier 4 service.
So, on the architecture diagram, {c}we’ll mark that as Tier 4.
That’s it now. All services have Service Tier labels assigned to them. We have Tier 1, 2, 3, and Tier 4 services throughout our store application. Some services are highly critical, some less so. But all are labeled.
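The labels assigned in this walkthrough can be recorded as a simple lookup table. The sketch below is illustrative only: the service names are hypothetical identifiers chosen for this example, the grouping of checkout and order management under the Tier 1 order processing services is an assumption, and the price calculator is left out because the walkthrough does not assign it a tier.

```python
# Hypothetical service names mapped to the tiers assigned in the walkthrough.
SERVICE_TIERS = {
    "website_frontend": 1,
    "catalog_view": 1,
    "catalog_database": 1,
    "checkout": 1,              # assumed to be one of the order processing services
    "order_management": 1,      # assumed to be one of the order processing services
    "catalog_search": 2,
    "order_view": 2,
    "catalog_edit": 3,          # merchandiser tooling
    "price_update": 3,          # merchandiser tooling
    "fulfillment_processing": 3,
    "fulfillment_shipping": 3,
    "management_reports": 4,
}


def services_in_tier(tier: int) -> list[str]:
    """Return the names of all services with the given tier, sorted by name."""
    return sorted(name for name, t in SERVICE_TIERS.items() if t == tier)
```

Keeping the table in one place (a config file or service registry would serve equally well) makes the label easy to look up from alerting or review tooling.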
Now that the tiers are labeled, how can we use those labels?
Service Tiers are useful to help with: 1) Responsiveness, 2) Dependency management, and 3) Managing expectations. They can be used for all three of these purposes. {c}Let’s first look at responsiveness.
When an issue with a service occurs…such as an outage or performance degradation…we’re expected to respond to it. But each service has a different responsiveness need. How quickly you need to respond depends on {c}1) the severity of the issue and {c}2) the tier of the service that has the issue. How does this work?
Let’s look at a chart. The vertical axis of the chart is the problem severity, from low severity on the bottom to high severity at the top.{c}The horizontal axis is the Service tier of the problem. More important services to the left. {c}When problems occur, they can be shown on this chart depending on their severity and the tier assigned to the service having the problem.
{c}On the top-left are the most critical issues, which demand the most immediate response. {c}On the lower-right are the noncritical issues that can perhaps wait for another day, or longer, before they are addressed.{c}The further up and to the left an issue is, the greater the speed of response needed to handle the issue. A severe Tier 4 problem is not as important as a less severe Tier 1 problem. This is the first value of Service Tiers.
{c}Service Tiers can help you prioritize your planned responsiveness to problems before they occur.
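One way to turn the chart into a rule is to sort open issues by tier first and severity second. This is a sketch under stated assumptions, not the talk's method: the `Issue` type, the 1 to 4 severity scale, and the strict "tier before severity" ordering are all hypothetical choices. The chart itself only says that issues further up and to the left need a faster response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    name: str
    service_tier: int  # 1 (most critical) through 4 (least critical)
    severity: int      # assumed scale: 1 (low) through 4 (high)


def response_order(issues: list[Issue]) -> list[Issue]:
    """Sort issues so those needing the fastest response come first.

    Tier is compared before severity, so a less severe Tier 1 issue is
    handled ahead of a severe Tier 4 issue. How to rank adjacent tiers
    against each other is a policy choice this sketch simply assumes.
    """
    return sorted(issues, key=lambda i: (i.service_tier, -i.severity))
```

For example, given a high-severity reporting outage (Tier 4) and a moderate catalog slowdown (Tier 1), `response_order` places the catalog slowdown first.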
The next use of service tiers, {c}is in dependency management.
When one service talks to another, the criticality of the dependency depends on the service tier of ***both*** of the services.
{c}How you handle dependencies depends on comparing the service tiers of the two services. Let’s look at an example.
When the service making the call has a ***HIGHER SERVICE TIER*** than the dependency it is calling, this creates a critical dependency.{c}The critical dependency means the service making the call needs to handle failures of the dependency gracefully. It would be unacceptable for a dependency failure to cause the service calling it to fail. As such, the calling service should go out of its way to protect itself from failures of the dependency service as much as possible.
But when the service making the call has a ***LOWER SERVICE TIER*** than the dependency it is calling, this is a non-critical dependency. {c} It’s generally acceptable for the service to ignore failures of the dependency. Even if that means the service fails if the dependency fails, that’s usually acceptable. Since the dependency is more critical than the calling service, the calling service can usually depend on the fact that appropriate effort will be put into fixing the dependency in order for the service to meet the needs of its users.
We’ll look at an example of this later.
In general, you can map the tier level of your service, along with the tier level of the service you are depending on, as shown here.
{c}The further you are up and to the left, the more critical the dependency. A Tier 1 service depending on a Tier 4 service is a critical dependency. The Tier 1 service had better be able to respond to failures of the Tier 4 service, because the Tier 4 service is generally going to be considered less critical to keep operational.
{c}The lower right are non critical dependencies. A Tier 4 service can generally assume a Tier 1 service it depends on will always be operational. If it is not, a high level of effort will be put into resolving the Tier 1 service, and so the problems of the Tier 4 service won’t be important. It can “go with the flow” and be acceptable.
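The comparison described above fits in a few lines. This is a minimal sketch: the function name `classify_dependency` is hypothetical, and because the talk does not say how to treat two services of the same tier, the sketch reports that case separately rather than guessing.

```python
def classify_dependency(caller_tier: int, dependency_tier: int) -> str:
    """Classify a call from one service to another by comparing tiers.

    Tier numbers run from 1 (most critical) to 4 (least critical), so a
    caller with a smaller number has the higher service tier.
    """
    if caller_tier < dependency_tier:
        # Caller is more critical than what it calls: it must survive failures.
        return "critical"
    if caller_tier > dependency_tier:
        # Caller is less critical: failing along with the dependency is usually acceptable.
        return "non-critical"
    # Equal tiers are not covered in the talk; flagged for a human decision.
    return "same-tier"
```

A Tier 1 service calling a Tier 4 service gives `"critical"`, while a Tier 4 service calling a Tier 1 service gives `"non-critical"`.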
The third use of Service Tiers, {c}is in expectations management.
In general, {c}critical dependencies, {c}have higher expectations. What do we mean by higher expectations? {c}We mean things like having well defined SLAs, and having performance and availability metrics that are monitored and meet the needs of us and our customers.
{c}More generally, the higher a service’s Service Tier, the greater the expectations placed on its performance by the services that depend on it. Faster response, higher availability, more responsive support, etc.
Let’s look back at our T-Shirt store, with all the Service Tier labels applied.
{c}Here we have a critical dependency. A Tier 1 service is calling a Tier 2 (lower) service. {c}The website frontend needs to handle dependency failures of the search service gracefully. The website needs to stay operational, even if search is currently unavailable. You can’t bring down the entire website only because the search capability is down.
{c}Here we have a noncritical dependency. A Tier 2 service is calling a Tier 1 (higher) service. {c}The view order service can generally ignore failures of the order management service. This is because if the order management service is down, it probably doesn’t matter that the order viewing service is also down…it wouldn’t be able to be useful anyway. Effort should be put into fixing the higher tier service…the order management service…first.
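The same check can be run across a whole dependency graph to list the calls that need graceful failure handling. The sketch below covers only the two dependencies discussed here. The service names are hypothetical identifiers, and the tiers follow the labels assigned in the store walkthrough (order viewing was labeled Tier 2 there).

```python
# Hypothetical names; tiers as assigned in the store walkthrough.
TIERS = {
    "website_frontend": 1,
    "catalog_search": 2,
    "order_view": 2,
    "order_management": 1,
}

# (caller, dependency) pairs for the two calls discussed in this example.
CALLS = [
    ("website_frontend", "catalog_search"),
    ("order_view", "order_management"),
]


def critical_dependencies(tiers: dict[str, int], calls: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the calls where the caller has a higher tier (smaller number) than its dependency."""
    return [(caller, dep) for caller, dep in calls if tiers[caller] < tiers[dep]]
```

Run against these two calls, only the website frontend's call to catalog search is reported, which matches the case where the caller has to keep working when its dependency is down.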
This all sounds basic and simple. And, really, it is. There is nothing major here…just basic common sense. But it’s surprising how often these things can be ignored. Let me tell you a real life example of this to demonstrate. This really happened.
A web based application, I won’t say which website, but a web app was operational, and when you were logged in, it showed an ***avatar*** of the user in the upper right hand corner. This is pretty common and all of you have seen this, I’m sure.
But on this site, there was a problem. There was a service that figured out the avatar to display, and that service went down. The service that displayed the main webapp page called this avatar service to figure out what to show. Since the avatar service was down,
{c}the whole web application went down.
The entire application failed, simply because we were unable to display the icon.
What would have been a better customer experience would be if the web application continued to function, but simply didn’t show the avatar if that service failed.
If Service Tiers had been used…{c}the web application would have clearly been made up of Tier 1 and Tier 2 services.
{c}And the avatar service would have been a Tier 4 service. This would have highlighted the critical dependency, and appropriate action could have been taken in the Tier 1 & 2 services to handle the Tier 4 service failures.
This was a real example that really occurred, but Service Tiers could help identify problems like this ahead of time.
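The better behavior described in this example, where the page keeps working and simply omits the avatar, can be sketched as a guarded call. Everything named here is hypothetical: `render_header` and `fetch_avatar` stand in for whatever the real page and avatar client looked like, which the talk does not describe.

```python
from typing import Callable, Optional


def render_header(user_id: str, fetch_avatar: Callable[[str], str]) -> dict:
    """Build page-header data, leaving the avatar out if the avatar service fails.

    `fetch_avatar` is a stand-in for a call to the low-tier avatar service.
    """
    avatar: Optional[str]
    try:
        avatar = fetch_avatar(user_id)
    except Exception:
        # Broad catch for illustration only; real code would catch narrower
        # errors and would likely also apply a timeout to the call.
        avatar = None  # no avatar shown, but the page still renders
    return {"user_id": user_id, "avatar": avatar}
```

If `fetch_avatar` raises, `render_header` still returns a header with `avatar` set to `None`, so the higher-tier page stays up while the lower-tier service is down.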
That’s it. Service tiers can be used to identify, acknowledge, and handle situations in service dependencies, and set expectations on the performance of a given service.
Bottom line: They are a simple, basic, but critical component in the contract between service owners.