https://docs.microsoft.com/ja-jp/azure/architecture/checklist/availability
https://docs.microsoft.com/ja-jp/azure/architecture/
• Availability 今回
単一障害点をなくそう
All components, services, resources, and compute instances should be deployed as multiple
instances to prevent a single point of failure from affecting availability. This includes
authentication mechanisms. Design the application to be configurable to use multiple instances,
and to automatically detect failures and redirect requests to non-failed instances where the
platform does not do this automatically.
Separate workloads that have different service levels
If a service is composed of critical and less-critical workloads, manage them differently and specify
the service features and number of instances to meet their availability requirements.
Understand your dependencies and minimize them
Minimize the number of different services used where possible, and ensure you understand all of
the feature and service dependencies that exist in the system. This includes the nature of these
dependencies, and the impact of failure or reduced performance in each one on the overall
application. Microsoft guarantees at least 99.9 percent availability for most services, but this
means that every additional service an application relies on potentially reduces the overall
availability SLA of your system by roughly 0.1 percent, because the availabilities multiply.
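The arithmetic behind that estimate can be sketched as follows. This is a minimal illustration, assuming every dependency is required for a request to succeed and that failures are independent; the function names are hypothetical and not part of any Azure SDK.

```python
def composite_availability(slas):
    """Multiply the availability of every dependency a request needs."""
    total = 1.0
    for sla in slas:
        total *= sla
    return total


def monthly_downtime_minutes(availability, days=30):
    """Downtime per month implied by an availability figure."""
    return (1.0 - availability) * days * 24 * 60


if __name__ == "__main__":
    # Assumption: each service offers a 99.9 percent SLA.
    for count in (1, 2, 3, 5):
        combined = composite_availability([0.999] * count)
        print(f"{count} services: {combined:.4%} available, "
              f"about {monthly_downtime_minutes(combined):.0f} min/month of downtime")
```

With three such services in series the combined figure is about 99.7 percent, which is why removing a dependency can be worth more than tuning one.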
Make tasks and messages idempotent (safe to repeat)
Design tasks and messages to be idempotent so that duplicated requests will not cause problems. For example, a service can act as a consumer
that handles messages sent as requests by other parts of the system that act as producers. If the
consumer fails after processing the message, but before acknowledging that it has been
processed, a producer might submit a repeat request which could be handled by another instance
of the consumer. For this reason, consumers and the operations they carry out should be
idempotent so that repeating a previously executed operation does not render the results invalid.
This may mean detecting duplicated messages, or ensuring consistency by using an optimistic
approach to handling conflicts.
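Duplicate detection can be sketched like this. It is a minimal in-memory illustration, assuming each message carries a unique `id` field; the class name, the message shape, and the `balance` state are hypothetical, and a real consumer would keep the processed IDs in a durable store.

```python
class IdempotentConsumer:
    """Applies each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.processed_ids = set()  # assumption: durable storage in a real system
        self.balance = 0

    def handle(self, message):
        """Return True if the message was applied, False if it was a duplicate."""
        message_id = message["id"]
        if message_id in self.processed_ids:
            return False  # duplicate: safe to acknowledge without reapplying
        self.balance += message["amount"]
        self.processed_ids.add(message_id)
        return True


if __name__ == "__main__":
    consumer = IdempotentConsumer()
    deposit = {"id": "m-1", "amount": 10}
    consumer.handle(deposit)
    consumer.handle(deposit)  # redelivery after a lost acknowledgement
    print(consumer.balance)
```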
Use a message broker to raise the availability of critical transactions
Many scenarios for initiating tasks or accessing remote services use messaging to pass
instructions between the application and the target service. For best performance, the application
should be able to send the message and then return to process more requests, without needing
to wait for a reply. To guarantee delivery of messages, the messaging system should provide high
availability. Azure Service Bus message queues implement at-least-once semantics. This means that
each message posted to a queue will not be lost, although duplicate copies may be delivered
under certain circumstances. If message processing is idempotent (see the previous item),
repeated delivery should not be a problem.
Plan for graceful degradation
Design the application to degrade gracefully when reaching resource limits, and take appropriate action to minimize the impact for the user. In
some cases, the load on the application may exceed the capacity of one or more parts, causing
reduced availability and failed connections. Scaling can help to alleviate this, but it may reach a
limit imposed by other factors, such as resource availability or cost. Design the application so that,
in this situation, it can automatically degrade gracefully. For example, in an ecommerce system, if
the order-processing subsystem is under strain (or has even failed completely), it can be
temporarily disabled while allowing other functionality (such as browsing the product catalog) to
continue. It might be appropriate to postpone requests to a failing subsystem, for example still
enabling customers to submit orders but saving them for later processing, when the orders
subsystem is available again.
Cope with sudden bursts of events
Most applications need to handle varying workloads over time, such as peaks first thing in the
morning in a business application or when a new product is released in an ecommerce site.
Auto-scaling can help to handle the load, but it may take some time for additional instances to come
online and handle requests. Prevent sudden and unexpected bursts of activity from overwhelming
the application: design it to queue requests to the services it uses and degrade gracefully when
queues are near to full capacity. Ensure that there is sufficient performance and capacity available
under non-burst conditions to drain the queues and handle outstanding requests. For more
information, see the Queue-Based Load Leveling Pattern.
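The queueing behavior described above might look like the following sketch. It assumes a single in-process buffer standing in for a real message queue; `LoadLevelingBuffer`, `shed_threshold`, and the `critical` flag are hypothetical names chosen for illustration.

```python
import queue


class LoadLevelingBuffer:
    """Bounded buffer that sheds optional work when it is nearly full."""

    def __init__(self, capacity, shed_threshold=0.8):
        self._queue = queue.Queue(maxsize=capacity)
        self._shed_at = int(capacity * shed_threshold)

    def submit(self, request, critical=False):
        """Return True if the request was queued, False if it was shed."""
        if not critical and self._queue.qsize() >= self._shed_at:
            return False  # degrade: drop optional work while the queue is nearly full
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            return False  # even critical work is refused once the buffer is full

    def drain(self, handler, max_items):
        """Process up to max_items queued requests; return how many were handled."""
        handled = 0
        while handled < max_items:
            try:
                request = self._queue.get_nowait()
            except queue.Empty:
                break
            handler(request)
            handled += 1
        return handled
```

The consumer calls `drain` at a steady rate, so a burst fills the buffer instead of overwhelming the downstream service.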
Deploy each service as multiple instances
Microsoft makes availability guarantees for services that you create and deploy, but these
guarantees are only valid if you deploy at least two instances of each role in the service. This
enables one role instance to be unavailable while the other remains active. This is especially important if
you need to deploy updates to a live system without interrupting clients' activities; instances can
be taken down and upgraded individually while the others continue online.
Place the application in multiple datacenters
Although extremely unlikely, it is possible for an entire datacenter to go offline through an event
such as a natural disaster or Internet failure. Vital business applications should be hosted in more
than one datacenter to provide maximum availability. This can also reduce latency for local users,
and provide additional opportunities for flexibility when updating applications.
Make deployment and maintenance tasks automated and testable
Distributed applications consist of multiple parts that must work together. Deployment should
therefore be automated, using tested and proven mechanisms such as scripts and deployment
applications. These can update and validate configuration, and automate the deployment process.
Automated techniques should also be used to perform updates of all or parts of applications. It is
vital to test all of these processes fully to ensure that errors do not cause additional downtime. All
deployment tools must have suitable security restrictions to protect the deployed application;
define and enforce deployment policies carefully and minimize the need for human intervention.
Prepare a staging environment and a mechanism for swapping it with production
Use the platform's staging and production features where these are available. For example, using Azure Cloud Services staging and production
environments allows applications to be switched from one to another instantly through a virtual IP
address swap (VIP Swap). However, if you prefer to stage on-premises, or deploy different versions
of the application concurrently and gradually migrate users, you may not be able to use a VIP
Swap operation.
Understand which configuration changes require a restart, and handle them
Apply configuration changes without recycling the instance when possible. In many cases, the configuration settings for an Azure application or
service can be changed without requiring the role to be restarted. Roles expose events that can be
handled to detect configuration changes and apply them to components within the application.
However, some changes to the core platform settings do require a role to be restarted. When
building components and services, maximize availability and minimize downtime by designing
them to accept changes to configuration settings without requiring the application as a whole to
be restarted.
Use upgrade domains to update with no downtime
Azure compute units such as web and worker roles are allocated to upgrade domains. Upgrade
domains group role instances together so that, when a rolling update takes place, each role in the
upgrade domain is stopped, updated, and restarted in turn. This minimizes the impact on
application availability. You can specify how many upgrade domains should be created for a
service when the service is deployed.
(It is important, so I will keep repeating it) Use availability sets
Placing two or more virtual machines in the same availability set guarantees that these virtual
machines will not be deployed to the same fault domain. To maximize availability, you should
create multiple instances of each critical virtual machine used by your system and place these
instances in the same availability set. If you are running multiple virtual machines that serve
different purposes, create a separate availability set for each purpose. Add the instances of each virtual
machine to its own availability set. For example, if you have created separate virtual machines to act
as a web server and a reporting server, create an availability set for the web server and another
availability set for the reporting server. Add instances of the web server virtual machine to the
web server availability set, and add instances of the reporting server virtual machine to the
reporting server availability set.
Replicate data to a remote location
Data in Azure Storage is automatically replicated within a datacenter. For even higher availability,
use read-access geo-redundant storage (RA-GRS), which replicates your data to a secondary
region and provides read-only access to the data in the secondary location. The data is durable
even in the case of a complete regional outage or a disaster.
Replicate databases to a remote location
Azure SQL Database and Cosmos DB both support geo-replication, which enables you to
configure secondary database replicas in other regions. Secondary databases are available for
querying and for failover in the case of a data center outage or the inability to connect to the
primary database. For more information, see "Failover groups and active geo-replication" (SQL
Database) and "How to distribute data globally with Azure Cosmos DB".
(Where you can) Go with optimistic concurrency and eventual consistency
Use optimistic concurrency and eventual consistency where possible. Transactions that block access to resources through locking (pessimistic
concurrency) can cause poor performance and considerably reduce availability. These problems
can become especially acute in distributed systems. In many cases, careful design and techniques
such as partitioning can minimize the chances of conflicting updates occurring. Where data is
replicated, or is read from a separately updated store, the data will only be eventually consistent.
In most cases, though, accepting eventual consistency costs far less than the availability lost by
using transactions to enforce immediate consistency.
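An optimistic approach can be sketched with a version check in place of a lock. This is an illustrative in-memory model, not the API of any Azure data service; `VersionedStore`, `VersionConflict`, and `update_with_retry` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when the stored version no longer matches the version that was read."""


class VersionedStore:
    """In-memory store that rejects writes based on a stale read."""

    def __init__(self):
        self._items = {}  # key -> (version, value)

    def read(self, key):
        return self._items.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(f"{key!r} changed since version {expected_version}")
        self._items[key] = (current_version + 1, value)
        return current_version + 1


def update_with_retry(store, key, transform, max_attempts=3):
    """Re-read and reapply the change if another writer got there first."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        try:
            return store.write(key, transform(value), expected_version=version)
        except VersionConflict:
            continue
    raise VersionConflict(f"gave up on {key!r} after {max_attempts} attempts")
```

No reader is ever blocked. A writer only pays a cost when a conflict actually occurs, which is the trade-off the paragraph above describes.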
Are you backing up with restore in mind?
Use periodic backups and point-in-time restore, and ensure this meets the Recovery Point Objective (RPO). Regularly and automatically back up data
that is not preserved elsewhere, and verify you can reliably restore both the data and the
application itself should a failure occur. Data replication is not a backup feature because errors
and inconsistencies introduced through failure, error, or malicious operations will be replicated
across all stores. The backup process must be secure to protect the data in transit and in storage.
Databases or parts of a data store can usually be recovered to a previous point in time by using
transaction logs. Microsoft Azure provides a backup facility for data stored in Azure SQL Database.
The data is exported to a backup package on Azure blob storage, and can be downloaded to a
secure on-premises location for storage.
For Redis, the Standard tier or higher is recommended
When using Azure Redis Cache, choose the standard option to maintain a secondary copy of the
contents.
Set timeouts strategically
Services and resources may become unavailable, causing requests to fail. Ensure that the timeouts
you apply are appropriate for each service or resource as well as the client that is accessing them.
(In some cases, it may be appropriate to allow a longer timeout for a particular instance of a client,
depending on the context and other actions that the client is performing.) Very short timeouts
may cause excessive retry operations for services and resources that have considerable latency.
Very long timeouts can cause blocking if a large number of requests are queued, waiting for a
service or resource to respond.
Retry strategically, too
Design a retry strategy for access to all services and resources where they do not inherently
support automatic connection retry. Use a strategy that includes an increasing delay between
retries as the number of failures increases, to prevent overloading of the resource and to allow it
to gracefully recover and handle queued requests. Continual retries with very short delays are
likely to exacerbate the problem.
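A retry strategy with an increasing delay can be sketched as follows. The delay values, the jitter factor, and the choice of `ConnectionError` and `TimeoutError` as the transient faults are assumptions for illustration; `retry_with_backoff` is a hypothetical helper, not a library function.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       transient=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation, doubling the wait after each transient failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except transient:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists so the delays can be observed or skipped in tests. Errors outside `transient` are not retried.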
Knowing when to give up matters too
Stop sending requests to avoid cascading failures when remote services are unavailable. There may be situations in which transient or other faults,
ranging in severity from a partial loss of connectivity to the complete failure of a service, take
much longer than expected to return to normal. Additionally, if a service is very busy, failure in
one part of the system may lead to cascading failures, and result in many operations becoming
blocked while holding onto critical system resources such as memory, threads, and database
connections. Instead of continually retrying an operation that is unlikely to succeed, the
application should quickly accept that the operation has failed, and gracefully handle this failure.
You can use the circuit breaker pattern to reject requests for specific operations for defined
periods. For more information, see Circuit Breaker Pattern.
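The idea of rejecting requests for a defined period can be sketched as a small circuit breaker. This is a simplified, single-threaded illustration; the thresholds, the injected `clock`, and the class names are assumptions and do not come from any specific library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the remote service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open: failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```

While the circuit is open, callers fail immediately and release threads and connections instead of waiting on a service that is unlikely to answer.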
If it fails, connect elsewhere
Compose or fall back to multiple components to mitigate the impact of a specific service being offline or unavailable. Design applications to take
advantage of multiple instances without affecting operation and existing connections where
possible. Use multiple instances and distribute requests between them, and detect and avoid
sending requests to failed instances, in order to maximize availability.
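Distributing requests while avoiding failed instances can be sketched with a round-robin pool. The instance names and the `InstancePool` class are hypothetical; in practice a load balancer or the platform often does this for you.

```python
import itertools


class InstancePool:
    """Round-robin selection that skips instances marked as failed."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._unhealthy = set()
        self._cursor = itertools.cycle(range(len(self._instances)))

    def mark_failed(self, instance):
        self._unhealthy.add(instance)

    def mark_recovered(self, instance):
        self._unhealthy.discard(instance)

    def next_instance(self):
        # Look at each instance at most once per call.
        for _ in range(len(self._instances)):
            candidate = self._instances[next(self._cursor)]
            if candidate not in self._unhealthy:
                return candidate
        raise RuntimeError("no healthy instances available")
```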
If it fails, go elsewhere (advanced)
Fall back to a different service or workflow where possible. For example, if writing to SQL Database fails, temporarily store data in blob
storage. Provide a facility to replay the writes in blob storage to SQL Database when the service
becomes available. In some cases, a failed operation may have an alternative action that allows
the application to continue to work even when a component or service fails. If possible, detect
failures and redirect requests to other services that can offer a suitable alternative functionality, or
to backup or reduced-functionality instances that can maintain core operations while the primary
service is offline.
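The write-then-replay fallback can be sketched as follows. A plain list stands in for blob storage and a callable stands in for the database write; `FallbackWriter` and its parameters are hypothetical, and the replay assumes the primary write is idempotent.

```python
class FallbackWriter:
    """Writes to the primary store, parking records elsewhere while it is down."""

    def __init__(self, primary_write, fallback_store=None):
        self._primary_write = primary_write
        # Assumption: a list stands in for durable fallback storage such as blobs.
        self._pending = fallback_store if fallback_store is not None else []

    def write(self, record):
        try:
            self._primary_write(record)
            return "primary"
        except ConnectionError:
            self._pending.append(record)
            return "fallback"

    def replay(self):
        """Push parked records to the primary store; return how many succeeded."""
        replayed = 0
        while self._pending:
            self._primary_write(self._pending[0])  # raises if the primary is still down
            self._pending.pop(0)  # remove only after the write succeeded
            replayed += 1
        return replayed
```

Because a record is removed only after its write succeeds, a crash mid-replay can deliver it twice, which is one more reason to keep writes idempotent.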
Keep a playbook for the failures that are likely to happen
Instrument likely failures and failure events to report the situation to operations staff. For failures that are likely but have not yet occurred,
provide sufficient data to enable operations staff to determine the cause, mitigate the situation,
and ensure that the system remains available. For failures that have already occurred, the
application should return an appropriate error message to the user but attempt to continue
running, albeit with reduced functionality. In all cases, the monitoring system should capture
comprehensive details to enable operations staff to effect a quick recovery, and if necessary, for
designers and developers to modify the system to prevent the situation from arising again.
Notice before it goes down
The health and performance of an application can degrade over time, without being noticeable
until it fails. Implement probes or check functions that are executed regularly from outside the
application. These checks can be as simple as measuring response time for the application as a
whole, for individual parts of the application, for individual services that the application uses, or
for individual components. Check functions can execute processes to ensure they produce valid
results, measure latency and check availability, and extract information from the system.
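A check function of this kind can be sketched as below. The `check` callable, the latency budget, and the shape of the returned dictionary are assumptions for illustration; a real probe would run from outside the application and call one of its endpoints.

```python
import time


def probe(check, max_latency_seconds, clock=time.monotonic):
    """Run one check and report validity, latency, and any error."""
    started = clock()
    try:
        valid = bool(check())
        error = None
    except Exception as exc:  # a probe must never crash the monitor itself
        valid = False
        error = repr(exc)
    latency = clock() - started
    return {
        "healthy": valid and latency <= max_latency_seconds,
        "valid_result": valid,
        "latency_seconds": latency,
        "error": error,
    }
```

Reporting validity and latency separately lets an operator see a service that still answers correctly but is slowing down, before it fails outright.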
Will it really fail over when it counts?
Regularly test all failover and fallback systems to ensure they are available and operate as expected. Changes to systems and operations may
affect failover and fallback functions, but the impact may not be detected until the main system
fails or becomes overloaded. Test them before they are required to compensate for a live problem at
runtime.
Everything rests on a monitoring system you can trust
Automated failover and fallback systems, and manual visualization of system health and
performance by using dashboards, all depend on monitoring and instrumentation functioning
correctly. If these elements fail, miss critical information, or report inaccurate data, an operator
might not realize that the system is unhealthy or failing.
It hurts badly when an entire long-running workflow fails
Track the progress of long-running workflows and retry on failure. Long-running workflows are often composed of multiple steps. Ensure that
each step is independent and can be retried to minimize the chance that the entire workflow will
need to be rolled back, or that multiple compensating transactions need to be executed. Monitor
and manage the progress of long-running workflows by implementing a pattern such
as the Scheduler Agent Supervisor pattern.
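Independent, retryable steps can be sketched with a simple checkpoint set. This is a minimal illustration rather than an implementation of the Scheduler Agent Supervisor pattern; `run_workflow` and the `completed` set are hypothetical, and the set is assumed to be persisted between runs.

```python
def run_workflow(steps, completed, max_attempts=3):
    """Run (name, action) steps in order, skipping those already completed.

    `completed` is a set of step names; it is assumed to be stored durably
    so that a restarted workflow resumes where it stopped.
    """
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier run: do not repeat it
        for attempt in range(1, max_attempts + 1):
            try:
                action()
                completed.add(name)  # checkpoint before moving to the next step
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # leave earlier checkpoints intact for a later resume
    return completed
```

Only the failing step is retried, so a late failure does not force the earlier steps to be rolled back or repeated.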
Mechanisms and drills for wide-area disasters
Create an accepted, fully-tested plan for recovery from any type of failure that may affect system
availability. Choose a multi-site disaster recovery architecture for any mission-critical applications.
Identify a specific owner of the disaster recovery plan, including automation and testing. Ensure
the plan is well-documented, and automate the process as much as possible. Establish a backup
strategy for all reference and transactional data, and test the restoration of these backups
regularly. Train operations staff to execute the plan, and perform regular disaster simulations to
validate and improve the plan.
© 2017 Microsoft Corporation. All rights reserved.
The content of this information (including attachments and linked pages) is current as of the date of creation and is subject to change without notice.

 
インフラ野郎 Azureチーム at クラウド boost
インフラ野郎 Azureチーム at クラウド boostインフラ野郎 Azureチーム at クラウド boost
インフラ野郎 Azureチーム at クラウド boost
 
ダイ・ハード in the Kubernetes world
ダイ・ハード in the Kubernetes worldダイ・ハード in the Kubernetes world
ダイ・ハード in the Kubernetes world
 
半日でわかる コンテナー技術 (応用編)
半日でわかる コンテナー技術 (応用編)半日でわかる コンテナー技術 (応用編)
半日でわかる コンテナー技術 (応用編)
 
インフラエンジニア エボリューション ~激変する IT インフラ技術者像、キャリアとスキルを考える~ at Tech Summit 2018
インフラエンジニア エボリューション ~激変する IT インフラ技術者像、キャリアとスキルを考える~ at Tech Summit 2018インフラエンジニア エボリューション ~激変する IT インフラ技術者像、キャリアとスキルを考える~ at Tech Summit 2018
インフラエンジニア エボリューション ~激変する IT インフラ技術者像、キャリアとスキルを考える~ at Tech Summit 2018
 

Último

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 

Último (20)

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 

Azure Design Review Checklist: Availability Edition

  • 7. Eliminate single points of failure. All components, services, resources, and compute instances should be deployed as multiple instances to prevent a single point of failure from affecting availability. This includes authentication mechanisms. Design the application to be configurable to use multiple instances, and to automatically detect failures and redirect requests to non-failed instances where the platform does not do this automatically.
  • 8. Separate workloads that have different service levels. If a service is composed of critical and less-critical workloads, manage them differently and specify the service features and number of instances to meet their availability requirements.
  • 9. Understand and minimize dependencies. Minimize the number of different services used where possible, and ensure you understand all of the feature and service dependencies that exist in the system. This includes the nature of these dependencies, and the impact of failure or reduced performance in each one on the overall application. Microsoft guarantees at least 99.9 percent availability for most services, but this means that every additional service an application relies on potentially reduces the overall availability SLA of your system by roughly 0.1 percent.
  • 10. Make tasks and messages idempotent (safely repeatable) so that duplicated requests will not cause problems. For example, a service can act as a consumer that handles messages sent as requests by other parts of the system that act as producers. If the consumer fails after processing the message, but before acknowledging that it has been processed, a producer might submit a repeat request which could be handled by another instance of the consumer. For this reason, consumers and the operations they carry out should be idempotent so that repeating a previously executed operation does not render the results invalid. This may mean detecting duplicated messages, or ensuring consistency by using an optimistic approach to handling conflicts.
  • 11. Use a message broker to raise the availability of critical transactions. Many scenarios for initiating tasks or accessing remote services use messaging to pass instructions between the application and the target service. For best performance, the application should be able to send the message and then return to process more requests, without needing to wait for a reply. To guarantee delivery of messages, the messaging system should provide high availability. Azure Service Bus message queues implement at-least-once semantics. This means that each message posted to a queue will not be lost, although duplicate copies may be delivered under certain circumstances. If message processing is idempotent (see the previous item), repeated delivery should not be a problem.
  • 12. Plan for graceful degradation when reaching resource limits, and take appropriate action to minimize the impact for the user. In some cases, the load on the application may exceed the capacity of one or more parts, causing reduced availability and failed connections. Scaling can help to alleviate this, but it may reach a limit imposed by other factors, such as resource availability or cost. Design the application so that, in this situation, it can automatically degrade gracefully. For example, in an ecommerce system, if the order-processing subsystem is under strain (or has even failed completely), it can be temporarily disabled while allowing other functionality (such as browsing the product catalog) to continue. It might be appropriate to postpone requests to a failing subsystem, for example still enabling customers to submit orders but saving them for later processing, when the orders subsystem is available again.
  • 13. Handle sudden bursts of events. Most applications need to handle varying workloads over time, such as peaks first thing in the morning in a business application or when a new product is released in an ecommerce site. Autoscaling can help to handle the load, but it may take some time for additional instances to come online and handle requests. Prevent sudden and unexpected bursts of activity from overwhelming the application: design it to queue requests to the services it uses and degrade gracefully when queues are near full capacity. Ensure that there is sufficient performance and capacity available under non-burst conditions to drain the queues and handle outstanding requests. For more information, see the Queue-Based Load Leveling Pattern.
  • 15. Deploy each service to multiple instances. Microsoft makes availability guarantees for services that you create and deploy, but these guarantees are only valid if you deploy at least two instances of each role in the service. This enables one instance to be unavailable while the other remains active. This is especially important if you need to deploy updates to a live system without interrupting clients' activities; instances can be taken down and upgraded individually while the others continue online.
  • 16. Place the application in multiple datacenters. Although extremely unlikely, it is possible for an entire datacenter to go offline through an event such as a natural disaster or Internet failure. Vital business applications should be hosted in more than one datacenter to provide maximum availability. This can also reduce latency for local users, and provide additional opportunities for flexibility when updating applications.
  • 17. Make deployment and maintenance tasks automated and testable. Distributed applications consist of multiple parts that must work together. Deployment should therefore be automated, using tested and proven mechanisms such as scripts and deployment applications. These can update and validate configuration, and automate the deployment process. Automated techniques should also be used to perform updates of all or parts of applications. It is vital to test all of these processes fully to ensure that errors do not cause additional downtime. All deployment tools must have suitable security restrictions to protect the deployed application; define and enforce deployment policies carefully and minimize the need for human intervention.
  • 18. Set up a staging environment and a mechanism for swapping it with production, where these are available. For example, using Azure Cloud Services staging and production environments allows applications to be switched from one to another instantly through a virtual IP address swap (VIP Swap). However, if you prefer to stage on-premises, or deploy different versions of the application concurrently and gradually migrate users, you may not be able to use a VIP Swap operation.
  • 19. Understand which components need a restart when configuration changes, and handle them; avoid restarting the instance when possible. In many cases, the configuration settings for an Azure application or service can be changed without requiring the role to be restarted. Roles expose events that can be handled to detect configuration changes and apply them to components within the application. However, some changes to the core platform settings do require a role to be restarted. When building components and services, maximize availability and minimize downtime by designing them to accept changes to configuration settings without requiring the application as a whole to be restarted.
  • 20. Keep upgrade domains in mind and update without downtime. Azure compute units such as web and worker roles are allocated to upgrade domains. Upgrade domains group role instances together so that, when a rolling update takes place, the instances in one upgrade domain at a time are stopped, updated, and restarted. This minimizes the impact on application availability. You can specify how many upgrade domains should be created for a service when the service is deployed.
  • 21. (It is important, so it bears repeating) Use availability sets. Placing two or more virtual machines in the same availability set guarantees that these virtual machines will not be deployed to the same fault domain. To maximize availability, you should create multiple instances of each critical virtual machine used by your system and place these instances in the same availability set. If you are running multiple virtual machines that serve different purposes, create an availability set for each purpose, and add the instances of each virtual machine to its own availability set. For example, if you have created separate virtual machines to act as a web server and a reporting server, create an availability set for the web server and another availability set for the reporting server. Add instances of the web server virtual machine to the web server availability set, and add instances of the reporting server virtual machine to the reporting server availability set.
  • 23. Replicate data to a remote location. Data in Azure Storage is automatically replicated within a datacenter. For even higher availability, use read-access geo-redundant storage (RA-GRS), which replicates your data to a secondary region and provides read-only access to the data in the secondary location. The data is durable even in the case of a complete regional outage or a disaster.
  • 24. Replicate databases to a remote location. Azure SQL Database and Cosmos DB both support geo-replication, which enables you to configure secondary database replicas in other regions. Secondary databases are available for querying and for failover in the case of a datacenter outage or the inability to connect to the primary database. For more information, see Failover groups and active geo-replication (SQL Database) and How to distribute data globally with Azure Cosmos DB?
  • 25. Use optimistic concurrency control and eventual consistency where possible. Transactions that block access to resources through locking (pessimistic concurrency) can cause poor performance and considerably reduce availability. These problems can become especially acute in distributed systems. In many cases, careful design and techniques such as partitioning can minimize the chances of conflicting updates occurring. Where data is replicated, or is read from a separately updated store, the data will only be eventually consistent. But the advantages usually far outweigh the impact on availability of using transactions to ensure immediate consistency.
  • 26. Are you backing up with restoration in mind? Ensure that the backup process meets the Recovery Point Objective (RPO). Regularly and automatically back up data that is not preserved elsewhere, and verify you can reliably restore both the data and the application itself should a failure occur. Data replication is not a backup feature because errors and inconsistencies introduced through failure, error, or malicious operations will be replicated across all stores. The backup process must be secure to protect the data in transit and in storage. Databases or parts of a data store can usually be recovered to a previous point in time by using transaction logs. Microsoft Azure provides a backup facility for data stored in Azure SQL Database. The data is exported to a backup package on Azure blob storage, and can be downloaded to a secure on-premises location for storage.
  • 27. For Redis, the Standard tier or higher is recommended. When using Azure Redis Cache, choose the Standard option to maintain a secondary copy of the contents.
  • 29. Set timeouts strategically. Services and resources may become unavailable, causing requests to fail. Ensure that the timeouts you apply are appropriate for each service or resource as well as the client that is accessing them. (In some cases, it may be appropriate to allow a longer timeout for a particular instance of a client, depending on the context and other actions that the client is performing.) Very short timeouts may cause excessive retry operations for services and resources that have considerable latency. Very long timeouts can cause blocking if a large number of requests are queued, waiting for a service or resource to respond.
  • 30. Retry strategically, too. Design a retry strategy for access to all services and resources where they do not inherently support automatic connection retry. Use a strategy that includes an increasing delay between retries as the number of failures increases, to prevent overloading of the resource and to allow it to gracefully recover and handle queued requests. Continual retries with very short delays are likely to exacerbate the problem.
  • 31. Knowing when to give up also matters when remote services are unavailable. There may be situations in which transient or other faults, ranging in severity from a partial loss of connectivity to the complete failure of a service, take much longer than expected to return to normal. Additionally, if a service is very busy, failure in one part of the system may lead to cascading failures, and result in many operations becoming blocked while holding onto critical system resources such as memory, threads, and database connections. Instead of continually retrying an operation that is unlikely to succeed, the application should quickly accept that the operation has failed, and gracefully handle this failure. You can use the circuit breaker pattern to reject requests for specific operations for defined periods. For more information, see Circuit Breaker Pattern.
  • 32. If one instance fails, connect to another to mitigate the impact of a specific service being offline or unavailable. Design applications to take advantage of multiple instances without affecting operation and existing connections where possible. Use multiple instances and distribute requests between them, and detect and avoid sending requests to failed instances, in order to maximize availability.
  • 33. If it fails, go elsewhere (advanced edition), where possible. For example, if writing to SQL Database fails, temporarily store data in blob storage. Provide a facility to replay the writes in blob storage to SQL Database when the service becomes available. In some cases, a failed operation may have an alternative action that allows the application to continue to work even when a component or service fails. If possible, detect failures and redirect requests to other services that can offer suitable alternative functionality, or to backup or reduced-functionality instances that can maintain core operations while the primary service is offline.
  • 35. Summarize in advance how to handle likely failures, and report the situation to operations staff. For failures that are likely but have not yet occurred, provide sufficient data to enable operations staff to determine the cause, mitigate the situation, and ensure that the system remains available. For failures that have already occurred, the application should return an appropriate error message to the user but attempt to continue running, albeit with reduced functionality. In all cases, the monitoring system should capture comprehensive details to enable operations staff to effect a quick recovery, and if necessary, for designers and developers to modify the system to prevent the situation from arising again.
  • 36. Notice before it goes down. The health and performance of an application can degrade over time, without being noticeable until it fails. Implement probes or check functions that are executed regularly from outside the application. These checks can be as simple as measuring response time for the application as a whole, for individual parts of the application, for individual services that the application uses, or for individual components. Check functions can execute processes to ensure they produce valid results, measure latency and check availability, and extract information from the system.
  • 37. Will it really fail over when it counts? Test failover and fallback systems to ensure they are available and operate as expected. Changes to systems and operations may affect failover and fallback functions, but the impact may not be detected until the main system fails or becomes overloaded. Test them before they are required to compensate for a live problem at runtime.
  • 38. Everything rests on trust in the monitoring system. Automated failover and fallback systems, and manual visualization of system health and performance by using dashboards, all depend on monitoring and instrumentation functioning correctly. If these elements fail, miss critical information, or report inaccurate data, an operator might not realize that the system is unhealthy or failing.
  • 39. Losing an entire long-running workflow hurts badly, so track progress and retry on failure. Long-running workflows are often composed of multiple steps. Ensure that each step is independent and can be retried to minimize the chance that the entire workflow will need to be rolled back, or that multiple compensating transactions need to be executed. Monitor and manage the progress of long-running workflows by implementing a pattern such as the Scheduler Agent Supervisor Pattern.
  • 40. Mechanisms and drills for wide-area disasters. Create an accepted, fully-tested plan for recovery from any type of failure that may affect system availability. Choose a multi-site disaster recovery architecture for any mission-critical applications. Identify a specific owner of the disaster recovery plan, including automation and testing. Ensure the plan is well-documented, and automate the process as much as possible. Establish a backup strategy for all reference and transactional data, and test the restoration of these backups regularly. Train operations staff to execute the plan, and perform regular disaster simulations to validate and improve the plan.
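  • 41. © 2017 Microsoft Corporation. All rights reserved. The content of this information (including attachments and linked material) is current as of the date of creation and is subject to change without notice.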
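
The dependency arithmetic in item 9 can be made concrete. The sketch below assumes that every dependency must be up for the application to work and that failures are independent, so the composite availability is the product of the individual SLAs. The function names and the 99.9 percent figures are illustrative, not part of any Azure API.

```python
from math import prod


def composite_availability(slas):
    """Availability of a chain in which every dependency must be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(slas)


def yearly_downtime_hours(availability, hours_per_year=365 * 24):
    """Expected hours of unavailability per year for a given availability."""
    return (1.0 - availability) * hours_per_year


# Each extra 99.9% dependency costs roughly another 0.1 percentage points.
for n in (1, 2, 3, 5):
    a = composite_availability([0.999] * n)
    print(f"{n} services: {a:.4%} available, "
          f"~{yearly_downtime_hours(a):.1f} h/year of downtime")
```

The product shrinks slightly slower than a flat 0.1 percent per service, which is why the checklist says "potentially" rather than giving an exact figure.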
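
Items 10 and 11 describe duplicate delivery under at-least-once semantics. One common way to make a consumer idempotent is to remember the IDs of messages it has already applied. The sketch below is a minimal in-memory illustration; the class, the message ID scheme, and the balance example are hypothetical, and a real system would keep the processed-ID record in a durable store shared by all consumer instances.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by a unique message ID."""

    def __init__(self):
        self.processed_ids = set()  # hypothetical stand-in for a durable store
        self.balance = 0

    def handle(self, message_id, amount):
        """Apply the message; return False if it was a duplicate delivery."""
        if message_id in self.processed_ids:
            return False  # already applied: safe to acknowledge again
        self.balance += amount
        self.processed_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
consumer.handle("msg-1", 100)
consumer.handle("msg-1", 100)  # redelivered duplicate, ignored
consumer.handle("msg-2", 50)
print(consumer.balance)
```

Because the duplicate is detected rather than reapplied, redelivery leaves the result unchanged, which is the property the checklist asks for.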
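
The retry strategy in item 30 (an increasing delay between retries) is often implemented as exponential backoff with jitter. The following is a sketch under that assumption; the function name, defaults, and the choice to retry on any exception are illustrative. Production code would normally retry only on errors known to be transient.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying failures with exponentially growing delays.

    The delay doubles after each failure (capped at max_delay), plus random
    jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so that the waiting can be replaced in tests; callers can ignore it.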
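
Item 31 points to the circuit breaker pattern for failing fast. The sketch below is a deliberately small, single-threaded illustration of the idea; the class names, thresholds, and state handling are assumptions made for this example and do not come from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate `CircuitOpenError` instead of tying up threads and connections on an operation that is unlikely to succeed.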