SRE Book Ch 10
KKStream SRE Study Group
Presenter: Chris Huang
2018/08/29
Service Reliability Hierarchy
● Ranges from the most basic requirements needed
for a system to function as a service
● Up to the higher levels: permitting self-actualization and taking
active control of the direction of the service
rather than reactively fighting fires
Incident Response
● On-call support is a tool we use to achieve
our larger mission and remain in touch with
how distributed computing systems actually
work (and fail!).
● If we could find a way to relieve ourselves of
carrying a pager, we would.
Postmortem and Root-Cause Analysis
● We aim to be alerted on and manually solve
only new and exciting problems presented
by our service; it’s woefully boring to "fix"
the same issue over and over.
● This mindset is one of the key differentiators
between the SRE philosophy and some
more traditional operations-focused
environments.
Chapter 10 - Practical Alerting
● Being alerted for single-machine failures is unacceptable because such data is too noisy to be
actionable.
● Instead we try to build systems that are robust against failures in the systems they depend on.
● A large system should be designed to aggregate signals and prune outliers.
● We need monitoring systems that allow us to alert for high-level service objectives, but retain the
granularity to inspect individual components as needed.
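The idea in the bullets above, alerting on an aggregated service-level signal while still keeping per-instance detail, can be sketched as follows. This is a minimal illustration and not Borgmon's rule language: the instance names, the 5% error-ratio objective, the outlier factor, and all function names are assumptions made for the example.

```python
from statistics import median

# Hypothetical per-instance counters: name -> (requests, errors) for one window.
def service_error_ratio(per_instance):
    """Sum counters across instances, then compute one service-level ratio."""
    total_requests = sum(r for r, _ in per_instance.values())
    total_errors = sum(e for _, e in per_instance.values())
    return total_errors / total_requests if total_requests else 0.0

def outliers(per_instance, factor=5.0):
    """Instances whose error ratio is far above the median ratio.
    They stay visible for inspection but do not page anyone by themselves."""
    ratios = {n: (e / r if r else 0.0) for n, (r, e) in per_instance.items()}
    mid = median(ratios.values())
    return sorted(n for n, x in ratios.items() if x > factor * max(mid, 0.001))

def should_page(per_instance, objective=0.05):
    """Page only when the aggregated ratio breaks the (assumed) objective."""
    return service_error_ratio(per_instance) > objective

fleet = {"web-1": (1000, 2), "web-2": (1000, 3), "web-3": (1000, 90)}
print(service_error_ratio(fleet), outliers(fleet), should_page(fleet))
```

Here one unhealthy machine shows up in the outlier list, but the service as a whole stays under the objective, so nobody is paged.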
Think about which CloudWatch features are equivalent to Borgmon.
Getting Metrics - varz
● Every Google service has a built-in HTTP server to export internal metrics. Borgmon can easily
fetch a target’s metrics by one HTTP fetch.
● A Borgmon can collect metrics from another Borgmon, so we can build hierarchies that follow the
topology of the service, aggregating and summarizing information and discarding some strategically
at each level.
chris@prod-server [~] $ curl http://webserver:80/varz
http_requests 37
errors_total 12
chris@prod-server [~] $ curl http://webserver:80/varz
http_responses map:code 200:25 404:0 500:12
● The /varz HTTP handler simply lists all the exported variables in plain text. A later extension added
a mapped variable, which allows the exporter to define several labels on a variable name, and then
export a table of values or a histogram.
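To make the plain-text format concrete, here is a small sketch of a parser for the /varz output shown on this slide. It handles only the two line shapes that appear here (a scalar and a `map:` variable) and assumes integer values; the function name `parse_varz` is hypothetical, and this is not how Borgmon itself is implemented.

```python
def parse_varz(text):
    """Parse /varz-style plain text into a dict.

    Scalar lines look like 'name value'; mapped lines look like
    'name map:label key:value key:value ...'.
    """
    variables = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, rest = parts[0], parts[1:]
        if rest[0].startswith("map:"):
            label = rest[0][len("map:"):]
            values = {}
            for pair in rest[1:]:
                key, _, value = pair.rpartition(":")
                values[key] = int(value)
            variables[name] = {"label": label, "values": values}
        else:
            variables[name] = int(rest[0])
    return variables

sample = """http_requests 37
errors_total 12
http_responses map:code 200:25 404:0 500:12"""
print(parse_varz(sample))
```

The mapped variable carries the same information as the two scalars, broken down by the `code` label: 25 + 0 + 12 responses add up to the 37 requests.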
JMX-like Approach, But Simplified
● JMX (Java Management Extensions)
AWS CloudWatch
● AWS services (ELB, RDS) export
default metrics to CloudWatch.
● The CloudWatch agent sends
instance metrics (CPU, disk,
memory) to CloudWatch.
● For user applications, the app
itself must send custom metrics.
Alerting
CloudWatch Concepts
The following terminology and concepts
are central to your understanding and use
of Amazon CloudWatch:
● Namespaces
● Metrics
● Dimensions
● Statistics
● Percentiles
● Alarms
Metrics
● Metrics are the fundamental concept in CloudWatch. A
metric represents a time-ordered set of data points that are
published to CloudWatch.
● AWS services send metrics to CloudWatch, and you can
send your own custom metrics to CloudWatch.
● Metrics are uniquely defined by a name, a namespace, and
zero or more dimensions. Each data point has a time
stamp, and (optionally) a unit of measure. When you
request statistics, the returned data stream is identified by
namespace, metric name, dimension, and (optionally) the
unit.
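One way to picture "uniquely defined by a name, a namespace, and zero or more dimensions" is as a composite key. The sketch below is a local model only, not a CloudWatch API: `MetricId` and `metric_id` are hypothetical names, and the namespace, metric, and dimension values are borrowed from the custom-metric example in this deck.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricId:
    """Identity of a metric: namespace + name + set of dimensions."""
    namespace: str
    name: str
    dimensions: frozenset = frozenset()  # of (dimension name, value) pairs

def metric_id(namespace, name, **dimensions):
    return MetricId(namespace, name, frozenset(dimensions.items()))

a = metric_id("VP/API", "LoginCount", Platform="iOS", Subscribe="Freemium")
b = metric_id("VP/API", "LoginCount", Subscribe="Freemium", Platform="iOS")
c = metric_id("VP/API", "LoginCount", Platform="iOS")

# Same dimensions in a different order: same metric.
# A different set of dimensions: a different metric.
print(a == b, a == c)
```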
Dimensions
● A dimension is a name/value pair that uniquely identifies a metric. You can assign up to 10 dimensions to a
metric.
● Every metric has specific characteristics that describe it, and you can think of dimensions as categories for
those characteristics. Dimensions help you design a structure for your statistics plan.
● AWS services that send data to CloudWatch attach dimensions to each metric. You can use dimensions to
filter the results that CloudWatch returns. For example, you can get statistics for a specific EC2 instance by
specifying the InstanceId dimension when you search for metrics.
● For metrics produced by certain AWS services, such as Amazon EC2, CloudWatch can aggregate data
across dimensions.
● CloudWatch does not aggregate across dimensions for your custom metrics.
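The last two bullets matter in practice: for custom metrics, each distinct combination of dimensions is its own series, so a total across dimensions has to be computed by the caller (or published as an extra series). The following is a local stand-in to illustrate that behaviour, not CloudWatch code; the in-memory `store`, the function names, and the `Premium` value are assumptions for the example.

```python
from collections import defaultdict

# Local stand-in for a metric store: each dimension set is a separate series.
store = defaultdict(list)

def publish(name, value, **dims):
    store[(name, frozenset(dims.items()))].append(value)

def values_for(name, **dims):
    """Exact-match lookup: only the series published with exactly these dimensions."""
    return store.get((name, frozenset(dims.items())), [])

def sum_matching(name, **dims):
    """Caller-side aggregation over every series whose dimensions include dims."""
    wanted = set(dims.items())
    return sum(
        v
        for (n, d), vals in store.items()
        if n == name and wanted <= d
        for v in vals
    )

publish("LoginCount", 1, Platform="iOS", Subscribe="Freemium")
publish("LoginCount", 1, Platform="iOS", Subscribe="Premium")
publish("LoginCount", 1, Platform="Android", Subscribe="Freemium")

print(values_for("LoginCount", Platform="iOS"))    # no series has exactly this set
print(sum_matching("LoginCount", Platform="iOS"))  # summed by the caller
print(sum_matching("LoginCount"))                  # total across all series
```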
Metrics Statistics
● Statistics are metric data aggregations over specified periods of time. Aggregations are made using the
namespace, metric name, dimensions, and the data point unit of measure, within the time period you
specify.
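As a rough sketch of what "aggregations over specified periods of time" means, the function below groups raw data points into fixed periods and computes the usual statistics for each one. It runs locally on a list of `(timestamp_seconds, value)` pairs; the function name and the sample latency values are assumptions, and only the statistic names (SampleCount, Sum, Average, Minimum, Maximum) follow CloudWatch's vocabulary.

```python
from collections import defaultdict

def statistics(datapoints, period):
    """Group (timestamp_seconds, value) pairs into fixed periods and
    compute the aggregates for each period, keyed by period start."""
    buckets = defaultdict(list)
    for ts, value in datapoints:
        buckets[ts - ts % period].append(value)
    return {
        start: {
            "SampleCount": len(vals),
            "Sum": sum(vals),
            "Average": sum(vals) / len(vals),
            "Minimum": min(vals),
            "Maximum": max(vals),
        }
        for start, vals in sorted(buckets.items())
    }

latency_ms = [(0, 200.0), (20, 100.0), (59, 300.0), (60, 150.0)]
stats = statistics(latency_ms, period=60)
print(stats)
```

With a 60-second period, the first three points fall into the period starting at 0 and the last one into the period starting at 60.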
Publish Custom Metrics
You can publish your own metrics to CloudWatch using the AWS CLI or an API. You can view statistical graphs of
your published metrics with the AWS Management Console.
chris@prod-server [~] $ aws cloudwatch put-metric-data --namespace VP/API --metric-name LoginCount \
    --unit Count --value 1 --dimensions Platform=iOS,Subscribe=Freemium
chris@prod-server [~] $ aws cloudwatch put-metric-data --namespace VP/API --metric-name LoginLatency \
    --unit Milliseconds --value 200.0 --dimensions Platform=iOS,Subscribe=Freemium
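The same two data points can be published from application code through the API. The sketch below assumes the boto3 SDK and valid AWS credentials and region configuration, which the slides do not mention; `login_metric` and `publish` are hypothetical helper names. The payload builder runs without boto3, and the actual call is left commented out.

```python
def login_metric(name, value, unit, platform, subscribe):
    """Build one MetricData entry mirroring the CLI calls above."""
    return {
        "MetricName": name,
        "Dimensions": [
            {"Name": "Platform", "Value": platform},
            {"Name": "Subscribe", "Value": subscribe},
        ],
        "Value": value,
        "Unit": unit,
    }

def publish(entries, namespace="VP/API"):
    """Send the entries to CloudWatch. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the payload builder works without it
    client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=entries)

entries = [
    login_metric("LoginCount", 1, "Count", "iOS", "Freemium"),
    login_metric("LoginLatency", 200.0, "Milliseconds", "iOS", "Freemium"),
]
# publish(entries)  # uncomment in an environment with AWS credentials
print(entries)
```

Batching both entries into a single `put_metric_data` call is a design choice for the sketch; sending them one at a time, as the CLI example does, is equally valid.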
We can then aggregate and visualize these metrics on a CloudWatch dashboard, for example:
● Total login count in the last 6 hours
● iOS login count in the last 6 hours
● Average login latency for Android Freemium users in the last 6 hours
Black-Box vs. White-Box Monitoring
● Borgmon (or CloudWatch) is a white-box
monitoring system: it inspects the internal state
of the target service, and the rules are written with
knowledge of the internals in mind. The transparent
nature of this model provides great power to quickly
identify which components are failing.
● But you only see the queries that arrive at the
target; the queries that never make it due to a DNS
error are invisible, while queries lost due to a server
crash never make a sound.
● Black-box monitoring, such as Pingdom, is a good way
to see the service from the user's perspective.
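A black-box check can be as small as fetching a public URL from outside and recording whether it worked and how long it took. The sketch below uses only the Python standard library and is not Pingdom's API; the function name `probe`, the 5-second timeout, and the injectable `opener` parameter (there so the check can be exercised without a network) are assumptions for the example.

```python
import time
import urllib.request

def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch url the way an outside user would; return (ok, latency_seconds).

    DNS failures, refused connections, timeouts, and HTTP error statuses
    all count as failures, including the ones the server never sees.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            ok = 200 <= response.status < 400
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        ok = False
    return ok, time.monotonic() - start

# Example (performs a real request, so it is left commented out):
# ok, latency = probe("http://webserver:80/")
```

Run on a schedule from a machine outside the service's own network, a failing probe covers the DNS-error and crashed-server cases that white-box metrics miss.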
We’re Hiring
https://jobs.lever.co/kkstream
Thank you!
KKStream (Japan)
KKBOX Japan LLC, 6F Urbanprem Shibuya,
1-4-2 Shibuya, Shibuya-ku,
Tokyo, 150-0002, Japan
Tel: +81 3 6758-7400
Fax: +81 3 6758-7401
Email: biz_info_jp@kkstream.com.tw
KKStream (Taiwan)
8F, 19-11, Sanchong Rd,
Nangang Dist, Taipei City 115,
Taiwan
Tel: +886 2 2655-0369
Fax: +886 2 2655-0929
Email: biz_info_tw@kkstream.com.tw
Copyright © 2018 KKStream Limited. All rights reserved.
www.kkstream.com.tw

How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Kks sre book_ch10

  • 1. SRE Book Ch 10 KKStream SRE Study Group Presenter: Chris Huang 2018/08/29
  • 2. Service Reliability Hierarchy
      ● From the most basic requirements needed for a system to function as a service
      ● Permitting self-actualization and taking active control of the direction of the service rather than reactively fighting fires
  • 3. Service Reliability Hierarchy: Incident Response
      ● On-call support is a tool we use to achieve our larger mission and remain in touch with how distributed computing systems actually work (and fail!).
      ● If we could find a way to relieve ourselves of carrying a pager, we would.
  • 4. Service Reliability Hierarchy: Postmortem and Root-Cause Analysis
      ● We aim to be alerted on and manually solve only new and exciting problems presented by our service; it’s woefully boring to "fix" the same issue over and over.
      ● This mindset is one of the key differentiators between the SRE philosophy and some more traditional operations-focused environments.
  • 6. Practical Alerting
      ● Being alerted for single-machine failures is unacceptable because such data is too noisy to be actionable.
      ● Instead we try to build systems that are robust against failures in the systems they depend on.
      ● A large system should be designed to aggregate signals and prune outliers.
      ● We need monitoring systems that allow us to alert for high-level service objectives, but retain the granularity to inspect individual components as needed.
      Exercise: think about which CloudWatch features are equivalent to Borgmon.
  • 7. Getting Metrics - varz
      ● Every Google service has a built-in HTTP server to export internal metrics. Borgmon can fetch a target’s metrics with a single HTTP fetch.
      ● A Borgmon can collect metrics from another Borgmon, so we can build hierarchies that follow the topology of the service, aggregating and summarizing information and strategically discarding some at each level.
        chris@prod-server [~] $ curl http://webserver:80/varz
        http_responses map:code 200:25 404:0 500:12
        chris@prod-server [~] $ curl http://webserver:80/varz
        http_requests 37
        errors_total 12
      ● The /varz HTTP handler simply lists all the exported variables in plain text. A later extension added a mapped variable, which allows the exporter to define several labels on a variable name, and then export a table of values or a histogram.
  • 8. JMX-like Approach, But Simplified
      ● JMX (Java Management Extensions)
  • 9. AWS CloudWatch
      ● AWS services (ELB, RDS) export default metrics to CloudWatch. The CloudWatch agent sends instance metrics (CPU, disk, memory) to CloudWatch.
      ● For user applications, AWS requires the app itself to send custom metrics.
  • 11. CloudWatch Concepts
      The following terminology and concepts are central to your understanding and use of Amazon CloudWatch:
      ● Namespaces
      ● Metrics
      ● Dimensions
      ● Statistics
      ● Percentiles
      ● Alarms
      Metrics
      ● Metrics are the fundamental concept in CloudWatch. A metric represents a time-ordered set of data points that are published to CloudWatch.
      ● AWS services send metrics to CloudWatch, and you can send your own custom metrics to CloudWatch.
      ● Metrics are uniquely defined by a name, a namespace, and zero or more dimensions. Each data point has a time stamp, and (optionally) a unit of measure. When you request statistics, the returned data stream is identified by namespace, metric name, dimension, and (optionally) the unit.
  • 12. Dimensions
      ● A dimension is a name/value pair that is part of a metric's identity. You can assign up to 10 dimensions to a metric (the limit as of this 2018 talk).
      ● Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics. Dimensions help you design a structure for your statistics plan.
      ● AWS services that send data to CloudWatch attach dimensions to each metric. You can use dimensions to filter the results that CloudWatch returns. For example, you can get statistics for a specific EC2 instance by specifying the InstanceId dimension when you search for metrics.
      ● For metrics produced by certain AWS services, such as Amazon EC2, CloudWatch can aggregate data across dimensions.
      ● CloudWatch does not aggregate across dimensions for your custom metrics.
  • 13. Metrics Statistics
      ● Statistics are metric data aggregations over specified periods of time. Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the time period you specify.
  • 14. Publish Custom Metrics
      You can publish your own metrics to CloudWatch using the AWS CLI or an API. You can view statistical graphs of your published metrics with the AWS Management Console.
        chris@prod-server [~] $ aws cloudwatch put-metric-data --namespace VP/API --metric-name LoginCount --unit Count --value 1 --dimensions Platform=iOS,Subscribe=Freemium
        chris@prod-server [~] $ aws cloudwatch put-metric-data --namespace VP/API --metric-name LoginLatency --unit Milliseconds --value 200.0 --dimensions Platform=iOS,Subscribe=Freemium
      We can then aggregate and visualize these metrics on a CloudWatch dashboard, for example:
      ● Total login count in the last 6 hours
      ● iOS login count in the last 6 hours
      ● Average login latency for Android Freemium users in the last 6 hours
  • 15. Black-Box vs. White-Box Monitoring
      ● Borgmon (or CloudWatch) is a white-box monitoring system: it inspects the internal state of the target service, and the rules are written with knowledge of the internals in mind. The transparent nature of this model provides great power to quickly identify which components are failing.
      ● But you only see the queries that arrive at the target; the queries that never make it due to a DNS error are invisible, while queries lost due to a server crash never make a sound.
      ● Black-box monitoring, such as Pingdom, is a good way to see the service from the user’s perspective.
  • 17. Thank you! KKStream (Japan) KKBOX Japan LLC, 6F Urbanprem Shibuya, 1-4-2 Shibuya, Shibuya-ku, Tokyo, 150-0002, Japan Tel: +81 3 6758-7400 Fax: +81 3 6758-7401 Email: biz_info_jp@kkstream.com.tw KKStream (Taiwan) 8F, 19-11, Sanchong Rd, Nangang Dist, Taipei City 115, Taiwan Tel: +886 2 2655-0369 Fax: +886 2 2655-0929 Email: biz_info_tw@kkstream.com.tw Copyright © 2018 KKStream Limited. All rights reserved. www.kkstream.com.tw
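
As a rough illustration of the comparison the deck asks for (slide 7's /varz output versus slides 12 and 14 on CloudWatch dimensions and custom metrics), the sketch below parses the plain-text /varz format shown on slide 7 and reshapes each value into the dictionary layout that CloudWatch's `PutMetricData` API expects. It is not Borgmon's or AWS's own code. The names `Sample`, `parse_varz`, and `to_metric_data` are hypothetical, treating a varz map label as a CloudWatch dimension is an assumption made for illustration, and `Unit` is hard-coded to `Count` because every value in the slide's sample is a count.

```python
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    """One exported value: metric name, optional labels, numeric value."""
    name: str
    labels: Dict[str, str]
    value: float


def parse_varz(text: str) -> List[Sample]:
    """Parse /varz-style plain text as shown on slide 7.

    Scalar line:  "http_requests 37"
    Mapped line:  "http_responses map:code 200:25 404:0 500:12"
    """
    samples: List[Sample] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, rest = parts[0], parts[1:]
        if rest[0].startswith("map:"):
            label = rest[0][len("map:"):]  # e.g. "code"
            for cell in rest[1:]:
                key, _, value = cell.partition(":")
                samples.append(Sample(name, {label: key}, float(value)))
        else:
            samples.append(Sample(name, {}, float(rest[0])))
    return samples


def to_metric_data(samples: List[Sample]) -> List[dict]:
    """Reshape samples into PutMetricData-style entries.

    Assumption: each varz label becomes one CloudWatch dimension.
    """
    return [
        {
            "MetricName": s.name,
            "Dimensions": [{"Name": k, "Value": v} for k, v in s.labels.items()],
            "Value": s.value,
            "Unit": "Count",
        }
        for s in samples
    ]


VARZ = """http_requests 37
errors_total 12
http_responses map:code 200:25 404:0 500:12
"""

samples = parse_varz(VARZ)
payload = to_metric_data(samples)
# With boto3 installed and credentials configured, the payload could be sent with:
#   boto3.client("cloudwatch").put_metric_data(Namespace="VP/API", MetricData=payload)
```

The mapped variable expands into one sample per table cell, which mirrors how a custom CloudWatch metric has to be published once per dimension combination, since CloudWatch does not aggregate custom metrics across dimensions.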