© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT
Deep Dive in Cloud Monitoring
with Amazon EKS and Prometheus
Pahud Hsieh
Specialist SA, Serverless
Amazon Web Services
Kakashi Liu
Infra Lead
UmboCV
Amazon EKS in the Past Year
● Started in us-east-1 and us-west-2
● Released VPC CNI 1.0
● HIPAA support
● Released AMI build scripts on Github
● Released VPC CNI 1.1
● Enabled GPU Support
● Support API Aggregation
● Support HPA
● Support eu-west-1
● CLI support for writing the kubeconfig
● Support for Admission Controllers
Amazon EKS in the Past Year
● Released VPC CNI 1.2
● Allow for additional VPC CIDR ranges
● Support for us-east-2
● Official support for ALB Ingress
● Container Marketplace
● CloudMap Integration
● Support for AWS App Mesh
● Support for eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1
● Support for ap-northeast-2
● Added the SLA
Immediately after that
● Achieved ISO and PCI compliance
● Support for ap-south-1, eu-west-2, eu-west-3
● Released VPC CNI 1.3
● Added a new Quick Start
● Allowed private API Endpoints
● Launched an App Mesh controller at GA
● Public Preview for Windows nodes
● Deep Learning container launch
● Added 1.2 with a new cluster update API
● Released CSI Drivers for FSx and EFS
● Control plane logs
● Public Preview of A1 instances
● Released a Machine Learning Benchmark tool
CloudWatch Container Insights (preview)
Dimensions for Kubernetes
• Clusters
• Nodes
• Services
• Namespaces
• Pods
Pod Metrics
• pod_cpu_reserved_capacity
• pod_cpu_utilization
• pod_cpu_utilization_over_pod_limit
• pod_memory_reserved_capacity
• pod_memory_utilization
• pod_memory_utilization_over_pod_limit
• pod_network_rx_bytes
• pod_network_tx_bytes
Other Metrics
• cluster_failed_node_count
• cluster_node_count
• namespace_number_of_running_pods
• node_cpu_limit
• node_cpu_reserved_capacity
• node_cpu_usage_total
• node_cpu_utilization
• node_filesystem_utilization
• node_memory_limit
• node_memory_reserved_capacity
• node_memory_utilization
• node_memory_working_set
• node_network_total_bytes
• node_number_of_running_containers
• node_number_of_running_pods
• service_number_of_running_pods
Reference - https://amzn.to/2HFtHDt
Threshold and Alarm Actions
Amazon EKS and Prometheus
Prometheus
Why Prometheus?
Community
Number of integrations
Ease of use
Why not Prometheus?
Manage it yourself
Complexity in large setups
Possibility: Hybrid Approach
Use Prometheus to collect metrics that are exposed on /metrics endpoints.
Send a subset of critical metrics to Amazon CloudWatch or a third-party solution.
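The hybrid approach can be sketched as a small filter: keep only an allow-listed subset of scraped samples and reshape them into the entry layout that CloudWatch's PutMetricData call accepts (MetricName, Dimensions, Value). This is a minimal Python sketch under stated assumptions: the allow-list, the sample tuples, and the function name are hypothetical, and the actual upload (for example through an AWS SDK) is left out.

```python
# Hypothetical allow-list of metrics considered critical enough to forward.
CRITICAL_METRICS = {"http_requests_total"}


def to_cloudwatch_entries(samples, allowed=CRITICAL_METRICS):
    """Convert (name, labels, value) samples into PutMetricData-style entries.

    Only metrics whose name is in `allowed` are kept; labels become dimensions.
    """
    entries = []
    for name, labels, value in samples:
        if name not in allowed:
            continue  # everything else stays in Prometheus only
        entries.append({
            "MetricName": name,
            "Dimensions": [
                {"Name": key, "Value": val} for key, val in sorted(labels.items())
            ],
            "Value": float(value),
        })
    return entries


# Hypothetical scrape result: one critical metric and one that is not forwarded.
scraped = [
    ("http_requests_total", {"code": "200", "path": "/api/user"}, 10),
    ("go_goroutines", {}, 42),
]
entries = to_cloudwatch_entries(scraped)
```

Keeping the allow-list small limits the number of custom metrics sent to the second system while Prometheus retains the full set.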
Hello!
I am kakashi
- Infra Lead @Umbo CV
- Co-organizer @Golang Taipei Gathering
Traditional Solutions
Umbo Light
Agenda
Why monitoring
Umbo CV Monitoring pipeline
Prometheus: Why and What
Prometheus with EKS
Use cases
Why monitoring
Why monitoring
Alerting
Long-term trends
Umbo CV Monitoring pipeline
Monitoring types
Infrastructure
Application
Application monitoring
[Diagram: containers and exporters on EC2 expose metrics at /metrics; a metrics store collects them and drives alerting.]
Prometheus: Why and What
● Graduated project within the CNCF.
● Can handle multi-dimensional metrics.
● Performance: can ingest millions of samples per second.
● Powerful query language: PromQL.
● Built-in alerting tool and service discovery mechanism.
Prometheus metrics
[Diagram: user requests and two EC2 hosts, each exposing /metrics.]
http_requests_total{code="200", path="/api/user"} 10
(metric name: http_requests_total; labels: code and path; value: 10)
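To make the three parts of a sample line concrete, here is a minimal Python sketch that splits one line into metric name, labels, and value. It assumes label values are double-quoted strings, as the Prometheus text exposition format requires, and it deliberately ignores escaped quotes, timestamps, and comment lines; `parse_sample` is a hypothetical helper, not part of any Prometheus library.

```python
import re

# metric_name{label="value",...} value   (the label block is optional)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (metric_name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'http_requests_total{code="200", path="/api/user"} 10'
)
```

Each distinct combination of label values is its own time series, which is what makes the metrics multi-dimensional.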
PromQL example
Total requests / second
sum(rate(http_requests_total[5m]))
Total 5xx requests / second
sum(rate(http_requests_total{code=~"5.*"}[5m]))
Current percentage of errors across all instances
100 * sum(rate(http_requests_total{code=~"5.*"}[5m])) /
sum(rate(http_requests_total[5m]))
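The arithmetic behind these queries can be sketched in plain Python. This is a simplified illustration, not Prometheus's actual algorithm: `simple_rate` only divides the counter increase by the elapsed time, whereas the real `rate()` also handles counter resets and extrapolates to the window boundaries. The sample values are made up.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample.

    Simplification: no counter-reset handling and no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter samples over a 5-minute (300 s) window, keyed by status code.
window = {
    "200": [(0, 1000), (300, 3850)],
    "500": [(0, 10), (300, 160)],
}

# sum(rate(http_requests_total[5m]))
total_rate = sum(simple_rate(s) for s in window.values())

# sum(rate(http_requests_total{code=~"5.*"}[5m]))
error_rate = sum(
    simple_rate(s) for code, s in window.items() if code.startswith("5")
)

# Share of 5xx responses, expressed as a percentage.
error_percentage = 100 * error_rate / total_rate
```

Without the factor of 100 the result is a ratio between 0 and 1, which matters when choosing an alert threshold.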
Alerting rule
alert: Percentage_Of_Errors_Is_High
expr: |
  100 * sum(rate(http_requests_total{code=~"5.*"}[5m]))
    /
  sum(rate(http_requests_total[5m])) > 5
for: 5m
labels:
  severity: critical
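The `for: 5m` clause means the expression has to stay true for five minutes before the alert fires; until then it is only pending. A minimal Python sketch of that state machine, assuming a hypothetical list of (timestamp in seconds, condition result) evaluations:

```python
def alert_states(evaluations, for_seconds):
    """Return 'inactive', 'pending', or 'firing' for each evaluation.

    `evaluations` is a list of (timestamp, condition_true) pairs in time order.
    """
    active_since = None
    states = []
    for timestamp, condition_true in evaluations:
        if not condition_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Hypothetical evaluation history; the condition first becomes true at t=60.
evals = [(0, False), (60, True), (180, True), (360, True), (420, False), (480, True)]
states = alert_states(evals, for_seconds=300)
```

The hold period filters out short spikes at the cost of a delayed notification.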
Prometheus with EKS
Prometheus ❤ EKS
● A monitoring system is critical.
● Running Prometheus on Kubernetes makes HA easy to achieve.
● The Prometheus Operator makes it even easier:
○ Automated management and upgrades of Prometheus.
○ Native k8s configuration.
Install Prometheus on EKS by helm
1. Install the Prometheus Operator chart
$ helm install --name prom --namespace monitoring stable/prometheus-operator
2. Verify
$ kubectl --namespace monitoring get pods
NAME READY STATUS RESTARTS AGE
alertmanager-prom-op-alertmanager-0 2/2 Running 0 1m
prometheus-prom-op-prometheus-0 3/3 Running 1 1m
prom-op-grafana-5c59ddfb9d-zqfqt 2/2 Running 0 2m
prom-op-kube-state-metrics-76786cc9b4-8q4bj 1/1 Running 0 2m
prom-op-prometheus-node-exporter-6jclc 1/1 Running 0 2m
prom-op-prometheus-node-exporter-bxr49 1/1 Running 0 2m
prom-op-prometheus-operato-operator-6cbf5d5cfd-z6fz4 1/1 Running 0 2m
Prometheus Operator CRD
● Prometheus & Alertmanager
○ Define the Prometheus and Alertmanager deployments.
● ServiceMonitor
○ Specifies how metrics from k8s services are scraped.
● PrometheusRule
○ Contains Prometheus alerting and recording rules that a Prometheus instance can load.
EKS cluster monitoring
EKS application monitoring through ServiceMonitor
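apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-servicemonitor
spec:
  selector:
    matchLabels:
      app: api-server
  endpoints:
    - port: metrics  # assumed port name; not shown on the slide
[Diagram: two Services, one labeled app: api-server (matched by the selector) and one labeled app: api-server2 (not matched).]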
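The selection logic of `matchLabels` is simple equality on every listed key. A minimal Python sketch of it follows; the service names and the `matches` helper are hypothetical and only illustrate the rule, they are not the operator's code.

```python
def matches(selector_labels, service_labels):
    """True when every selector key/value pair is present on the service."""
    return all(
        service_labels.get(key) == value
        for key, value in selector_labels.items()
    )


# Selector taken from the ServiceMonitor's spec.selector.matchLabels.
selector = {"app": "api-server"}

# Hypothetical Services in the cluster and their labels.
services = {
    "api-server": {"app": "api-server"},
    "api-server2": {"app": "api-server2"},
}

selected = [name for name, labels in services.items() if matches(selector, labels)]
```

Only the Service labeled `app: api-server` is picked up for scraping; a Service with extra labels would still match as long as `app` agrees.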
Alerting by PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
spec:
  groups:
    - name: api.rules
      rules:
        - alert: Percentage_Of_Errors_Is_High
          expr: |
            100 * sum(rate(http_requests_total{code=~"5.*"}[5m])) /
            sum(rate(http_requests_total[5m])) > 5
          for: 5m
          labels:
            severity: critical
Dashboard for EKS cluster
Monitoring camera detection pipeline
[Diagram: Media Server, CV Detection, API Server]
Monitoring camera detection pipeline
[Diagram: Media Server, CV Detection, API Server, annotated with # of frames, # of cv requests, # of events]
Service discovery
[Diagram: Media Server, CV Detection, API Server]
Scraping through EC2 service discovery
Service discovery
[Diagram: Prometheus scraping Media Server, CV Detection, API Server]
global:
  scrape_interval: 1s
  evaluation_interval: 1s
scrape_configs:
  - job_name: 'node'
    ec2_sd_configs:
      - region: <REGION_HERE>
        access_key: <ACCESS_KEY_HERE>
        secret_key: <SECRET_KEY_HERE>
        port: 9273
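Conceptually, EC2 service discovery turns each discovered instance into a scrape target made of its private IP address and the configured port. The Python sketch below illustrates that mapping only; the instance records are invented, `scrape_targets` is a hypothetical helper, and real discovery is done by Prometheus itself through the EC2 API.

```python
def scrape_targets(instances, port=9273):
    """Build host:port targets from instances that have a private IP address."""
    return [
        f'{instance["private_ip"]}:{port}'
        for instance in instances
        if instance.get("private_ip")
    ]


# Hypothetical discovery result; the second instance has no private IP yet.
instances = [
    {"id": "i-0001", "private_ip": "10.0.1.12"},
    {"id": "i-0002", "private_ip": None},
    {"id": "i-0003", "private_ip": "10.0.2.7"},
]
targets = scrape_targets(instances)
```

Because targets are derived from the instance list, new hosts are scraped without editing the configuration by hand.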
Application metrics
[Diagram: Media Server, CV Detection, API Server]
# of frames: ms_frames_total{env="production", service="ms", cameraId="ID-123456"} 1000
# of cv requests: cvreqest_total{env="production", service="cv", cameraId="ID-123456"} 300
# of events: event_total{env="production", service="cv", cameraId="ID-123456"} 5
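How a service might keep such a labeled counter and render it for a /metrics endpoint can be sketched with a tiny in-memory class. This is an illustration only: `LabeledCounter` is a hypothetical stand-in written for this example, and a real service would normally use an official Prometheus client library instead.

```python
class LabeledCounter:
    """A tiny in-memory counter keyed by label values."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self.values = {}

    def inc(self, amount=1, **labels):
        """Increase the counter for one combination of label values."""
        key = tuple(labels[label] for label in self.label_names)
        self.values[key] = self.values.get(key, 0) + amount

    def render(self):
        """Render all series as text exposition lines."""
        lines = []
        for key, value in sorted(self.values.items()):
            pairs = ",".join(
                f'{label}="{val}"' for label, val in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines)


frames = LabeledCounter("ms_frames_total", ["env", "service", "cameraId"])
frames.inc(1000, env="production", service="ms", cameraId="ID-123456")
output = frames.render()
```

Using `cameraId` as a label yields one time series per camera, which is what allows per-camera dashboards and alerts.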
Dashboard
Alerting
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
spec:
  groups:
    - name: camera.rules
      rules:
        - alert: FpsLow
          annotations:
            message: "{{ $labels.cameraId }} fps is lower than 2fps"
          expr: sum by (cameraId) (rate(ms_frames_total{env="production", cameraId=~".+"}[10m])) < 2
          for: 30m
          labels:
            severity: critical
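For the alert message to name a camera, the frame rate has to be evaluated per camera rather than across the whole fleet. The Python sketch below mirrors a per-camera aggregation followed by the threshold check; the series data and the `low_fps_cameras` helper are hypothetical, and the rates are assumed to be already computed per series.

```python
from collections import defaultdict


def low_fps_cameras(series, threshold=2.0):
    """Sum per-series frame rates by cameraId and return cameras below the threshold."""
    per_camera = defaultdict(float)
    for labels, frames_per_second in series:
        per_camera[labels["cameraId"]] += frames_per_second
    return sorted(cam for cam, fps in per_camera.items() if fps < threshold)


# Hypothetical per-series frame rates (frames per second over the last 10 minutes).
series = [
    ({"cameraId": "ID-123456", "service": "ms"}, 1.5),
    ({"cameraId": "ID-654321", "service": "ms"}, 12.0),
    ({"cameraId": "ID-654321", "service": "ms-replica"}, 3.0),
]
alerting = low_fps_cameras(series)
```

Summing over all cameras at once would hide a single stalled camera behind the healthy ones.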
Thank you!
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
 

Deep-Dive-with-Cloud-Monitoring-with-Amazon-EKS-and-Prometheus

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT. Deep Dive in Cloud Monitoring with Amazon EKS and Prometheus. Pahud Hsieh, Specialist SA, Serverless, Amazon Web Services. Kakashi Liu, Infra Lead, UmboCV.
  • 2.
  • 3. Amazon EKS in the Past Year ● Started in us-east-1 and us-west-2 ● Released VPC CNI 1.0 ● HIPAA support ● Released AMI build scripts on GitHub ● Released VPC CNI 1.1 ● Enabled GPU support ● Support for API aggregation ● Support for HPA ● Support for eu-west-1 ● CLI support for writing the kubeconfig ● Support for admission controllers
  • 4. Amazon EKS in the Past Year ● Released VPC CNI 1.2 ● Allowed additional VPC CIDR ranges ● Support for us-east-2 ● Official support for ALB Ingress ● Container Marketplace ● Cloud Map integration ● Support for AWS App Mesh ● Support for eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1 ● Support for ap-northeast-2 ● Added the SLA
  • 5. Immediately after that ● Achieved ISO and PCI compliance ● Support for ap-south-1, eu-west-2, eu-west-3 ● Released VPC CNI 1.3 ● Added a new quickstart ● Allowed private API endpoints ● Launched an App Mesh controller at GA ● Public preview for Windows nodes ● Deep Learning container launch ● Added 1.2 with a new cluster update API ● Released CSI drivers for FSx and EFS ● Control plane logs ● Public preview of A1 instances ● Released a Machine Learning Benchmark tool
  • 6.
  • 7. CloudWatch Container Insights (preview)
  • 8. Dimensions for Kubernetes • Clusters • Nodes • Services • Namespaces • Pods
  • 9. Pod Metrics • pod_cpu_reserved_capacity • pod_cpu_utilization • pod_cpu_utilization_over_pod_limit • pod_memory_reserved_capacity • pod_memory_utilization • pod_memory_utilization_over_pod_limit • pod_network_rx_bytes • pod_network_tx_bytes
  • 10. Other Metrics • cluster_failed_node_count • cluster_node_count • namespace_number_of_running_pods • node_cpu_limit • node_cpu_reserved_capacity • node_cpu_usage_total • node_cpu_utilization • node_filesystem_utilization • node_memory_limit • node_memory_reserved_capacity • node_memory_utilization • node_memory_working_set • node_network_total_bytes • node_number_of_running_containers • node_number_of_running_pods • service_number_of_running_pods. Reference: https://amzn.to/2HFtHDt
  • 12. Amazon EKS and Prometheus. Why Prometheus? ● Community ● Number of integrations ● Ease of use. Why not Prometheus? ● You manage it yourself ● Complexity in large setups. Possibility: a hybrid approach. Use Prometheus to collect the metrics exposed on /metrics endpoints, and send a subset of critical metrics to Amazon CloudWatch or a third-party solution.
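The hybrid approach on slide 12 could be sketched roughly as follows. This is only an illustration under stated assumptions: the allowlist, the `go_goroutines` sample, and the helper names are hypothetical, and the dictionaries are merely shaped like the `MetricData` entries that CloudWatch's PutMetricData API accepts (`MetricName`, `Dimensions`, `Value`). No AWS call is made here.

```python
# Hypothetical allowlist of metrics considered critical enough to forward.
CRITICAL_METRICS = {"http_requests_total"}


def to_datum(name, labels, value):
    """Shape one sample like a MetricData entry of CloudWatch PutMetricData."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(labels.items())],
        "Value": float(value),
    }


def select_critical(samples, allowlist=CRITICAL_METRICS):
    """Keep only allowlisted samples; everything else stays in Prometheus."""
    return [to_datum(n, l, v) for n, l, v in samples if n in allowlist]


# Hypothetical scraped samples as (name, labels, value) tuples.
samples = [
    ("http_requests_total", {"code": "200", "path": "/api/user"}, 10),
    ("go_goroutines", {}, 42),
]
for datum in select_critical(samples):
    print(datum)
```

Keeping the allowlist small is the point of the hybrid design: Prometheus holds the full set of series, while only a handful of critical ones are copied elsewhere.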
  • 13.
  • 14. Hello! I am kakashi - Infra Lead @Umbo CV - Co-organizer @Golang Taipei Gathering
  • 15.
  • 16. Traditional Solutions, Umbo Light
  • 17. Agenda ● Why monitoring ● Umbo CV Monitoring pipeline ● Prometheus: Why and What ● Prometheus with EKS ● Use cases
  • 18. Why monitoring
  • 19. Why monitoring ● Alerting ● Long-term trends
  • 20. Umbo CV Monitoring pipeline
  • 21. Monitoring types ● Infrastructure ● Application
  • 22. Application monitoring (diagram: containers and exporters on EC2 expose metrics on /metrics; a metrics store collects them and alerts)
  • 23. Prometheus: Why and What ● Graduated within CNCF. ● Can handle multi-dimensional metrics. ● Performance: can ingest millions of samples per second. ● Powerful query language: PromQL. ● Built-in alerting tool and service discovery mechanism.
  • 24. Prometheus metrics (diagram: user requests reach EC2 instances, each exposing /metrics). Example sample: http_requests_total{code="200", path="/api/user"} 10, where http_requests_total is the metric name, {code="200", path="/api/user"} are the labels, and 10 is the value.
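To make the name / labels / value split on slide 24 concrete, here is a minimal parsing sketch. It assumes label values are double-quoted, as the text exposition format requires, and it deliberately ignores timestamps, escape sequences, and comment lines. `parse_sample` is a hypothetical helper, not part of any Prometheus library.

```python
import re

# Matches `name{labels} value` or `name value` (no timestamps, no escapes).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one exposition-format line into (metric_name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, raw_value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(raw_value)


name, labels, value = parse_sample('http_requests_total{code="200", path="/api/user"} 10')
print(name, labels, value)
```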
  • 25. PromQL example
      Total requests / second:
        sum(rate(http_requests_total[5m]))
      Total 5xx requests / second:
        sum(rate(http_requests_total{code=~"5.*"}[5m]))
      Current share of errors across all instances (a ratio between 0 and 1):
        sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m]))
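The arithmetic behind these queries can be sketched in plain Python. This is a simplified model, not how Prometheus implements `rate()`: it takes only the first and last sample in the window, and it skips counter-reset handling and extrapolation to the window edges. The series data is made up for illustration.

```python
def per_second_rate(samples):
    """Approximate rate(): counter increase over the window / elapsed seconds.

    `samples` is a list of (unix_timestamp, counter_value) pairs, oldest first.
    """
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def error_ratio(series, code_prefix="5"):
    """Sum of rates for series whose `code` starts with the prefix, over the total."""
    total = sum(per_second_rate(s) for _, s in series)
    errors = sum(per_second_rate(s) for labels, s in series
                 if labels["code"].startswith(code_prefix))
    return errors / total


# Hypothetical data: two http_requests_total series over a 5-minute (300 s) window.
series = [
    ({"code": "200"}, [(0, 1000), (300, 3850)]),  # 2850 requests in 300 s
    ({"code": "500"}, [(0, 10), (300, 160)]),     # 150 requests in 300 s
]
print(error_ratio(series))
```

The `startswith` check mirrors the `code=~"5.*"` matcher, since PromQL regular expressions are fully anchored.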
  • 26. Alerting rule
      alert: Percentage_Of_Errors_Is_High
      # The expression is a ratio between 0 and 1, so 0.05 means 5% of requests.
      expr: sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
      for: 5m
      labels:
        severity: critical
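The `for: 5m` clause means the condition has to hold continuously before the alert fires; until then the alert is pending. A small simulation of that behavior, assuming one evaluation per minute and using the hypothetical helper `alert_states`:

```python
def alert_states(evaluations, for_seconds):
    """Return the alert state after each rule evaluation.

    `evaluations` is a list of (timestamp_seconds, condition_is_true) pairs
    in time order.
    """
    states = []
    active_since = None  # when the condition most recently became true
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # any false evaluation resets the clock
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        held = ts - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Hypothetical evaluations every 60 s with a single dip at t=120.
evals = [(0, True), (60, True), (120, False), (180, True), (240, True),
         (300, True), (360, True), (420, True), (480, True)]
for (ts, _), state in zip(evals, alert_states(evals, for_seconds=300)):
    print(ts, state)
```

The dip at t=120 restarts the timer, which is why short spikes do not page anyone.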
  • 27. Prometheus with EKS
  • 28. Prometheus ❤ EKS ● The monitoring system is critical. ● Running Prometheus on Kubernetes makes HA easy to achieve. ● Prometheus Operator makes it even easier ○ Automated management and upgrades of Prometheus. ○ Native k8s configuration.
  • 29. Install Prometheus on EKS by helm
      1. Install the Prometheus Operator chart
         $ helm install --name prom --namespace monitoring stable/prometheus-operator
      2. Verify
         $ kubectl --namespace monitoring get pods
         NAME                                                   READY   STATUS    RESTARTS   AGE
         alertmanager-prom-op-alertmanager-0                    2/2     Running   0          1m
         prometheus-prom-op-prometheus-0                        3/3     Running   1          1m
         prom-op-grafana-5c59ddfb9d-zqfqt                       2/2     Running   0          2m
         prom-op-kube-state-metrics-76786cc9b4-8q4bj            1/1     Running   0          2m
         prom-op-prometheus-node-exporter-6jclc                 1/1     Running   0          2m
         prom-op-prometheus-node-exporter-bxr49                 1/1     Running   0          2m
         prom-op-prometheus-operato-operator-6cbf5d5cfd-z6fz4   1/1     Running   0          2m
  • 30. Prometheus Operator CRDs ● Prometheus & Alertmanager ○ Define the Prometheus and Alertmanager deployments. ● ServiceMonitor ○ Specifies how metrics of k8s services are scraped. ● PrometheusRule ○ Holds Prometheus alerting and recording rules that a Prometheus instance can load.
  • 31. EKS cluster monitoring
  • 32. EKS application monitoring through ServiceMonitor
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: api-servicemonitor
      spec:
        selector:
          matchLabels:
            app: api-server
      (diagram: two services, one labeled app: api-server and one labeled app: api-server2; the selector matches only the first)
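The `matchLabels` selection on slide 32 amounts to an equality check on every listed key. A minimal sketch of that rule; the service names below are hypothetical stand-ins for the two services in the diagram:

```python
def matches(match_labels, service_labels):
    """True when every key/value pair in the selector is present on the service."""
    return all(service_labels.get(k) == v for k, v in match_labels.items())


selector = {"app": "api-server"}
# Hypothetical services carrying the labels shown in the slide's diagram.
services = {
    "api-server": {"app": "api-server"},
    "api-server2": {"app": "api-server2"},
}
selected = [name for name, labels in services.items() if matches(selector, labels)]
print(selected)
```

Extra labels on a service do not prevent a match; only the keys named in the selector are compared.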
  • 33. Alerting by PrometheusRule
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      spec:
        groups:
        - name: api.rules
          rules:
          - alert: Percentage_Of_Errors_Is_High
            # The expression is a ratio between 0 and 1, so 0.05 means 5% of requests.
            expr: sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
            for: 5m
            labels:
              severity: critical
  • 34. Dashboard for EKS cluster
  • 35.
  • 36. Monitoring camera detection pipeline (diagram: Media Server, CV Detection, API Server)
  • 37. Monitoring camera detection pipeline (diagram: Media Server, # of frames; CV Detection, # of cv requests; API Server, # of events)
  • 38. Service discovery (diagram: Media Server, CV Detection, API Server). Scraping through EC2 service discovery
  • 39. Service discovery (diagram: Media Server, CV Detection, API Server). Scraping:
      global:
        scrape_interval: 1s
        evaluation_interval: 1s
      scrape_configs:
        - job_name: 'node'
          ec2_sd_configs:
            - region: eu-east-1
              access_key: <ACCESS_KEY_HERE>
              secret_key: <SECRET_KEY_HERE>
              port: 9273
  • 40. Application metrics (diagram: Media Server, CV Detection, API Server)
      # of frames:
        ms_frames_total{env="production", service="ms", cameraId="ID-123456"} 1000
      # of cv requests:
        cvreqest_total{env="production", service="cv", cameraId="ID-123456"} 300
      # of events:
        event_total{env="production", service="cv", cameraId="ID-123456"} 5
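A service produces lines like those on slide 40 by incrementing a labeled counter and rendering it on its /metrics endpoint. The sketch below shows the idea with the standard library only. The `Counter` class here is a hypothetical stand-in; a real service would more likely use an official Prometheus client library.

```python
from collections import defaultdict


class Counter:
    """A tiny labeled counter that renders itself in the text exposition format."""

    def __init__(self, name):
        self.name = name
        self._values = defaultdict(float)

    def inc(self, amount=1, **labels):
        # Sort label pairs so the same label set always maps to the same series.
        self._values[tuple(sorted(labels.items()))] += amount

    def render(self):
        lines = []
        for label_pairs, value in sorted(self._values.items()):
            body = ",".join(f'{k}="{v}"' for k, v in label_pairs)
            lines.append(f"{self.name}{{{body}}} {value:g}")
        return "\n".join(lines)


frames = Counter("ms_frames_total")
for _ in range(3):  # pretend three frames were processed for one camera
    frames.inc(env="production", service="ms", cameraId="ID-123456")
print(frames.render())
```

Each distinct combination of label values becomes its own time series, so a per-camera label such as `cameraId` multiplies the series count by the number of cameras.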
  • 41. Dashboard
  • 42. Alerting
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      spec:
        groups:
        - name: camera.rules
          rules:
          - alert: FpsLow
            annotations:
              message: "{{ $labels.cameraId }} fps is lower than 2fps"
            # Aggregate per camera so the cameraId label survives for the message.
            expr: sum by (cameraId) (rate(ms_frames_total{env="production", cameraId=~".+"}[10m])) < 2
            for: 30m
            labels:
              severity: critical
  • 43. Thank you!