1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Running Cloudbreak on Kubernetes
Richard Doktorics
Krisztian Horvath

Who are we?
• Krisztian Horvath
  – Staff Engineer at Hortonworks
  – Has worked on Cloudbreak from the beginning
  – @keyki
• Richard Doktorics
  – Senior Software Engineer
  – Has worked on Cloudbreak from the beginning
  – @doktoric

Agenda
• Cloudbreak
• Kubernetes
• Helm
• Cloudbreak Rolling Update
• Log collection
• Monitoring & Alerting
• Questions

What is Cloudbreak?
Cloudbreak is a tool for provisioning Hadoop clusters on cloud infrastructure.
• Simplified cluster provisioning
• Automated cluster scaling
  – AMS (Ambari Metrics System)
  – Prometheus-based metrics
• Highly extensible
  – Recipes for scripting extensions that run before/after cluster provisioning
  – Custom cloud images
• Multiple platforms are supported
  – AWS
  – GCP
  – Azure
  – OpenStack
  – BYOS (Bring Your Own Stack)

Single node deployment
• Cloudbreak Deployer (CBD)
  – Written in Go and Bash (go-basher)
  – Compiled into a single binary
• Micro-service architecture
  – Each service runs in a Docker container
  – Each container can be replaced with a custom one
  – Services are managed with docker-compose

IMAGE                                    NAMES
traefik:v1.3.8-alpine                    cbreak_traefik_1
hortonworks/cloudbreak:2.1.0             cbreak_cloudbreak_1
postgres:9.6.1-alpine                    cbreak_commondb_1
hortonworks/cloudbreak-uaa               cbreak_identity_1
hortonworks/hdc-auth:2.1.0               cbreak_sultans_1
hortonworks/cloudbreak-autoscale:2.1.0   cbreak_periscope_1
hortonworks/hdc-web:2.1.0                cbreak_uluwatu_1
gliderlabs/consul-server:0.5             cbreak_consul_1

Our goal was to...
• Run Cloudbreak in HA (highly available) mode
  – Ability to recover flows in case of node failure
  – Avoid master-slave design / leader election problems
• Scale Cloudbreak as we desire
  – Distribute each cluster-related flow
  – Two flows cannot run for the same cluster at the same time (e.g. 2 upscale flows)
  – Flow cancellation must be handled
• Scale the Web UI
  – Had to introduce a Redis cluster for the session store
• Scale every other service as well
• Find a tool that makes it easy to deploy these services to multiple nodes
• Offer Cloudbreak as a service that is accessible by everyone and can start clusters anywhere

Kubernetes

What is Kubernetes?
Kubernetes is an open-source platform designed to automate deploying, scaling and operating application containers.
• Deploy your applications quickly and predictably
• Scale your applications on the fly
• Roll out new features seamlessly
• Limit hardware usage to required resources only
• Portable: public, private, hybrid, multi-cloud
• Extensible: modular, pluggable, hookable, composable
• Self-healing: auto-placement, auto-restart, auto-replication, auto-scaling

Why Kubernetes?
• Not because it's fancy...
• Evaluated Kubernetes, Swarm, Mesos, Rancher
• Open source / active community with hands-on experience
• Many cloud providers already support it
• Lots of tooling behind it / API / CLI / Helm / Ansible / Salt
• Integration with most of the cloud providers
  – Provision load balancers (GCP, AWS, Azure)
  – Use object stores to share data (Ceph, S3, GCP bucket, Azure Storage Account)
  – Dynamic volume provisioning / persistent disk (EBS, Azure Blob)

Running Kubernetes on Azure
az aks create --resource-group k8srg --name k8s \
  --agent-count 5 --agent-osdisk-size 100 --agent-vm-size Standard_D12_v2 \
  --service-principal sp --client-secret cs --dns-name-prefix k8s \
  --location westus --ssh-key-value ~/.ssh/id_rsa.pub

ACS / AKS / ACI
• ACS (Azure Container Service)
  – Can run Kubernetes, Swarm, DC/OS
• AKS (Managed Kubernetes)
  – No master VMs (at least on your side)
  – Multiple agent pools with different VM types
  – Scale the agent pools independently
  – Automatic upgrades
• ACI (Azure Container Instances)
  – No VMs to provision
  – "Endless" resource pool
  – Pay per second
  – Can act "as a node" in the Kubernetes cluster

Kubernetes resources
• Pod
  – Group of one or more containers with shared storage/network
  – Containers are always co-located and co-scheduled, and run in a shared context
• Deployment
  – Provides declarative updates for Pods
• StatefulSet
  – Manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods
  – Each Pod has a persistent identifier that it keeps across any rescheduling
• Service
  – Abstraction which defines a logical set of Pods and a policy by which to access them
• All of these are declared in YAML files
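
The Deployment and Service resources are illustrated in the next example, but the StatefulSet is not. As a hedged sketch only, this is roughly what a StatefulSet looks like for a single-replica PostgreSQL instance. The talk does not say how the database was deployed; the resource name `commondb`, the volume name `data`, and the 10Gi size are assumptions for illustration, and only the image name comes from the single-node container list.

```yaml
# Hypothetical StatefulSet sketch; names and sizes are illustrative assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: commondb
spec:
  serviceName: commondb          # headless Service that gives each Pod a stable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: commondb
  template:
    metadata:
      labels:
        app: commondb
    spec:
      containers:
        - name: postgres
          image: postgres:9.6.1-alpine
          ports:
            - containerPort: 5432
              name: postgres
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:          # one PersistentVolumeClaim per Pod, kept across rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

The Pod created from this spec is named `commondb-0` and keeps that name and its volume claim when it is rescheduled, which is the "persistent identifier" guarantee listed for StatefulSets.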

Deployment and Service example

Deployment
# extensions/v1beta1 was current in 2017; Deployments later moved to apps/v1,
# and extensions/v1beta1 was removed in Kubernetes 1.16
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudbreak
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cloudbreak
  template:
    metadata:
      labels:
        app: cloudbreak
    spec:
      containers:
        - name: cloudbreak
          image: hortonworks/cloudbreak:2.1.0
          ports:
            - containerPort: 8080
              name: http-port
            - containerPort: 20105
              name: jmx-port

Service (cloudbreak.default.svc.cluster.local)
apiVersion: v1
kind: Service
metadata:
  name: cloudbreak
  annotations:
    # annotation values must be strings, so the boolean and the port are quoted
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
  ports:
    - name: http
      protocol: TCP
      port: 8080
    - name: jmx
      protocol: TCP
      port: 20105

Helm

Why Helm?
• No real competitor
• Helps you manage Kubernetes applications
• Officially approved by the community
• Official Charts
• Rolling upgrade
• Helm is the client, Tiller is the server
• Tiller runs as a Kubernetes pod

Running Helm on Kubernetes
• Helm package ~= Chart
  – Define
  – Install
  – Upgrade
• Chart
  – values.yaml: stores the variables used by the template files in the templates directory
  – Chart.yaml: describes the chart (its name, description and version)
  – templates/*.yaml: Kubernetes manifests with Go template support
• Separate Charts for every component
  – Cloudbreak
  – Monitoring
  – Analytics

Deployment and Service example (Helm templates)
The plain Deployment and Service manifests on this slide are identical to the earlier Deployment and Service example. The Helm templates parameterize the resource name, replica count, image and release label.

Deployment template (Helm)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-cloudbreak
spec:
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: cloudbreak
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: cloudbreak
        release: {{ .Release.Name }}
    spec:
      containers:
        - name: cloudbreak
          image: {{ .Values.cbImage }}
          ports:
            - containerPort: 8080
              name: http-port
            - containerPort: 20105
              name: jmx-port

Service template (Helm)
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-cloudbreak
  annotations:
    # annotation values must be strings, so the boolean and the port are quoted
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
    release: {{ .Release.Name }}
  ports:
    - name: http
      protocol: TCP
      port: 8080
    - name: jmx
      protocol: TCP
      port: 20105
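
The Helm templates read `.Values.replicas` and `.Values.cbImage` from the chart's values file. A minimal sketch of such a file follows; the two keys come from the templates, while the defaults are simply copied from the plain manifest and are an assumption about what the real chart contained.

```yaml
# values.yaml (sketch): defaults consumed by the templates through .Values
replicas: 5                             # used as {{ .Values.replicas }}
cbImage: hortonworks/cloudbreak:2.1.0   # used as {{ .Values.cbImage }}
```

Individual values can be overridden without editing the file, for example with `helm upgrade <release> <chart-dir> --set replicas=3`, where the release name and chart directory are placeholders.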

Rolling Update
• The goal is a zero-downtime update
• Ability to roll back in case something goes wrong
• Rolling Update strategy with Readiness Probe
• Canary releasing
• Prepare for running 2 versions of the application at the same time

Strategy
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

Readiness Probe
readinessProbe:
  httpGet:
    path: /cb/info
    port: 8080
  initialDelaySeconds: 90
  failureThreshold: 5
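
The slide shows the strategy and the probe as separate fragments. As a sketch of where each one sits in a manifest (assuming the `cloudbreak` Deployment from the earlier example, with metadata and selector omitted for brevity): `strategy` belongs to the Deployment `spec`, while `readinessProbe` belongs to the individual container.

```yaml
# Fragment of a Deployment; metadata, selector and labels are omitted.
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 Pod above the desired replica count
      maxUnavailable: 0    # no Pod of the desired count may be unavailable
  template:
    spec:
      containers:
        - name: cloudbreak
          image: hortonworks/cloudbreak:2.1.0
          readinessProbe:
            httpGet:
              path: /cb/info
              port: 8080
            initialDelaySeconds: 90   # wait 90 s after container start before probing
            failureThreshold: 5       # mark the Pod unready after 5 consecutive failures
```

With `replicas: 5` these settings mean that a rollout runs at most 6 Pods at once and keeps at least 5 available, so an old Pod is removed only after a new one has passed its readiness probe.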

Canary releasing
• Run a new version of the application along with the stable one and route some of the users to this version
• Run your tests against the new version and, once you are happy with the results, shut down the old version
• Maintain backward compatibility or you'll break the update
• Hard to change the database schema

Logging and Monitoring

Logging
• Logspout
  – Collects the logs from the Docker socket
• Logstash
  – Redirects the logs to file outputs
• Azure File Share
  – Stores the log files in an SMB (Samba) share
• LogSearch
  – Owned by Hortonworks
  – Uses Solr under the hood

Monitoring
• Prometheus
  – Java metrics (Custom metrics)
  – Provider per cluster
  – REST status codes
  – Response times
  – Active flows per node
  – Go metrics
  – Consul metrics
  – Linux / host metrics
  – NodeJS metrics
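
The `prometheus.io/scrape`, `prometheus.io/path` and `prometheus.io/port` annotations on the Cloudbreak Service are a convention rather than a Kubernetes or Prometheus built-in: they only take effect if the Prometheus configuration relabels on them. The sketch below follows the widely used example configuration for Kubernetes service discovery; whether the authors used exactly this configuration is an assumption, and the job name is arbitrary.

```yaml
scrape_configs:
  - job_name: kubernetes-service-endpoints   # arbitrary job name
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # keep only endpoints whose Service carries prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # use prometheus.io/path as the metrics path when it is set
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # rewrite the scrape address to use the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
```

Because the `keep` rule matches the string `true`, the annotation value has to be written as a quoted string in the Service manifest.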

Alerting
ALERT successful_stack_creation_aws
  IF sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.successful.aws"}[5m])) > 0
  ANNOTATIONS {
    status="INFO",
    description="A new stack has been created on AWS."
  }

ALERT stack_creation_failed_aws
  IF sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.failed.aws"}[5m])) > 0
  ANNOTATIONS {
    status="WARN",
    description="Failed to create a stack on AWS."
  }

ALERT node_down
  IF up{job='node_exporter'} == 0
  FOR 5m
  ANNOTATIONS {
    status="ERROR",
    description="Node {{ $labels.instance }} is down for more than 5 minutes"
  }
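
These rules are written in the Prometheus 1.x rule syntax. Prometheus 2.0 and later expect rule files in YAML, so as a hedged sketch the `node_down` rule might translate as follows. The group name `cloudbreak.rules` is hypothetical, the description uses 5 minutes to match the `for: 5m` clause, and `status` is kept as an annotation to mirror the original rule.

```yaml
groups:
  - name: cloudbreak.rules        # hypothetical group name
    rules:
      - alert: node_down
        expr: up{job="node_exporter"} == 0
        for: 5m                   # condition must hold for 5 minutes before firing
        annotations:
          status: "ERROR"
          description: "Node {{ $labels.instance }} is down for more than 5 minutes"
```

The two stack-creation rules translate the same way: the `IF` expression becomes `expr`, and each `ANNOTATIONS` entry becomes a key under `annotations`.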
Questions?
Thank you!
Instagram (@hortonworks.hungary)
