1. 1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Running Cloudbreak on Kubernetes
Richard Doktorics
Krisztian Horvath
2.
Who are we?
Krisztian Horvath
– Staff Engineer at Hortonworks
– Has worked on Cloudbreak from the beginning
– @keyki
Richard Doktorics
– Senior Software Engineer
– Has worked on Cloudbreak from the beginning
– @doktoric
3.
Agenda
Cloudbreak
Kubernetes
Helm
Cloudbreak Rolling Update
Log collection
Monitoring & Alerting
Questions
4.
What is Cloudbreak?
Cloudbreak is a tool for provisioning Hadoop clusters on cloud infrastructure
Simplified Cluster Provisioning
Automated Cluster Scaling
– AMS (Ambari Metrics System)
– Prometheus based metrics
Highly Extensible
– Recipes for scripting extensions that run before/after cluster provisioning
– Custom cloud images
Multiple platforms are supported
– AWS
– GCP
– Azure
– OpenStack
– BYOS (Bring Your Own Stack)
6.
Cloudbreak Deployer (CBD)
– Written in Go and Bash (go-basher)
– Compiled into single binary
Micro-service architecture
– Each service runs in a Docker container
– Each container is replaceable with custom ones
– Services are handled with docker-compose
Single node deployment
IMAGE                                     NAMES
traefik:v1.3.8-alpine                     cbreak_traefik_1
hortonworks/cloudbreak:2.1.0              cbreak_cloudbreak_1
postgres:9.6.1-alpine                     cbreak_commondb_1
hortonworks/cloudbreak-uaa                cbreak_identity_1
hortonworks/hdc-auth:2.1.0                cbreak_sultans_1
hortonworks/cloudbreak-autoscale:2.1.0    cbreak_periscope_1
hortonworks/hdc-web:2.1.0                 cbreak_uluwatu_1
gliderlabs/consul-server:0.5              cbreak_consul_1
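As a rough illustration of how two of the containers listed above could be declared for docker-compose, here is a minimal sketch. The service names are inferred from the container names (cbreak_cloudbreak_1, cbreak_commondb_1); the port mapping and the depends_on link are assumptions, and this is not the file that CBD actually generates.

```yaml
# Hypothetical docker-compose.yml fragment; CBD generates the real file.
version: "2"
services:
  commondb:
    image: postgres:9.6.1-alpine
  cloudbreak:
    image: hortonworks/cloudbreak:2.1.0
    depends_on:
      - commondb        # assumption: start the database before Cloudbreak
    ports:
      - "8080:8080"     # assumption: publish the HTTP port on the host
```

Run with the project name cbreak, docker-compose would name the resulting containers cbreak_commondb_1 and cbreak_cloudbreak_1, which matches the NAMES column.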
7.
Our goal was to...
Run Cloudbreak in HA (high availability) mode
– Ability to recover flows in case of node failure
– Avoid master-slave design / leader election problems
Scale Cloudbreak as we desire
– Distribute each cluster-related flow
– Cannot run 2 flows for the same cluster at the same time (e.g. 2 upscale flows)
– Flow cancellation must be handled
Scale the Web UI
– Had to introduce a Redis cluster for the session store
Scale every other service as well
Find a tool that makes it easy to deploy these services to multiple nodes
Offer Cloudbreak as a service that is accessible by everyone and can start clusters anywhere
9.
What is Kubernetes?
Kubernetes is an open-source platform designed to automate deploying, scaling and operating application containers
Deploy your applications quickly and predictably
Scale your applications on the fly
Roll out new features seamlessly
Limit hardware usage to required resources only
Portable: public, private, hybrid, multi-cloud
Extensible: modular, pluggable, hookable, composable
Self-healing: auto-placement, auto-restart, auto-replication, auto-scaling
10.
Why Kubernetes?
Not because it’s fancy...
Evaluated Kubernetes, Swarm, Mesos, Rancher
Open source / Active community with hands-on experience
Many cloud providers already support it
Lots of tooling behind it: API / CLI / Helm / Ansible / Salt
Integration with most of the cloud providers
– Provision Load Balancer (GCP, AWS, Azure)
– Use object stores to share data (Ceph, S3, GCP bucket, Azure Storage Account)
– Dynamic volume provisioning / Persistent disk (EBS, Azure Blob)
11.
Running Kubernetes on Azure
az aks create --resource-group k8srg --name k8s --agent-count 5 --agent-osdisk-size 100 --agent-vm-size Standard_D12_v2 \
  --service-principal sp --client-secret cs --dns-name-prefix k8s --location westus --ssh-key-value ~/.ssh/id_rsa.pub
12.
ACS / AKS / ACI
ACS (Azure Container Service)
– Can run Kubernetes, Swarm, DC/OS
AKS (Managed Kubernetes)
– No master VMs (at least on your side)
– Multiple agent pools with different VM types
– Scale the agent pools independently
– Automatic upgrades
ACI (Azure Container Instances)
– No VMs to provision
– “Endless” resource pool
– Pay per second
– Can act “as a node” in the Kubernetes cluster
13.
Kubernetes resources
Pod
– Group of one or more containers with shared storage/network
– Always co-located and co-scheduled, and run in a shared context
Deployment
– Provides declarative updates for Pods
StatefulSet
– Manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods
– Each Pod has a persistent identifier that it maintains across any rescheduling
Service
– Abstraction which defines a logical set of Pods and a policy by which to access them
Declared in YAML files
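To make the StatefulSet description more concrete, the sketch below declares a three-replica Redis set, loosely modelled on the session store mentioned earlier. Everything in it is an assumption for illustration: the names, the image tag, the storage size, the apps/v1 API version (Kubernetes 1.9 or later), and the existence of a headless Service called redis.

```yaml
# Illustrative StatefulSet; not taken from the Cloudbreak charts.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis          # assumes a headless Service named "redis" exists
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:4-alpine   # assumed image tag
          ports:
            - containerPort: 6379
              name: redis
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:       # one PersistentVolumeClaim per Pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

The Pods are created in order as redis-0, redis-1 and redis-2, and each keeps its name and its volume claim when it is rescheduled, which is the "persistent identifier" guarantee listed above.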
14.
Deployment and Service example
Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudbreak
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cloudbreak
  template:
    metadata:
      labels:
        app: cloudbreak
    spec:
      containers:
        - name: cloudbreak
          image: hortonworks/cloudbreak:2.1.0
          ports:
            - containerPort: 8080
              name: http-port
            - containerPort: 20105
              name: jmx-port
Service (cloudbreak.default.svc.cluster.local):
apiVersion: v1
kind: Service
metadata:
  name: cloudbreak
  annotations:
    # annotation values must be strings, so booleans and numbers are quoted
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
  ports:
    - name: http
      protocol: TCP
      port: 8080
    - name: jmx
      protocol: TCP
      port: 20105
16.
Why Helm?
No real competitor
Helps you manage Kubernetes applications
Officially approved by the community
Official Charts
Rolling upgrade
Helm is the client, Tiller is the server
Tiller is a Kubernetes pod
17.
Running Helm on Kubernetes
Helm package ~= Chart
– Define
– Install
– Upgrade
Chart
– values.yaml: stores the variables for the template files in the templates directory
– Chart.yaml: describes the chart: its name, description and version
– templates/*.yaml: Kubernetes manifests with Go template support
Separated Charts for every component
– Cloudbreak
– Monitoring
– Analytics
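A minimal Chart.yaml for the Cloudbreak chart might look like the sketch below. The version number and the description are placeholders chosen for illustration, not values from the actual charts.

```yaml
# Chart.yaml (illustrative values)
apiVersion: v1
name: cloudbreak
version: 0.1.0
description: Deploys the Cloudbreak services on Kubernetes
```

The chart version is independent of the application image version, so the chart can change without a new Cloudbreak release and vice versa.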
18.
Deployment and Service example
Deployment template (Helm):
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-cloudbreak
spec:
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: cloudbreak
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: cloudbreak
        release: {{ .Release.Name }}
    spec:
      containers:
        - name: cloudbreak
          image: {{ .Values.cbImage }}
          ports:
            - containerPort: 8080
              name: http-port
            - containerPort: 20105
              name: jmx-port
Service template (Helm):
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-cloudbreak
  annotations:
    # annotation values must be strings, so booleans and numbers are quoted
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
    release: {{ .Release.Name }}
  ports:
    - name: http
      protocol: TCP
      port: 8080
    - name: jmx
      protocol: TCP
      port: 20105
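The two template variables used above, .Values.replicas and .Values.cbImage, are read from the chart's values file. A matching values.yaml could be as small as the sketch below; the defaults simply mirror the plain Deployment example and are an assumption about what the real chart ships with.

```yaml
# values.yaml (assumed defaults, mirroring the plain Deployment example)
replicas: 5
cbImage: hortonworks/cloudbreak:2.1.0
```

A different value can be supplied at install or upgrade time without editing the file, for instance with Helm's --set replicas=3 option.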
20.
Rolling Update
The goal is a zero-downtime update
Ability to roll back in case something goes wrong
Rolling Update strategy with Readiness Probe
Canary releasing
Prepare for running 2 versions of the application at the same time
Strategy:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
Readiness Probe:
readinessProbe:
  httpGet:
    path: /cb/info
    port: 8080
  initialDelaySeconds: 90
  failureThreshold: 5
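The two snippets live at different levels of the Deployment: the strategy belongs to the Deployment spec, while the readiness probe is set per container. The fragment below is a sketch of where each one goes, reusing the names from the earlier Deployment example and omitting all other fields.

```yaml
# Fragment of a Deployment; fields not relevant to the update are omitted.
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra Pod above the desired count
      maxUnavailable: 0    # never go below the desired count of ready Pods
  template:
    spec:
      containers:
        - name: cloudbreak
          image: hortonworks/cloudbreak:2.1.0
          readinessProbe:
            httpGet:
              path: /cb/info
              port: 8080
            initialDelaySeconds: 90   # wait 90s after container start before probing
            failureThreshold: 5       # mark unready after 5 consecutive failures
```

With these settings Kubernetes starts one new Pod at a time and removes an old Pod only after the new one answers the probe successfully, which is what keeps the update free of downtime.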
21.
Canary releasing
Run a new version of the application along with the stable one and route some of the users to this version
Run your tests against the new version and, once you are happy with the results, shut down the old version
Maintain backward compatibility or you’ll break the update
Hard to change the database schema
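One common way to realise a canary on Kubernetes is to run two Deployments that share an app label but differ in a track label, behind a single Service. The sketch below assumes that pattern; the track label, the replica counts, the apps/v1 API version and the newer image tag are all hypothetical and not taken from the talk.

```yaml
# Stable and canary Deployments behind one Service (sketch).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudbreak-stable
spec:
  replicas: 4
  selector:
    matchLabels:
      app: cloudbreak
      track: stable
  template:
    metadata:
      labels:
        app: cloudbreak
        track: stable
    spec:
      containers:
        - name: cloudbreak
          image: hortonworks/cloudbreak:2.1.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudbreak-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudbreak
      track: canary
  template:
    metadata:
      labels:
        app: cloudbreak
        track: canary
    spec:
      containers:
        - name: cloudbreak
          image: hortonworks/cloudbreak:2.1.1   # hypothetical newer tag
---
apiVersion: v1
kind: Service
metadata:
  name: cloudbreak
spec:
  selector:
    app: cloudbreak      # no track label, so both versions receive traffic
  ports:
    - name: http
      protocol: TCP
      port: 8080
```

With four stable Pods and one canary Pod, roughly one request in five reaches the new version. Promoting the canary then means updating the image of the stable Deployment and removing the canary one.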
23.
Logging and Monitoring
24.
Logging
Logspout
– Collects the logs from the Docker socket
Logstash
– Redirects logs to file outputs
Azure File Share
– Stores the log files on an SMB (Samba) share
LogSearch
– Owned by Hortonworks
– Uses Solr under the hood
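Since Logspout reads from the Docker socket, it has to run on every node with that socket mounted. A DaemonSet is one way to arrange this; the sketch below assumes the stock gliderlabs/logspout image, the apps/v1 API version, and a Logstash Service named logstash that accepts syslog over TCP on port 5000. None of these details come from the talk.

```yaml
# Illustrative Logspout DaemonSet: one Pod per node, reading the local Docker socket.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logspout
spec:
  selector:
    matchLabels:
      app: logspout
  template:
    metadata:
      labels:
        app: logspout
    spec:
      containers:
        - name: logspout
          image: gliderlabs/logspout
          args:
            - syslog+tcp://logstash:5000   # hypothetical Logstash endpoint
          volumeMounts:
            - name: docker-socket
              mountPath: /var/run/docker.sock
      volumes:
        - name: docker-socket
          hostPath:
            path: /var/run/docker.sock
```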
26.
Monitoring
Prometheus
– Java metrics (Custom metrics)
– Provider per cluster
– REST status codes
– Response times
– Active flows per node
– Go metrics
– Consul metrics
– Linux / host metrics
– NodeJS metrics
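The prometheus.io/* annotations on the Cloudbreak Service only have an effect if Prometheus is configured to honour them. The scrape job below is a sketch following the widely used Kubernetes service-discovery relabeling pattern; the job name is arbitrary, and whether the presenters used exactly this configuration is an assumption.

```yaml
# prometheus.yml fragment: scrape Service endpoints that opt in via annotations.
scrape_configs:
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # keep only Services annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # use prometheus.io/path as the metrics path when it is set
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # replace the port in the target address with prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: $1:$2
```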
27.
Alerting
ALERT successful_stack_creation_aws
  IF sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.successful.aws"}[5m])) > 0
  ANNOTATIONS {
    status="INFO",
    description="A new stack has been created on AWS."
  }
ALERT stack_creation_failed_aws
  IF sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.failed.aws"}[5m])) > 0
  ANNOTATIONS {
    status="WARN",
    description="Failed to create a stack on AWS."
  }
ALERT node_down
  IF up{job='node_exporter'} == 0
  FOR 5m
  ANNOTATIONS {
    status="ERROR",
    description="Node {{ $labels.instance }} has been down for more than 5 minutes",
  }
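The rules on this slide use the Prometheus 1.x rule syntax. Prometheus 2.0 replaced it with a YAML rule file format; assuming an upgrade, two of the rules could be written roughly as follows. The group name is an assumption, while the expressions and annotation keys are carried over from the slide.

```yaml
# Prometheus 2.x rule file (sketch).
groups:
  - name: cloudbreak
    rules:
      - alert: stack_creation_failed_aws
        expr: sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.failed.aws"}[5m])) > 0
        annotations:
          status: WARN
          description: "Failed to create a stack on AWS."
      - alert: node_down
        expr: up{job="node_exporter"} == 0
        for: 5m
        annotations:
          status: ERROR
          description: "Node {{ $labels.instance }} has been down for more than 5 minutes"
```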
29.
Thank you!
Instagram (@hortonworks.hungary)