Clustree runs about 30 microservices on Google Kubernetes Engine (GKE), with ~280 pods across 15 nodes. They use Kubernetes for all stateless applications across environments and for some stateful ones. Key pieces of their infrastructure include Docker, Elasticsearch, RabbitMQ, Prometheus for metrics, Fluentd and Logstash for shipping logs to Elasticsearch, and InfluxDB with Grafana. They have hit some issues, but find that Kubernetes brings great benefits such as easy rolling upgrades and declarative infrastructure.
3. • Full Python microservices (~30 / env)
• Elasticsearch
• REST APIs for synchronous calls
• RabbitMQ for asynchronous calls
Clustree stack
5. • 12-factor apps
• Git commit as Docker tag
• docker-compose and Kubernetes
• develop branch vs master branch
Engineering practices
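The "Git commit as Docker tag" practice above can be sketched in a few lines of Python. This is a minimal illustration, not Clustree's actual tooling: the registry and app names are made up, and `current_commit()` assumes it runs inside a git checkout.

```python
import re
import subprocess

def image_name(registry: str, app: str, commit: str) -> str:
    """Build a Docker image name whose tag is a git commit SHA."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit):
        raise ValueError(f"not a commit SHA: {commit!r}")
    return f"{registry}/{app}:{commit}"

def current_commit() -> str:
    """Short SHA of HEAD (must be called inside a git checkout)."""
    return subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()

# Hypothetical registry/app names, fixed commit for illustration:
print(image_name("eu.gcr.io/example", "matching-api", "a1b2c3d"))
```

Because the tag is immutable and tied to a commit, the same image can move unchanged from docker-compose on a laptop to Kubernetes in production, and a rollback is just redeploying an older tag.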
7. • ~12 people in the tech team
• Developers own their apps all the way to production
• Infrastructure team provides:
• Tools
• Guidelines
• Expertise
Organization
8. • Why GKE?
• GKE cluster (15 nodes)
• ~280 pods
• 200 GB / 225 GB of memory allocated
• Inside Kubernetes:
• All stateless applications, for all environments
• All stateful applications for integration environments
• Outside Kubernetes:
• Staging / production stateful apps
• Infrastructure
• Spark
Infrastructure
10. • Namespaces to isolate environments
• ReplicationControllers everywhere (even for single pods)
• Service discovery
• Secrets
• Volumes
• Jobs
Features used
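"Namespaces to isolate environments" boils down to creating one Namespace object per environment and deploying every stack into it. A minimal sketch (the environment names are illustrative, not Clustree's actual list); `kubectl apply -f` accepts JSON manifests like these:

```python
import json

def namespace_manifest(env: str) -> dict:
    """Kubernetes Namespace object isolating one environment."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": env, "labels": {"environment": env}},
    }

# Example environment names (assumed for illustration):
for env in ("integration", "staging", "production"):
    print(json.dumps(namespace_manifest(env)))
```

Since service discovery is scoped per namespace, the same manifests (same service names, same config) can be deployed into each environment without collisions.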
12. • 1 Heapster with a Google Cloud Monitoring sink (not used)
• 1 Heapster with an InfluxDB sink
• Telegraf:
• Prometheus inputs for all nodes
• custom Python script to gather cluster-wide metrics
• 1 Telegraf instance running on each node
• InfluxDB 0.10.x and Grafana
Metrics
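The "custom Python script to gather cluster-wide metrics" could look something like the sketch below: count pods per (namespace, phase) from a Kubernetes pod list (the shape returned by `GET /api/v1/pods`) and render the counts as InfluxDB line-protocol points. This is an assumed implementation, not the actual script; the measurement name `pods` is made up.

```python
from collections import Counter

def pod_counts_by_namespace(pods: dict) -> Counter:
    """Count pods per (namespace, phase) from an API pod list."""
    counts = Counter()
    for item in pods["items"]:
        ns = item["metadata"]["namespace"]
        phase = item["status"]["phase"]
        counts[(ns, phase)] += 1
    return counts

def to_line_protocol(counts: Counter) -> list:
    """Render counts as InfluxDB line-protocol points ('i' = integer field)."""
    return [
        f"pods,namespace={ns},phase={phase} count={n}i"
        for (ns, phase), n in sorted(counts.items())
    ]

# Trimmed-down pod list for illustration:
sample = {"items": [
    {"metadata": {"namespace": "staging"}, "status": {"phase": "Running"}},
    {"metadata": {"namespace": "staging"}, "status": {"phase": "Pending"}},
    {"metadata": {"namespace": "prod"}, "status": {"phase": "Running"}},
]}
for line in to_line_protocol(pod_counts_by_namespace(sample)):
    print(line)
```

Run periodically and pushed to InfluxDB, points like these sit alongside the Heapster and Telegraf data and graph naturally in Grafana.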
15. • 1 Fluentd per node to push to Google Cloud Logging (not used)
• 200 MB per node
• 1 Logstash per node to push to Elasticsearch
• 500 MB per node …
• Kubernetes plugin (container name, namespace, pod, RC, etc.)
• interlaced logs => structured logs!
• OOM pattern detection (RAM limits are difficult to find!)
Logging system
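The "OOM pattern detection" bullet above can be sketched as a regex scan over node logs. The message format below is the kernel OOM-killer line as it appears when a container exceeds its memory cgroup limit; the exact wording varies across kernel versions, so treat the pattern as an assumption, not the filter Clustree actually runs.

```python
import re

# Assumed kernel OOM-killer message format (varies by kernel version):
OOM_RE = re.compile(
    r"Memory cgroup out of memory: Kill process (?P<pid>\d+) \((?P<proc>[^)]+)\)"
)

def find_oom_kills(lines):
    """Yield (pid, process_name) for every OOM kill found in log lines."""
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            yield int(m.group("pid")), m.group("proc")

log = [
    "kernel: Memory cgroup out of memory: Kill process 4242 (python) score 1000 or sacrifice child",
    "kernel: Killed process 4242 (python) total-vm:204800kB",
]
print(list(find_oom_kills(log)))  # [(4242, 'python')]
```

Surfacing these events is what makes tuning memory limits tractable: a limit that is too low shows up as a stream of matched OOM kills rather than a silently restarting pod.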
16. • Auto-healing cluster
• Pod hooks + Nagios + Consul + consul-template => failed
• Sentry
• Still need to decide between push and pull monitoring:
• pull: Prometheus
• push: Kapacitor / Watcher
• Google Cloud Monitoring?
• How to monitor Kubernetes events?
Monitoring
18. • Migration 1.0 -> 1.1: DNS discovery outage (#18171)
• Loss of 1/3 of the cluster's nodes (… yesterday) (#13346)
• Volumes (#14642)
• Memory pressure on nodes
A handful of issues
# refers to GitHub issue numbers
19. • Access to private services from outside the cluster (#14545)
• No public IP from public Load Balancers
• IAM
• Network isolation
• kubectl exec (timeout / TERM) (#12179, #13585)
• Node resizing on GKE
A few painful points
20. • Spawn a new environment in a few minutes (to test a new feature)
• Super-easy rolling upgrades and rollbacks
• Fully declarative infrastructure
But a lot of joy!
23. • So much to do / discover / learn, but really exciting!
• Docker evolutions matter far less to us than new Kubernetes features
• Kubernetes is a really powerful abstraction and enables team autonomy and velocity
• Still a young project / ecosystem, but evolving really quickly
Conclusion