PGConf.ASIA 2019
Indonesia, Bali
PostgreSQL on K8S at Zalando: two+ years in production
ALEXANDER KUKUSHKIN
10-09-2019
ABOUT ME
Alexander Kukushkin
Database Engineer @ZalandoTech
The Patroni guy
alexander.kukushkin@zalando.de
Twitter: @cyberdemn
AGENDA
● Typical problems and horror stories
● Brief introduction to Kubernetes
● Spilo & Patroni
● Postgres-Operator
WE BRING FASHION TO PEOPLE IN 17 COUNTRIES
17 markets
7 fulfillment centers
26.4 million active customers
5.4 billion € net sales 2018
250 million visits per month
15,000 employees in Europe
PostgreSQL at Zalando
● > 300 in the data centers
● > 160 databases on AWS, managed by the DB team
● > 1000 databases in other Kubernetes clusters
● > 190 run in the ACID’s Kubernetes cluster
What is Kubernetes?
● A set of open-source components running on one or more servers
● A container orchestration system
● An abstraction layer over your real or virtualized hardware
● An “infrastructure as code” system
● Automatic resource allocation
● Next step after Ansible/Chef/Puppet
What Kubernetes is not
● An operating system
● A magical way to make your infrastructure scalable
● An excuse to fire your devops (someone has to
configure and manage it)
● A good solution for running 2-3 servers
Terminology
Traditional infrastructure → Kubernetes
● Physical server → Node
● Virtual machine → Pod
● Individual application → Container (typically Docker)
● NAS/SAN → Persistent Volumes
● Load balancer → Service/Endpoint
● Application registry/hardware information → Labels
● Password files, certificates → Secrets
[Diagram: Kubernetes overview]
Stateful applications on Kubernetes
● PersistentVolumes
○ Abstract the details of how storage is provisioned
○ Support many different storage types via plugins:
■ EBS, AzureDisk, iSCSI, NFS, Ceph, GlusterFS, and so on
● StatefulSets
○ Guaranteed number of Pods with stable (and unique) identifiers
○ Ordered deployment and scaling
○ Connecting Pods with corresponding persistent storage
(PersistentVolume+PersistentVolumeClaim)
Spilo Docker image
● All supported versions of PostgreSQL inside a single image
● Plenty of extensions (pg_partman, pg_cron, postgis, timescaledb, etc.)
● Additional tools (pgq, pgbouncer, wal-e/wal-g)
● PGDATA on an external volume
● Patroni for HA
● Environment-variable-based configuration
● Lightweight!
What is Patroni
● Automatic failover solution for PostgreSQL streaming replication
● A Python daemon that manages one PostgreSQL instance
● Keeps the cluster state in a DCS (etcd, ZooKeeper, Consul, Kubernetes)
○ Uses the DCS for leader elections
■ Makes PostgreSQL a first-class citizen on Kubernetes!
● Helps to automate a lot of things, such as:
○ A new cluster deployment
○ Scaling out and in
○ PostgreSQL configuration management
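The leader election mentioned above can be illustrated with a toy model. This is a minimal sketch, not Patroni's code: `InMemoryDCS`, `acquire_or_renew`, `run_cycle`, and the 30-second TTL are hypothetical names and values standing in for a real DCS such as etcd. It shows the core idea of a leader key with a time-to-live that only one member can hold and that another member can take over once it expires.

```python
import time


class InMemoryDCS:
    """Toy stand-in for a DCS: one leader key with a time-to-live (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.leader = None      # name of the member holding the key
        self.expires_at = 0.0   # moment the key expires

    def acquire_or_renew(self, member):
        """Take the leader key if it is free, expired, or already held by `member`."""
        now = self.clock()
        if self.leader is None or now >= self.expires_at or self.leader == member:
            self.leader = member
            self.expires_at = now + self.ttl
            return True
        return False


def run_cycle(dcs, member):
    """One loop iteration of a member: the key holder acts as primary."""
    return "primary" if dcs.acquire_or_renew(member) else "replica"


# Simulated clock so the example runs instantly and deterministically.
fake_now = [0.0]
dcs = InMemoryDCS(ttl_seconds=30, clock=lambda: fake_now[0])

roles_start = (run_cycle(dcs, "demo-0"), run_cycle(dcs, "demo-1"))
fake_now[0] = 10.0  # demo-0 keeps renewing while it is healthy
roles_renew = (run_cycle(dcs, "demo-0"), run_cycle(dcs, "demo-1"))
fake_now[0] = 45.0  # demo-0 stopped renewing; the key expired at t=40
roles_failover = (run_cycle(dcs, "demo-1"), run_cycle(dcs, "demo-0"))

print(roles_start, roles_renew, roles_failover)
```

A real DCS performs the check-and-set atomically on the server side; the single-process model above only mimics that behavior.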
Spilo & Patroni on K8S
[Diagram: StatefulSet "demo" with Pod demo-0 (role: master) on Node2 and Pod demo-1 (role: replica) on Node1, each with its own PersistentVolume. Also shown: Secret: demo, Endpoint: demo with WATCH() and UPDATE() calls, Service: demo, Service: demo-replica (labelSelector: role=replica), and S3.]
Manual deployment to Kubernetes
● A few long YAML manifests to write
● Different parts of the PostgreSQL configuration are spread over multiple manifests
● No easy way to work with a cluster as a whole (update, delete)
● Manual generation of DB objects, e.g. users and their passwords
Kubernetes operator pattern
● Encapsulates the knowledge of a human operating the service
● Implements a controller application that acts on custom resources
● Uses CRDs (custom resource definitions) to describe a domain-specific object (e.g. a Postgres cluster)
https://coreos.com/blog/introducing-operators.html
Zalando Postgres-Operator
● Defines a custom Postgresql resource
● Watches instances of Postgresql, creates/updates/deletes corresponding Kubernetes objects
● Allows updating running-cluster resources (memory, cpu, volumes), postgres configuration
● Creates databases, users and automatically generates passwords
● Auto-repairs, smart rolling updates (switchover to replicas before updating the master)
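The watch-and-reconcile behavior described above follows the general operator pattern. The sketch below is a simplified illustration, not the Postgres-Operator's implementation: `desired_objects`, `reconcile`, and the naming of the child objects are hypothetical. It compares the objects a manifest calls for with the objects that currently exist and returns the actions needed to converge.

```python
def desired_objects(manifest):
    """Derive the child objects one cluster manifest calls for (illustrative subset)."""
    name = manifest["metadata"]["name"]
    spec = manifest["spec"]
    objects = {
        ("StatefulSet", name): {"replicas": spec["numberOfInstances"]},
        ("Service", name): {},
        ("Service", name + "-replica"): {},   # hypothetical naming scheme
    }
    for user in spec.get("users", {}):
        objects[("Secret", f"{name}-{user}")] = {}   # hypothetical naming scheme
    return objects


def reconcile(desired, actual):
    """Return (action, key) pairs that move `actual` toward `desired`."""
    actions = []
    for key, spec in desired.items():
        if key not in actual:
            actions.append(("create", key))
        elif actual[key] != spec:
            actions.append(("update", key))
    for key in actual:
        if key not in desired:
            actions.append(("delete", key))
    return actions


manifest = {
    "metadata": {"name": "acid-minimal-cluster"},
    "spec": {"numberOfInstances": 2, "users": {"zalando": ["createrole", "createdb"]}},
}
actual = {
    ("StatefulSet", "acid-minimal-cluster"): {"replicas": 1},  # out of date
    ("Service", "acid-minimal-cluster-old"): {},               # no longer wanted
}
actions = reconcile(desired_objects(manifest), actual)
for action in actions:
    print(action)
```

A real controller would run this comparison on every change event and apply the actions through the Kubernetes API instead of printing them.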
Postgresql manifest
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "ACID"  # is used to provision human users
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - createrole
    - createdb
    foo_app_user:  # role for application foo
  databases:  # name->owner
    foo: zalando
  postgresql:
    version: "11"
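[Diagram: Postgres-Operator deployment flow. Elements shown: DB deployer, cluster manifest (deploy), operator pod, operator config map, infrastructure roles, and, inside the Kubernetes cluster, the StatefulSet with Spilo pods running Patroni, Service, Endpoint, and cluster secrets, plus the client application. Arrows are labeled "create" (three times) and "watch".]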
https://www.reddit.com/r/ProgrammerHumor/comments/9fhvyl/writing_yaml/
Zalando Postgres-Operator features
● Rolling updates on Postgres cluster changes
● Volume resize without Pod restarts
● Cloning Postgres clusters
● Logical Backups to S3 Bucket
● Standby cluster from S3 WAL archive
● Configurable for non-cloud environments
● UI to create and edit Postgres cluster manifests
Kubernetes rolling upgrade
● Rotates all worker nodes in the K8s cluster
○ Does it in a rolling manner, one by one
○ If you are unlucky, it causes as many failovers as there are Pods in your Postgres cluster
● How does Postgres-Operator deal with it?
○ Detects the to-be-decommissioned node by the lack of the ready label and the SchedulingDisabled status
○ Move all master pods:
■ Move replicas to the already updated node
■ Trigger switchover to those replicas
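The procedure above can be sketched as a small planning function. This is an illustration only, not the operator's code: `plan_switchovers`, the pod dictionaries, and the node names are hypothetical. For every primary still running on a to-be-decommissioned node, it picks a replica of the same cluster that already runs on an updated node as the switchover target.

```python
def plan_switchovers(pods, old_nodes):
    """Map each cluster whose primary sits on an old node to a replica on an updated node."""
    plan = {}
    for pod in pods:
        if pod["role"] != "primary" or pod["node"] not in old_nodes:
            continue
        candidates = [
            p["name"]
            for p in pods
            if p["cluster"] == pod["cluster"]
            and p["role"] == "replica"
            and p["node"] not in old_nodes
        ]
        # None means a replica must first be moved to an updated node.
        plan[pod["cluster"]] = min(candidates) if candidates else None
    return plan


pods = [
    {"name": "a-0", "cluster": "A", "role": "primary", "node": "old-1"},
    {"name": "a-1", "cluster": "A", "role": "replica", "node": "new-1"},
    {"name": "b-0", "cluster": "B", "role": "primary", "node": "old-1"},
    {"name": "b-1", "cluster": "B", "role": "replica", "node": "old-2"},
    {"name": "c-0", "cluster": "C", "role": "primary", "node": "new-2"},
]
plan = plan_switchovers(pods, old_nodes={"old-1", "old-2"})
print(plan)
```

Cluster C does not appear in the plan because its primary already runs on an updated node, so draining the old nodes causes no failover for it.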
[Diagram sequence: Kubernetes rolling upgrade of clusters A, B, and C, whose primaries and replicas are spread over nodes in Availability Zones 1, 2, and 3. Frames: start, step 1, switchover, finish. The frames show replicas appearing on new nodes, the replica Pods on the old nodes terminating, the primaries switching over to replicas on the new nodes, and the old nodes ending up empty. Legend: Node (to-be-decommissioned), Node (new), Terminated Pod, Active Pod.]
Most common issues on K8s
Problems with AWS infrastructure
● AWS API Rate Limit Exceeded
○ Prevents or delays attaching/detaching persistent volumes (EBS)
to/from Pods
■ Delays recovery of failed Pods
○ Might delay a deployment of a new cluster
● Sometimes EC2 instances fail and are shut down by AWS
○ Shutdown might take ages
○ All EBS volumes remain attached until the instance is shut down
■ Pods can’t be rescheduled
KUBE2IAM
● AWS IAM + Instance Profile is a very flexible, convenient, and powerful technology
● AWS tools and libraries can use temporary credentials obtained from the
Instance Profile to access other AWS services, for example S3
○ We are using S3 for backups and continuous archiving (wal-e/wal-g)
● Instance Profile is configured per EC2 Instance (K8s Node)!
● kube2iam - provides different IAM roles for Pods running on a single Node
● kube2iam might stop serving a Pod that is being shut down
○ wal-e is unable to get credentials to access S3
■ Last WAL segments remain unarchived! :(
Lack of Disk space
● Single volume for PGDATA, pg_wal and logs
● FATAL,53100,could not write to file
"pg_wal/xlogtemp.22993": No space left on device
○ Usually ends with Postgres shutting itself down
● Patroni tries to recover the primary which isn’t running
○ “start->promote->No space left->shutdown” loop
Disk space MUST be monitored!
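A minimal sketch of such a check, using only the Python standard library. The function names and the 80%/90% thresholds are assumptions for illustration, not values from the talk; in a Spilo pod the path would be the volume that holds PGDATA.

```python
import shutil
import tempfile


def disk_usage_percent(path):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(path, warn_at=80.0, crit_at=90.0):
    """Classify disk usage as OK, WARNING, or CRITICAL (thresholds are assumptions)."""
    percent = disk_usage_percent(path)
    if percent >= crit_at:
        return "CRITICAL", percent
    if percent >= warn_at:
        return "WARNING", percent
    return "OK", percent


# A temporary directory stands in for the PGDATA volume in this example.
status, percent = check_disk(tempfile.gettempdir())
print(f"{status}: {percent:.1f}% used")
```

Because PGDATA, pg_wal, and the logs share one volume, a single check like this covers all three, but it cannot tell which of them is growing.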
Root causes of disk space problems, or why we don’t implement volume auto-extend
● Excessive logging
○ slow queries, human access, application errors, connections/disconnections
● pg_wal growth
○ archive_command is slow/failing
○ Unconsumed changes on the replication slot
■ Replica is not streaming? Replica is slow?
■ Logical replication slot?
○ checkpoints taking too long due to throttled IOPS
● PGDATA growth
○ Table and index bloat!
■ Useless updates of unchanged data?
■ Autovacuum tuning? Zheap?
○ Natural growth of data
■ Lack of retention policies?
■ Broken cleanup jobs?
ORM can cause wal-e to fail!
wal_e.main ERROR MSG: Attempted to archive a file that is too
large. HINT: There is a file in the postgres database
directory that is larger than 1610612736 bytes. If no such
file exists, please report this as a bug. In particular,
check pg_stat/pg_stat_statements.stat.tmp, which appears to
be 2010822591 bytes
Meanwhile in pg_stat_statements:
UPDATE foo SET bar = $1 WHERE id IN ($2, $3, $4, …, $10500);
UPDATE foo SET bar = $1 WHERE id IN ($2, $3, $4, …, $100500);
…. and so on
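The growth shown above comes from every IN list of a different length being a different statement text, at least in the PostgreSQL versions current at the time of the talk. The sketch below only builds and counts statement strings; the helper names are hypothetical, and the `= ANY($2)` form with a single array parameter is one commonly used alternative, not something the slides prescribe.

```python
def in_list_statement(n_ids):
    """Statement text an ORM might emit for n_ids values (one placeholder per id)."""
    placeholders = ", ".join(f"${i}" for i in range(2, n_ids + 2))
    return f"UPDATE foo SET bar = $1 WHERE id IN ({placeholders})"


def any_array_statement(n_ids):
    """Alternative: pass all ids as one array parameter; the text never changes."""
    return "UPDATE foo SET bar = $1 WHERE id = ANY($2)"


batch_sizes = [1, 2, 3, 500, 10499]
in_texts = {in_list_statement(n) for n in batch_sizes}
any_texts = {any_array_statement(n) for n in batch_sizes}

print(len(in_texts), "distinct texts with IN lists")
print(len(any_texts), "distinct text with = ANY($2)")
print(len(in_list_statement(10499)), "characters in the largest IN-list statement")
```

Each distinct text becomes its own entry in pg_stat_statements, so long and varied IN lists inflate the statistics file that wal-e then refuses to archive.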
Exclusive backup issues
PANIC,XX000,"online backup was canceled, recovery cannot
continue",,,,,"xlog redo at D45/EB000028 for
XLOG/CHECKPOINT_SHUTDOWN: redo D45/EB000028; tli 237; prev
tli 237; fpw true; xid 0:105446371; oid 187558; multi 1;
offset 0; oldest xid 544 in DB 1; oldest multi 1 in DB 1;
oldest/newest commit timestamp xid: 0/0; oldest running xid
0; shutdown",,,,""
● There is no way to join back such failed primary as a replica without
rebuilding (reinitializing) it!
○ wal-g supports non-exclusive backups, but it is not yet stable enough
Out-Of-Memory Killer
● Postgres log:
server process (PID 10810) was terminated by signal 9: Killed
● dmesg -T:
[Wed Jul 31 01:35:35 2019] Memory cgroup out of memory: Kill process 14208 (postgres) score 606 or sacrifice child
[Wed Jul 31 01:35:35 2019] Killed process 14208 (postgres) total-vm:2972124kB, anon-rss:68724kB, file-rss:1304kB, shmem-rss:2691844kB
[Wed Jul 31 01:35:35 2019] oom_reaper: reaped process 14208 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:2691844kB
Out-Of-Memory Killer
● PIDs in the container (10810) and on the host (14208) are different
○ Sometimes it is hard to investigate what happened
● The oom_score_adj trick doesn’t really make sense in the container
○ Only Patroni and PostgreSQL are running there
● It is not really clear how memory accounting in the container works:
○ memory: usage 8388392kB, limit 8388608kB, failcnt 1
○ cache:2173896KB rss:6019692KB rss_huge:0KB shmem:2173428KB
mapped_file:2173512KB dirty:132KB writeback:0KB swap:0KB
inactive_anon:15732KB active_anon:8177696KB inactive_file:320KB active_file:184KB
unevictable:0KB
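The figures above can at least be checked arithmetically. The sketch below parses the two lines quoted on the slide and computes how close the container was to its limit and how much of the "cache" figure is shared memory. The helper names are hypothetical, and the code does not claim to explain how the kernel accounts memory; it only does arithmetic on the quoted numbers.

```python
import re

usage_line = "memory: usage 8388392kB, limit 8388608kB, failcnt 1"
stats_line = "cache:2173896KB rss:6019692KB rss_huge:0KB shmem:2173428KB"


def parse_usage(line):
    """Extract (usage_kb, limit_kb) from the cgroup usage line."""
    match = re.search(r"usage (\d+)kB, limit (\d+)kB", line)
    return int(match.group(1)), int(match.group(2))


def parse_stats(line):
    """Turn 'name:123KB' pairs into a dict of integers (values in kB)."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)KB", line)}


usage_kb, limit_kb = parse_usage(usage_line)
stats = parse_stats(stats_line)

headroom_kb = limit_kb - usage_kb
non_shmem_cache_kb = stats["cache"] - stats["shmem"]

print(f"limit: {limit_kb / 1024 / 1024:.0f} GiB, headroom: {headroom_kb} kB")
print(f"cache that is not shared memory: {non_shmem_cache_kb} kB")
```

In this sample the container sits 216 kB below its 8 GiB limit, and all but 468 kB of the reported cache is shared memory.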
Kubernetes+Docker
● ERROR: could not resize shared memory
segment "/PostgreSQL.1384046013" to 8388608
bytes: No space left on device
● PostgreSQL 11 (due to the “parallel hash join”)
● Docker limits /dev/shm to 64MB by default
● How to fix?
○ Mount custom dshm tmpfs volume to /dev/shm
■ Or set enableShmVolume: true in the cluster
manifest
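The first fix above can be sketched as a small transformation of a pod spec. A memory-backed `emptyDir` volume is the usual Kubernetes way to get a tmpfs mount; the function name `add_shm_volume`, the container name, and the image value are hypothetical placeholders.

```python
import json


def add_shm_volume(pod_spec, container_name, volume_name="dshm"):
    """Return a copy of pod_spec with a memory-backed emptyDir mounted at /dev/shm."""
    spec = json.loads(json.dumps(pod_spec))  # cheap deep copy of plain JSON data
    spec.setdefault("volumes", []).append(
        {"name": volume_name, "emptyDir": {"medium": "Memory"}}
    )
    for container in spec["containers"]:
        if container["name"] == container_name:
            container.setdefault("volumeMounts", []).append(
                {"name": volume_name, "mountPath": "/dev/shm"}
            )
    return spec


# Placeholder pod spec; a real one carries many more fields.
pod_spec = {"containers": [{"name": "postgres", "image": "spilo"}]}
patched = add_shm_volume(pod_spec, "postgres")
print(json.dumps(patched, indent=2))
```

With the operator, setting enableShmVolume: true in the cluster manifest is the simpler route, since no pod spec has to be edited by hand.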
Problems with PostgreSQL
● Logical decoding on the replica? Failover slots?
○ Patroni does sort of a hack by not allowing connections until
logical slot is created.
■ Consumer might still lose some events.
● “FATAL too many connections”
○ Prevents replica from starting streaming
■ Solved in PostgreSQL 12 (WAL senders no longer count toward max_connections)
○ Built-in connection pooler?
Human errors
● YAML formatting :)
● Inadequate resource requests and limits
○ Pod can’t be scheduled because no node has enough resources
○ Processes are terminated by the oom-killer
● Postgres-Operator/Spilo ServiceAccount deleted by employees
Conclusion
● Postgres-Operator helps us manage more than 1000 PostgreSQL clusters distributed across 80+ K8s accounts with minimal effort.
○ It wouldn’t be possible without a high level of automation
● In the cloud and on K8s you have to be ready to deal with
absolutely new problems and failure scenarios
○ Find the solution and implement it in the operator!
Open-source
● Postgres-operator: https://github.com/zalando/postgres-operator
● Patroni: https://github.com/zalando/patroni
● Spilo: https://github.com/zalando/spilo
Thank you!
Questions?

Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

PGConf.ASIA 2019 Bali - PostgreSQL on K8S at Zalando - Alexander Kukushkin

  • 1. PGConf.ASIA 2019 Indonesia, Bali PostgreSQL on K8S at Zalando: two+ years in production ALEXANDER KUKUSHKIN 10-09-2019
  • 2. 2 ABOUT ME Alexander Kukushkin Database Engineer @ZalandoTech The Patroni guy alexander.kukushkin@zalando.de Twitter: @cyberdemn
  • 3. 3 Typical problems and horror stories Brief introduction to Kubernetes Spilo & Patroni Postgres-Operator AGENDA
  • 4. 4 WE BRING FASHION TO PEOPLE IN 17 COUNTRIES 17 markets 7 fulfillment centers 26.4 million active customers 5.4 billion € net sales 2018 250 million visits per month 15,000 employees in Europe
  • 5. 5 PostgreSQL at Zalando > 300 In the data centers > 160 Databases on AWS Managed by DB team > 1000 Databases in other Kubernetes clusters > 190 Run in the ACID’s Kubernetes cluster
  • 6. 6 What is Kubernetes? ● A set of open-source components running on one or more servers ● A container orchestration system ● An abstraction layer over your real or virtualized hardware ● An “infrastructure as code” system ● Automatic resource allocation ● Next step after Ansible/Chef/Puppet
  • 7. 7 What is Kubernetes not? ● An operating system ● A magical way to make your infrastructure scalable ● An excuse to fire your devops (someone has to configure and manage it) ● A good solution for running 2-3 servers
  • 8. 8 Terminology Traditional infrastructure ● Physical server ● Virtual machine ● Individual application ● NAS/SAN ● Load balancer ● Application registry/hardware information ● Password files, certificates Kubernetes ● Node ● Pod ● Container (typically Docker) ● Persistent Volumes ● Service/Endpoint ● Labels ● Secrets
  • 10. 10 Stateful applications on Kubernetes ● PersistentVolumes ○ Abstract the details of how storage is provisioned ○ Support many different storage types via plugins: ■ EBS, AzureDisk, iSCSI, NFS, Ceph, GlusterFS and so on ● StatefulSets ○ Guaranteed number of Pods with stable (and unique) identifiers ○ Ordered deployment and scaling ○ Connect Pods with their corresponding persistent storage (PersistentVolume+PersistentVolumeClaim)
  • 11. 11 Spilo Docker image ● All supported versions of PostgreSQL inside a single image ● Plenty of extensions (pg_partman, pg_cron, postgis, timescaledb, etc.) ● Additional tools (pgq, pgbouncer, wal-e/wal-g) ● PGDATA on an external volume ● Patroni for HA ● Environment-variable-based configuration ● Lightweight!
  • 12. 12 What is Patroni ● Automatic failover solution for PostgreSQL streaming replication ● A Python daemon that manages one PostgreSQL instance ● Keeps the cluster state in a DCS (Etcd, Zookeeper, Consul, Kubernetes) ○ Uses the DCS for leader elections ■ Makes PostgreSQL a first-class citizen on Kubernetes! ● Helps to automate a lot of things, like: ○ A new cluster deployment ○ Scaling out and in ○ PostgreSQL configuration management
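The leader election mentioned on this slide can be illustrated with a toy lease. This is a minimal sketch, not Patroni's code: `InMemoryDCS`, `acquire_or_renew` and `run_cycle` are hypothetical names, and a real DCS would perform the check-and-set atomically on the server side. Real Patroni also takes more into account, such as replication lag, before promoting a member.

```python
class InMemoryDCS:
    """Toy stand-in for a DCS: a single leader key with a TTL (hypothetical API)."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.leader = None
        self.expires_at = 0.0

    def acquire_or_renew(self, member, now):
        """Take the leader key if it is free, expired, or already held by `member`."""
        if self.leader in (None, member) or now >= self.expires_at:
            self.leader = member
            self.expires_at = now + self.ttl
            return True
        return False


def run_cycle(dcs, member, now):
    """One loop iteration of a member: the key holder is primary, others are replicas."""
    return "primary" if dcs.acquire_or_renew(member, now) else "replica"


dcs = InMemoryDCS(ttl=30.0)
print(run_cycle(dcs, "demo-0", now=0.0))   # demo-0 takes the free key
print(run_cycle(dcs, "demo-1", now=10.0))  # key is held and not expired
print(run_cycle(dcs, "demo-1", now=45.0))  # demo-0 stopped renewing, key expired
```

The point of the TTL is that a crashed leader does not need to release anything: once it stops renewing, another member can take over after the lease runs out.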
  • 13. 13 Spilo & Patroni on K8S [diagram] StatefulSet: demo; Pod: demo-0 (role: master) and Pod: demo-1 (role: replica) on Node1 and Node2, each with a PersistentVolume; Secret: demo; Service: demo with Endpoint: demo; Service: demo-replica (labelSelector: role=replica); WATCH() and UPDATE() calls; S3
  • 14. 14 Manual deployment to Kubernetes ● A few long YAML manifests to write ● Different parts of the PostgreSQL configuration are spread over multiple manifests ● No easy way to work with a cluster as a whole (update, delete) ● Manual generation of DB objects, e.g. users and their passwords.
  • 15. 15 Kubernetes operator pattern ● Encapsulates the knowledge of a human operating the service ● Implements a controller application that acts on custom resources ● Uses CRDs (custom resource definitions) to describe a domain-specific object (e.g. a Postgres cluster) https://coreos.com/blog/introducing-operators.html
  • 16. 16 Zalando Postgres-Operator ● Defines a custom Postgresql resource ● Watches instances of Postgresql and creates/updates/deletes the corresponding Kubernetes objects ● Allows updating the resources of a running cluster (memory, CPU, volumes) and the Postgres configuration ● Creates databases and users, and automatically generates passwords ● Auto-repairs, smart rolling updates (switchover to replicas before updating the master)
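The watch-and-reconcile behaviour described on this slide can be sketched as a diff between desired and actual state. This is only an illustration of the pattern, not the operator's actual logic: the `reconcile` function, the object names and the spec fields are all made up for the example.

```python
def reconcile(desired, actual):
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map an object name (e.g. "statefulset/demo") to its spec.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical state: what the manifest asks for vs. what exists in the cluster.
desired = {
    "statefulset/demo": {"replicas": 2, "memory": "2Gi"},
    "service/demo": {"port": 5432},
    "secret/demo-users": {"users": ["zalando", "foo_app_user"]},
}
actual = {
    "statefulset/demo": {"replicas": 2, "memory": "1Gi"},
    "service/demo-old": {"port": 5432},
}
for action in reconcile(desired, actual):
    print(action)
```

Because the function only compares states, running it again after the actions have been applied yields an empty list, which is what makes such a loop safe to repeat.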
  • 17. 17 Postgresql manifest
      apiVersion: "acid.zalan.do/v1"
      kind: postgresql
      metadata:
        name: acid-minimal-cluster
      spec:
        teamId: "ACID"  # is used to provision human users
        volume:
          size: 1Gi
        numberOfInstances: 2
        users:
          zalando:  # database owner
          - createrole
          - createdb
          foo_app_user:  # role for application foo
        databases:  # name->owner
          foo: zalando
        postgresql:
          version: "11"
  • 18. 18 [diagram] Labels: DB deployer, deploy, cluster manifest, operator pod, operator config map, Infrastructure roles, watch, create, Stateful set, Spilo pod, PATRONI, Endpoint, Service, Cluster secrets, Client application, Kubernetes cluster
  • 21. 21 Zalando Postgres-Operator features ● Rolling updates on Postgres cluster changes ● Volume resize without Pod restarts ● Cloning Postgres clusters ● Logical Backups to S3 Bucket ● Standby cluster from S3 WAL archive ● Configurable for non-cloud environments ● UI to create and edit Postgres cluster manifests
  • 22. 22 Kubernetes rolling upgrade ● Rotates all worker nodes in the K8s cluster ○ Does it in a rolling manner, one by one ○ If you are unlucky, it will cause as many failovers as there are pods in your Postgres cluster ● How does Postgres-Operator deal with it? ○ Detects the to-be-decommissioned node by the lack of the ready label and the SchedulingDisabled status ○ Moves all master pods: ■ Moves replicas to an already updated node ■ Triggers a switchover to those replicas
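The two-step approach on this slide (move replicas first, then switch over) can be sketched as a small planning function. This is an assumption-laden illustration rather than the operator's implementation: `plan_node_drain`, the pod dictionaries and the node names are hypothetical.

```python
def plan_node_drain(pods, old_nodes):
    """Plan pod moves so that each primary changes role only once.

    `pods` is a list of dicts with the keys "cluster", "role" and "node".
    Returns (replicas_to_move, switchovers).
    """
    old = set(old_nodes)
    # Step 1: replicas on old nodes are recreated on updated nodes first.
    replicas_to_move = [
        (p["cluster"], p["node"])
        for p in pods
        if p["role"] == "replica" and p["node"] in old
    ]
    # Step 2: primaries on old nodes hand over to an already moved replica.
    switchovers = [
        p["cluster"] for p in pods if p["role"] == "primary" and p["node"] in old
    ]
    return replicas_to_move, switchovers


pods = [
    {"cluster": "A", "role": "primary", "node": "node-1"},
    {"cluster": "A", "role": "replica", "node": "node-2"},
    {"cluster": "B", "role": "primary", "node": "node-9"},
    {"cluster": "B", "role": "replica", "node": "node-1"},
]
moves, switchovers = plan_node_drain(pods, old_nodes=["node-1", "node-2"])
print(moves)
print(switchovers)
```

Ordering the steps this way means a cluster sees at most one planned switchover per upgrade, instead of one failover per terminated pod.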
  • 23. 23 Kubernetes rolling upgrade (start) [diagram: Availability Zones 1-3; legend: Node (to-be-decommissioned), Node (new), Terminated Pod, Active Pod] Pods by node: A primary, B primary, C replica | A replica, B replica, C primary | A replica, B replica, C replica
  • 24. 24 Kubernetes rolling upgrade (step 1) [diagram: Availability Zones 1-3; same legend] Pods by node: A primary, B primary, C replica | A replica, B replica, C primary | A replica, B replica, C replica | A replica, B replica, A replica, B replica, C replica, C replica
  • 25. 25 Kubernetes rolling upgrade (step 1) [diagram: Availability Zones 1-3; same legend] Pods by node: A primary, B primary | C primary | A replica, B replica, A replica, B replica, C replica, C replica
  • 26. 26 Kubernetes rolling upgrade (switchover) [diagram: Availability Zones 1-3; same legend] Pods by node: A primary, B primary | C primary | A replica, B replica, A replica, B replica, C replica, C replica
  • 27. 27 Kubernetes rolling upgrade (switchover) [diagram: Availability Zones 1-3; same legend] Pods by node: A replica, B replica | C replica | A primary, B replica, A replica, B primary, C replica, C primary
  • 28. 28 Kubernetes rolling upgrade (finish) [diagram: Availability Zones 1-3; same legend] Pods by node: A replica, B replica | C replica | A primary, B replica, A replica, B primary, C replica, C primary | A replica, B replica, C replica
  • 30. 30 Problems with AWS infrastructure ● AWS API Rate Limit Exceeded ○ Prevents or delays attaching/detaching persistent volumes (EBS) to/from Pods ■ Delays recovery of failed Pods ○ Might delay the deployment of a new cluster ● Sometimes EC2 instances fail and are shut down by AWS ○ Shutdown might take ages ○ All EBS volumes remain attached until the instance is shut down ■ Pods can’t be rescheduled
  • 31. 31 KUBE2IAM ● AWS IAM + Instance Profile is a very flexible, convenient, and powerful technology ● AWS tools and libraries can use temporary credentials obtained from the Instance Profile to access other AWS services, for example S3 ○ We are using S3 for backups and continuous archiving (wal-e/wal-g) ● Instance Profile is configured per EC2 Instance (K8s Node)! ● kube2iam provides different IAM roles for Pods running on a single Node ● kube2iam might stop serving a Pod that is being shut down ○ wal-e is unable to get credentials to access S3 ■ Last WAL segments remain unarchived! :(
  • 32. 32 Lack of Disk space ● Single volume for PGDATA, pg_wal and logs ● FATAL,53100,could not write to file "pg_wal/xlogtemp.22993": No space left on device ○ Usually ends with Postgres shutting itself down ● Patroni tries to recover the primary which isn’t running ○ “start->promote->No space left->shutdown” loop Disk space MUST be monitored!
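A disk space check of the kind this slide calls for can be very small. The sketch below uses only the Python standard library; the function names, the 80% threshold and the checked path are assumptions for illustration, and in practice the path would be the volume that holds PGDATA.

```python
import shutil


def disk_usage_percent(path):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_volume(path, warn_at=80.0):
    """Return (ok, percent); ok is False once usage reaches the threshold."""
    percent = disk_usage_percent(path)
    return percent < warn_at, percent


# "/" is only a placeholder; point this at the PGDATA volume in practice.
ok, percent = check_volume("/")
print(f"used: {percent:.1f}% ok={ok}")
```

Alerting well before the volume is full matters here because, as the slide notes, a full volume leads to a restart loop rather than a clean failure.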
  • 33. 33 Root causes of disk space problems, or why we don’t implement volume auto-extend ● Excessive logging ○ slow queries, human access, application errors, connections/disconnections ● pg_wal growth ○ archive_command is slow/failing ○ Unconsumed changes on the replication slot ■ Replica is not streaming? Replica is slow? ■ Logical replication slot? ○ checkpoints taking too long due to throttled IOPS ● PGDATA growth ○ Table and index bloat! ■ Useless updates of unchanged data? ■ Autovacuum tuning? Zheap? ○ Natural growth of data ■ Lack of retention policies? ■ Broken cleanup jobs?
  • 34. 34 ORM can cause wal-e to fail! wal_e.main ERROR MSG: Attempted to archive a file that is too large. HINT: There is a file in the postgres database directory that is larger than 1610612736 bytes. If no such file exists, please report this as a bug. In particular, check pg_stat/pg_stat_statements.stat.tmp, which appears to be 2010822591 bytes Meanwhile in pg_stat_statements: UPDATE foo SET bar = $1 WHERE id IN ($2, $3, $4, …, $10500); UPDATE foo SET bar = $1 WHERE id IN ($2, $3, $4, …, $100500); …. and so on
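The growth of the pg_stat_statements file on this slide comes from every distinct IN-list length producing a distinct statement text. The sketch below only builds query strings to make that visible; it does not talk to a database, and the helper names are hypothetical. Passing the ids as one array parameter with `= ANY($2)` is one common way to keep the statement text constant, offered here as a suggestion rather than as the fix used in the talk.

```python
def in_list_query(n_ids):
    """Build the ORM-style statement: one placeholder per id."""
    placeholders = ", ".join(f"${i}" for i in range(2, n_ids + 2))
    return f"UPDATE foo SET bar = $1 WHERE id IN ({placeholders})"


def any_array_query():
    """Build a statement that takes all ids as a single array parameter."""
    return "UPDATE foo SET bar = $1 WHERE id = ANY($2)"


batch_sizes = [3, 10, 500, 10500]
in_texts = {in_list_query(n) for n in batch_sizes}
any_texts = {any_array_query() for _ in batch_sizes}

# Number of distinct statement texts for each style.
print(len(in_texts), len(any_texts))
# Size in characters of a single large IN-list statement.
print(len(in_list_query(10500)))
```

With the IN-list style, both the number of tracked statements and the length of each statement grow with the batch size, which is how the statistics file can reach gigabytes.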
  • 35. 35 Exclusive backup issues PANIC,XX000,"online backup was canceled, recovery cannot continue",,,,,"xlog redo at D45/EB000028 for XLOG/CHECKPOINT_SHUTDOWN: redo D45/EB000028; tli 237; prev tli 237; fpw true; xid 0:105446371; oid 187558; multi 1; offset 0; oldest xid 544 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown",,,,"" ● There is no way to rejoin such a failed primary as a replica without rebuilding (reinitializing) it! ○ wal-g supports non-exclusive backups, but is not yet stable enough
  • 36. 36 Out-Of-Memory Killer ● Postgres log: server process (PID 10810) was terminated by signal 9: Killed ● dmesg -T: [Wed Jul 31 01:35:35 2019] Memory cgroup out of memory: Kill process 14208 (postgres) score 606 or sacrifice child [Wed Jul 31 01:35:35 2019] Killed process 14208 (postgres) total-vm:2972124kB, anon-rss:68724kB, file-rss:1304kB, shmem-rss:2691844kB [Wed Jul 31 01:35:35 2019] oom_reaper: reaped process 14208 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:2691844kB
  • 37. 37 Out-Of-Memory Killer ● PIDs in the container (10810) and on the host (14208) are different ○ Sometimes it is hard to investigate what happened ● The oom_score_adj trick doesn’t really make sense in the container ○ Only Patroni and PostgreSQL are running there ● It is not really clear how memory accounting in the container works: ○ memory: usage 8388392kB, limit 8388608kB, failcnt 1 ○ cache:2173896KB rss:6019692KB rss_huge:0KB shmem:2173428KB mapped_file:2173512KB dirty:132KB writeback:0KB swap:0KB inactive_anon:15732KB active_anon:8177696KB inactive_file:320KB active_file:184KB unevictable:0KB
  • 38. 38 Kubernetes+Docker ● ERROR: could not resize shared memory segment "/PostgreSQL.1384046013" to 8388608 bytes: No space left on device ● PostgreSQL 11 (due to the “parallel hash join”) ● Docker limits /dev/shm to 64MB by default ● How to fix? ○ Mount custom dshm tmpfs volume to /dev/shm ■ Or set enableShmVolume: true in the cluster manifest
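The first fix on this slide, mounting a tmpfs volume at /dev/shm, corresponds to a memory-backed `emptyDir` volume in the Pod spec. The sketch below builds that fragment as a plain Python dictionary; the helper name `with_shm_volume`, the container name and the image value are placeholders, while the `emptyDir` with `medium: Memory` and the `/dev/shm` mount path follow the standard Kubernetes Pod schema.

```python
import json


def with_shm_volume(pod_spec, container_name):
    """Return a copy of `pod_spec` with an in-memory volume mounted at /dev/shm."""
    spec = json.loads(json.dumps(pod_spec))  # cheap deep copy of plain data
    spec.setdefault("volumes", []).append(
        {"name": "dshm", "emptyDir": {"medium": "Memory"}}
    )
    for container in spec["containers"]:
        if container["name"] == container_name:
            container.setdefault("volumeMounts", []).append(
                {"name": "dshm", "mountPath": "/dev/shm"}
            )
    return spec


# Placeholder Pod spec; the container name and image are made up.
pod_spec = {"containers": [{"name": "postgres", "image": "spilo"}]}
print(json.dumps(with_shm_volume(pod_spec, "postgres"), indent=2))
```

With the operator, setting `enableShmVolume: true` in the cluster manifest, as the slide says, avoids having to patch the Pod spec by hand.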
  • 39. 39 Problems with PostgreSQL ● Logical decoding on the replica? Failover slots? ○ Patroni does sort of a hack by not allowing connections until the logical slot is created. ■ Consumer might still lose some events. ● “FATAL too many connections” ○ Prevents a replica from starting streaming ■ Solved in PostgreSQL 12 (WAL senders no longer count towards max_connections) ○ Built-in connection pooler?
  • 40. 40 Human errors ● YAML formatting :) ● Inadequate resource requests and limits ○ Pod can’t be scheduled because no node has enough capacity ○ Processes are terminated by oom-killer ● Postgres-Operator/Spilo ServiceAccount deleted by employees
  • 41. 41 Conclusion ● Postgres-Operator helps us to manage more than 1000 PostgreSQL clusters distributed in 80+ K8s accounts with minimal effort. ○ It wouldn’t be possible without a high level of automation ● In the cloud and on K8s you have to be ready to deal with absolutely new problems and failure scenarios ○ Find the solution and implement it in the operator!
  • 42. 42 Open-source ● Postgres-operator: https://github.com/zalando/postgres-operator ● Patroni: https://github.com/zalando/patroni ● Spilo: https://github.com/zalando/spilo