Corpus collapsum 
Partition tolerance of Galera in a noisy high load environment
Highload++ 2014 
Raghavendra Prabhu 
 raghavendra.d.prabhu@gmail.com 
Percona  raghavendra.prabhu@percona.com 
 randomsurfer  wnohang.net  rdprabhu  ronin13
The Title?
Our Cluster
Split brain
Introduction 
Seed quotes.. 
“ ’The network is reliable’ - a fallacy of distributed computing. ”
“ A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. ” - Leslie Lamport
“ Never attribute to malice that which is adequately explained by stupidity. ” - Hanlon’s Razor
“ Never attribute to Byzantine failure that which can be explained by an ill node(s). ” - Me
Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 5 / 58
20000 feet view
Introduction 
Actors 
▶ Database - WSREP/PXC 
▶ Plugin - Galera 
▶ Traffic control 
♦ Traffic Control - tc 
♦ NetEm 
Introduction 
Actors 
▶ Containers - Docker 
▶ Load 
♦ Generators - Sysbench, RQG 
▶ Network 
♦ Dnsmasq 
♦ nsenter 
Introduction 
Actors 
▶ Jenkins 
♦ Build flow and CI 
▶ Storage 
♦ Why 
▶ “Others” 
Details 
But why 
▶ The ’P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Details 
But why 
▶ Failures in warehouses. 
▶ Not quorum, but consensus. 
▶ Real world networks and synchronous replication 
- Delay 
- Partition 
Galera
Details 
Galera 
▶ Data-centric approach 
▶ EVS 
▶ Causality and Synchronous 
▶ Latency 
Where did it start
Details 
Where did it start 
▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 
▶ Loss of PC 
▶ Crash 
▶ HA goal 
One can bring the whole down
The Flow
Details 
Basic Flow 
Jenkins | Build images | Start Dnsmasq | Bootstrap
nsenter/netem | Pre-sanity | SST/Others | Load/Sysbench
Details 
Basic Flow 
RR sysbench
Detach/Keep
Post sanity | Core trace
Sanity check | Reconciliation
Cleanup | Collect logs
Details 
Cluster Resilience 
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Containers!
Details 
Docker 
▶ Why not virtualize 
♦ Occam 
♦ Namespaces 
▶ Simplicity 
♦ Network 
♦ One application per node 
Details 
Docker 
▶ Portability 
- Others see the same qualitative behavior that I do.
▶ Reproducibility 
- Makes it deterministic
▶ Configurable and CI 
- Byproducts 
Details 
Docker 
▶ QEMU and Docker 
▶ Scalability 
♦ Performance 
♦ Feature 
▶ Abstraction of channels 
Details 
Container Networking 
▶ Linking didn’t help 
▶ Dnsmasq to rescue! 
♦ Hosts file and volumes 
♦ SIGHUP and refresh 
▶ More elegant methods 
♦ Swarm
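The "hosts file and SIGHUP" idea can be sketched roughly as below. This is only an illustration, not the code from the pxc-docker repository: the function names, the hosts-file path and the container names/addresses are hypothetical. It relies on dnsmasq re-reading its hosts files when it receives SIGHUP.

```python
import os
import signal
from pathlib import Path


def render_hosts(containers):
    """Render hosts-file lines ("IP<TAB>name") from a {name: ip} mapping."""
    return "".join(f"{ip}\t{name}\n" for name, ip in sorted(containers.items()))


def refresh_dnsmasq(hosts_path, containers, dnsmasq_pid=None):
    """Rewrite the shared hosts file; if a pid is given, ask dnsmasq to reload it."""
    Path(hosts_path).write_text(render_hosts(containers))
    if dnsmasq_pid is not None:
        # dnsmasq clears its cache and re-reads its hosts files on SIGHUP.
        os.kill(dnsmasq_pid, signal.SIGHUP)


if __name__ == "__main__":
    # Hypothetical container names and addresses, for illustration only.
    print(render_hosts({"node2": "172.17.0.3", "node1": "172.17.0.2"}), end="")
```

The hosts file would typically live on a volume shared with the dnsmasq container, so that each new node only needs a rewrite of the file followed by a signal.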
Details 
Noise 
▶ Initial setup 
- Bridge 
- Egress only 
- IFB 
▶ Present state 
▶ NetEm 
- tc qdisc buckets 
- packet loss, delay, corruption, duplication, reordering 
- nsenter 
▶ Future 
- Docker exec 
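As a rough sketch of how NetEm noise might be applied from the host through nsenter, the helper below only builds the command line. The function name, its parameters and the default device `eth0` are assumptions made for illustration; the values actually used in the tests are not given on the slide. The `tc qdisc ... netem` options shown (delay, loss, corrupt, duplicate, reorder) are standard netem options, and netem needs a delay to be set before reordering can be used.

```python
def netem_command(pid, dev="eth0", delay_ms=None, jitter_ms=None,
                  loss_pct=None, corrupt_pct=None, duplicate_pct=None,
                  reorder_pct=None):
    """Build an argv list that adds a netem qdisc inside the network
    namespace of the process `pid` (for example a container's init process)."""
    if reorder_pct is not None and delay_ms is None:
        raise ValueError("netem reordering needs a delay to be set")
    cmd = ["nsenter", "--target", str(pid), "--net",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            cmd.append(f"{jitter_ms}ms")  # optional jitter around the delay
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    if corrupt_pct is not None:
        cmd += ["corrupt", f"{corrupt_pct}%"]
    if duplicate_pct is not None:
        cmd += ["duplicate", f"{duplicate_pct}%"]
    if reorder_pct is not None:
        cmd += ["reorder", f"{reorder_pct}%"]
    return cmd


if __name__ == "__main__":
    # Hypothetical pid and values, for illustration only.
    print(" ".join(netem_command(1234, delay_ms=100, jitter_ms=20, loss_pct=5)))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` as root; it is kept as a pure function here so it can be inspected without touching any network device.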
Testing methods
Details 
Method I 
▶ Qdisc is detached after load 
▶ Objective 
- Time for the full cluster to recover
▶ Done with a larger subset 
Details 
Method II 
▶ Qdisc is kept till the end 
▶ Objective 
- Formation of primary component 
▶ Comparatively smaller set 
Details 
Observations 
▶ Post sanity types 
- Why 
▶ Which method is more pertinent 
▶ State transfer issues 
- Beginning 
- During re-emergence 
Details 
Observations 
▶ Direct load to affected nodes 
▶ Logs 
- journalctl 
- Streaming? 
Details 
Other noises 
▶ Aim 
▶ Fsync 
- libeatmydata 
- Variance 
▶ Correlation with network 
▶ How with Docker 
System Load
Details 
Load generation 
▶ Sysbench 
- Generation 
- Reconnect on partition 
▶ Sockets chosen 
- Load on affected nodes 
▶ Distribution of Load 
- RR with socat 
- Native sysbench support 
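Round-robin (RR) distribution of load across node sockets can be illustrated with the small sketch below. It is not how socat or sysbench implement it; the function name and the socket paths are hypothetical and only show the assignment pattern.

```python
from itertools import cycle, islice


def round_robin(targets, n_connections):
    """Assign n_connections to the given targets in round-robin order."""
    if not targets:
        raise ValueError("at least one target is required")
    return list(islice(cycle(targets), n_connections))


if __name__ == "__main__":
    # Hypothetical socket paths, for illustration only.
    sockets = ["/tmp/node1.sock", "/tmp/node2.sock", "/tmp/node3.sock"]
    for i, sock in enumerate(round_robin(sockets, 5)):
        print(i, sock)
```

With this pattern the affected (noisy) nodes receive the same share of connections as the healthy ones, which is what "load on affected nodes" requires.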
Details 
Load generation 
▶ Nature of data/load 
- DDL 
▶ RQG in future 
- Fuzz testing 
The Fix
Strike Out!
Details 
Eviction 
▶ STONITH 
▶ Permanent eviction 
▶ ’N’ strikes & out! 
- Timers - evs parameters 
- wsrep_evs_delayed and wsrep_evs_evict_list 
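The "'N' strikes & out" idea can be pictured with the toy model below. It is a sketch under simplifying assumptions, not Galera's EVS implementation: the class, its methods and the decay rule are hypothetical, and real timers are replaced by explicit calls.

```python
class StrikeTracker:
    """Toy model: a peer reported as delayed `max_strikes` times is evicted."""

    def __init__(self, max_strikes):
        if max_strikes < 1:
            raise ValueError("max_strikes must be at least 1")
        self.max_strikes = max_strikes
        self.strikes = {}      # node -> current strike count
        self.evicted = set()   # eviction is permanent in this model

    def report_delayed(self, node):
        """Record one strike; return True if the node is (now) evicted."""
        if node in self.evicted:
            return True
        self.strikes[node] = self.strikes.get(node, 0) + 1
        if self.strikes[node] >= self.max_strikes:
            self.evicted.add(node)
        return node in self.evicted

    def report_healthy(self, node):
        """Remove one strike from a well-behaved node that is not evicted."""
        if node not in self.evicted and self.strikes.get(node, 0) > 0:
            self.strikes[node] -= 1


if __name__ == "__main__":
    tracker = StrikeTracker(max_strikes=3)
    for _ in range(3):
        print(tracker.report_delayed("node3"))
```

The decay on healthy reports is one possible design choice: it lets a node that was only briefly slow work its way back off the delayed list instead of accumulating strikes forever.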
Details 
Eviction 
▶ Aim 
▶ Quorum required 
- Why? - Not shoot each other - Non-PC nodes also. 
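Why a quorum matters for eviction can be shown with a small majority check. This is an illustrative sketch only, with a hypothetical function name and node names; it does not reproduce how Galera collects or counts eviction requests.

```python
def eviction_approved(votes, cluster_size):
    """Return True when a strict majority of the cluster voted to evict.

    `votes` is an iterable of node names that asked for the eviction;
    duplicate votes from the same node are counted once.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return len(set(votes)) * 2 > cluster_size


if __name__ == "__main__":
    print(eviction_approved({"node1", "node2"}, 3))
    print(eviction_approved({"node1"}, 3))
```

Requiring a strict majority means a single misbehaving or partitioned node cannot evict healthy peers on its own, so nodes do not end up shooting each other.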
Details 
Eviction 
▶ EVS version and upgrade 
▶ TODO! 
- Ingress only - Follow here. 
▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from Codership.
Details 
Coredumps with Docker 
▶ Breakdown of abstraction 
▶ Lack of isolation 
▶ What was done 
- Volumes 
- core_pattern & sysctl 
- suid and ulimit 
Details 
WAN Segments 
▶ How they work 
▶ Random allocation 
▶ Joiner starvation 
▶ Simulates data center 
▶ Donor selection 
Epilogue 
The code 
▶ Github: https://github.com/percona/pxc-docker 
▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-netem/ 
- Demo? 
▶ Contributions/testing welcome! 
▶ Dependencies 
- Sysbench 
Epilogue 
Code: todo 
▶ Docker automated builds 
▶ Orchestration 
▶ Docker 
♦ Injection 
♦ Signal proxying 
Epilogue 
Code: todo 
▶ Use Hoare’s channels - Go! 
▶ Run it bare - CoreOS 
▶ Overlay with etcd/fleet/libswarm 
Future work
Epilogue 
Future work 
▶ Fault injection 
♦ Memory 
- Poisoned memory 
♦ Disk 
- libeatmydata 
- Opposite: laggard! 
- ENOSPC 
Epilogue 
Fault injection 
▶ CPU 
- NUMA? 
- Hotplug 
▶ More network 
- corruption, duplication, reordering, rate-limit 
- Better distribution 
- Other shaping 
More Chaos
Epilogue 
Future work 
▶ Disturb cluster more! 
- Membership changes 
* Manual eviction 
* Pull the cord! 
- Corrupt nodes 
▶ Consistency voting 
Epilogue 
Further Reading 
▶ Byzantine fault tolerance 
- Reaching agreement in presence of faults 
▶ The Network is Reliable 
▶ NetEm 
▶ Latency: The New Web Performance Bottleneck 
▶ Galera 
▶ Auto eviction code 
▶ Don’t Settle for Eventual Consistency 
Epilogue 
About 
▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona. 
▶ Slides will be at slideshare and owncloud 
▶ Keybase.io: rdprabhu 
▶ About.me: raghavendra.prabhu 
▶ Presentation under CC BY-SA 4.0 
Epilogue 
Image Credits 
▶ http://galeracluster.com/documentation-webpages/ 
▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/ 
▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png 
▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354 
▶ https://flic.kr/p/9J6GNu 
▶ https://secure.flickr.com/photos/brewbooks/7780990192 
▶ https://www.flickr.com/photos/kwerfeldein/2649294869 
▶ https://secure.flickr.com/photos/mindmob/51951632 
▶ https://secure.flickr.com/photos/arenamontanus/2227769907 
▶ https://www.flickr.com/photos/markop/477199204 
▶ http://galeracluster.com/wp-content/uploads/2013/10/galera_replication1.png 
▶ https://www.flickr.com/photos/gcwest/281385801 
▶ https://www.flickr.com/photos/opethdamna/360934079 
▶ http://digital-amphetamine.deviantart.com/art/Sky-82555664 
▶ http://highload.co/i/logo.png 
▶ https://flic.kr/p/xTT8n 
▶ https://www.flickr.com/photos/29233640@N07/13466208953 
▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/ 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
 

Corpus collapsum - Partition tolerance of Galera, Raghavendra Prabhu (Percona)

  • 1. Corpus collapsum Partition tolerance of Galera in a noisy high load environment Highload++ 2014 Raghavendra Prabhu  raghavendra.d.prabhu@gmail.com Percona  raghavendra.prabhu@percona.com  randomsurfer  wnohang.net  rdprabhu  ronin13
  • 5. Introduction Seed quotes.. “’Network is reliable’ - a fallacy of the distributed system.” “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” - Leslie Lamport “Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor “Never attribute to Byzantine failure which can be explained by an ill node(s)” - Me Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 5 / 58
  • 10. Introduction Actors ▶ Database - WSREP/PXC ▶ Plugin - Galera ▶ Traffic control ♦ Traffic Control - tc ♦ NetEm
  • 13. Introduction Actors ▶ Containers - Docker ▶ Load ♦ Generators - Sysbench, RQG ▶ Network ♦ Dnsmasq ♦ nsenter
  • 15. Introduction Actors ▶ Jenkins ♦ Build flow and CI ▶ Storage ♦ Why ▶ “Others”
  • 16. Details But why ▶ The ’P’ in CAP ▶ WAN scalability ▶ Real Reason - fun! ▶ Tolerance to latency variance
  • 20. Details But why ▶ Failures in warehouses. ▶ Not quorum, but consensus. ▶ Real world networks and synchronous replication - Delay - Partition
  • 22. Details Galera ▶ Data-centric approach ▶ EVS ▶ Causality and Synchronous ▶ Latency
  • 26. Where did it start
  • 27. Details Where did it start ▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 ▶ Loss of PC ▶ Crash ▶ HA goal
  • 28. One can bring the whole down
  • 30. Details Basic Flow ▶ Jenkins ▶ Build images ▶ Start Dnsmasq ▶ Bootstrap ▶ nsenter/netem ▶ Pre-sanity ▶ SST/Others ▶ Load/Sysbench
  • 38. Details Basic Flow ▶ RR sysbench ▶ Detach/Keep ▶ Post sanity ▶ Core trace ▶ Sanity check ▶ Reconciliation ▶ Cleanup ▶ Collect logs
  • 46. Details Cluster Resilience
  • 47. Details Parameters ▶ Sysbench ▶ Segment ▶ Reconciliation period ▶ Loss nodes
  • 51. Details Parameters ▶ NetEm ▶ Detach loss ▶ Fsync ▶ Shutdown
  • 56. Details Docker ▶ Why not virtualize ♦ Occam ♦ Namespaces ▶ Simplicity ♦ Network ♦ One application per node
  • 57. Details Docker ▶ Portability - You see the same qualitative behavior that I do. ▶ Reproducibility - Makes it deterministic ▶ Configurable and CI - Byproducts
  • 58. Details Docker ▶ QEMU and Docker ▶ Scalability ♦ Performance ♦ Feature ▶ Abstraction of channels
  • 59. Details Container Networking ▶ Linking didn’t help ▶ Dnsmasq to the rescue! ♦ Hosts file and volumes ♦ SIGHUP and refresh ▶ More elegant methods - Swarm
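The "hosts file, SIGHUP and refresh" idea from slide 59 can be sketched roughly as below. This is an illustration under assumptions, not the code used in the talk: the helper names, the container names and addresses, and the hosts-file path passed in by the caller are all hypothetical. The only external behaviour it relies on is that dnsmasq re-reads its hosts files when it receives SIGHUP.

```python
import os
import signal


def render_hosts(nodes):
    """Render hosts-file lines ("IP<TAB>name") from a name -> IP mapping."""
    return "".join(f"{ip}\t{name}\n" for name, ip in sorted(nodes.items()))


def refresh_dnsmasq(hosts_path, nodes, dnsmasq_pid):
    """Rewrite the shared hosts file, then signal dnsmasq to re-read it."""
    with open(hosts_path, "w") as fh:
        fh.write(render_hosts(nodes))
    os.kill(dnsmasq_pid, signal.SIGHUP)


# Hypothetical cluster members; the names and addresses are made up.
nodes = {"pxc1": "172.17.0.2", "pxc2": "172.17.0.3"}
hosts_text = render_hosts(nodes)
```

In such a setup the hosts file would live on a volume shared with the dnsmasq container, and `refresh_dnsmasq` would be called each time a node container starts or stops.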
  • 60. Details Noise ▶ Initial setup - Bridge - Egress only - IFB ▶ Present state ▶ NetEm - tc qdisc buckets - packet loss, delay, corruption, duplication, reordering - nsenter ▶ Future - Docker exec
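A minimal sketch of how NetEm noise might be attached inside a container's network namespace with nsenter and tc, as slide 60 outlines. The function names, the default device `eth0`, and the example PID are assumptions for illustration; the functions only build argument lists and do not run anything. The container PID would typically come from something like `docker inspect`.

```python
def netem_command(container_pid, dev="eth0", delay_ms=None, jitter_ms=None,
                  loss_pct=None):
    """Build argv that adds a netem qdisc inside the container's net namespace."""
    cmd = ["nsenter", "--target", str(container_pid), "--net",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            # netem takes the jitter as a second value after the delay
            cmd.append(f"{jitter_ms}ms")
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def detach_command(container_pid, dev="eth0"):
    """Build argv that removes the root qdisc again (the 'detach' step)."""
    return ["nsenter", "--target", str(container_pid), "--net",
            "tc", "qdisc", "del", "dev", dev, "root"]


# Hypothetical PID of a container's init process.
attach = netem_command(1234, delay_ms=100, jitter_ms=10, loss_pct=5)
detach = detach_command(1234)
```

Either list can be handed to `subprocess.run(...)` by a caller with sufficient privileges; keeping command construction separate makes it easy to log or test what would be executed.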
  • 62. Details Method I ▶ Qdisc is detached after load ▶ Objective - Time for the full cluster to recover ▶ Done with a larger subset
  • 63. Details Method II ▶ Qdisc is kept till the end ▶ Objective - Formation of primary component ▶ Comparatively smaller set
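The Method I objective (time for the full cluster to recover once the qdisc is detached) could be measured with a polling loop like the one sketched here. This is an assumption about how such a measurement might look, not the talk's implementation: `probe` is a hypothetical callable that returns the current cluster size (in a real run it might read the `wsrep_cluster_size` status variable), and the clock and sleep functions are injectable so the demo below touches no real cluster.

```python
import time


def time_to_recover(probe, expected_size, timeout=300.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Return seconds until probe() reports expected_size, or None on timeout."""
    start = clock()
    while True:
        if probe() == expected_size:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


# Demo with a fake probe and a fake clock, so nothing real is contacted.
_sizes = iter([2, 2, 3])
_now = [0.0]
elapsed = time_to_recover(
    probe=lambda: next(_sizes),
    expected_size=3,
    clock=lambda: _now[0],
    sleep=lambda s: _now.__setitem__(0, _now[0] + s),
)
```

For Method II, where the qdisc stays attached, the same loop could instead wait for a probe that reports whether a primary component has formed.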
  • 64. Details Observations ▶ Post sanity types - Why ▶ Which method is more pertinent ▶ State transfer issues - Beginning - During re-emergence
  • 65. Details Observations ▶ Direct load to affected nodes ▶ Logs - journalctl - Streaming?
  • 66. Details Other noises ▶ Aim ▶ Fsync - libeatmydata - Variance ▶ Correlation with network ▶ How with Docker
  • 68. Details Load generation ▶ Sysbench - Generation - Reconnect on partition ▶ Sockets chosen - Load on affected nodes ▶ Distribution of Load - RR with socat - Native sysbench support
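The round-robin (RR) distribution of load mentioned on slide 68 amounts to handing each client connection the next node socket in turn. The sketch below shows only that assignment logic; the socket paths are hypothetical, and the talk itself mentions doing this with socat or native sysbench support rather than with Python.

```python
from itertools import cycle, islice


def round_robin(sockets, connections):
    """Assign each of `connections` client slots to a socket in turn."""
    return list(islice(cycle(sockets), connections))


# Hypothetical per-node socket paths.
sockets = ["/tmp/pxc1.sock", "/tmp/pxc2.sock", "/tmp/pxc3.sock"]
assignment = round_robin(sockets, 5)
```

Passing only the sockets of the nodes that have noise attached would direct all load to the affected nodes, which is the other option the slide lists.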
  • 69. Details Load generation ▶ Nature of data/load - DDL ▶ RQG in future - Fuzz testing
  • 72. Details Eviction ▶ STONITH ▶ Permanent eviction ▶ ’N’ strikes & out! - Timers - evs parameters - wsrep_evs_delayed and wsrep_evs_evict_list
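The "'N' strikes & out" idea can be pictured with a toy counter: each time a node is observed as delayed it collects a strike, and after N strikes it is evicted for good. This is a deliberately simplified model written for illustration, not Galera's EVS implementation; the class and method names are invented, and it ignores the timers and the quorum agreement that the slides say the real mechanism involves.

```python
from collections import defaultdict


class StrikeTracker:
    """Toy model: evict a node permanently after max_strikes delayed reports."""

    def __init__(self, max_strikes):
        self.max_strikes = max_strikes
        self.strikes = defaultdict(int)
        self.evicted = set()

    def report_delayed(self, node):
        """Record one delayed observation; return True if the node is evicted."""
        if node in self.evicted:
            return True
        self.strikes[node] += 1
        if self.strikes[node] >= self.max_strikes:
            self.evicted.add(node)
        return node in self.evicted


tracker = StrikeTracker(max_strikes=3)
results = [tracker.report_delayed("node3") for _ in range(3)]
```

In this model the `strikes` mapping and the `evicted` set play roughly the role that the slide attributes to wsrep_evs_delayed and wsrep_evs_evict_list, namely making the delayed and evicted nodes visible.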
  • 73. Details Eviction ▶ Aim ▶ Quorum required - Why? - Not shoot each other - Non-PC nodes also.
  • 75. Details Eviction ▶ EVS version and upgrade ▶ TODO! - Ingress only - Follow here. ▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from Codership.
  • 76. Details Coredumps with Docker ▶ Breakdown of abstraction ▶ Lack of isolation ▶ What was done - Volumes - core_pattern & sysctl - suid and ulimit
  • 77. Details WAN Segments ▶ How they work ▶ Random allocation ▶ Joiner starvation ▶ Simulates data center ▶ Donor selection
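Random allocation of nodes to WAN segments, as listed on slide 77, might look like the sketch below. The node names, the helper names, and the use of a seed are assumptions; the only Galera-specific detail relied on is the `gmcast.segment` provider option, which is how a node is assigned to a segment.

```python
import random


def allocate_segments(nodes, segment_count, seed=None):
    """Randomly assign each node to one of segment_count segments."""
    rng = random.Random(seed)
    return {node: rng.randrange(segment_count) for node in nodes}


def provider_options(segment):
    """Render the provider-options fragment that selects a segment."""
    return f"gmcast.segment={segment}"


# Hypothetical node names; a fixed seed keeps a test run reproducible.
nodes = ["pxc1", "pxc2", "pxc3", "pxc4", "pxc5"]
allocation = allocate_segments(nodes, segment_count=3, seed=42)
options = {node: provider_options(seg) for node, seg in allocation.items()}
```

Each resulting string would be passed to the matching node through `wsrep_provider_options`, so that nodes sharing a segment number behave as if they were in the same data center.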
  • 78. Epilogue The code ▶ Github: https://github.com/percona/pxc-docker ▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-netem/ - Demo? ▶ Contributions/testing welcome! ▶ Dependencies - Sysbench
  • 79. Epilogue Code: todo ▶ Docker automated builds ▶ Orchestration ▶ Docker ♦ Injection ♦ Signal proxying
  • 80. Epilogue Code: todo ▶ Use Hoare’s channels - Go! ▶ Run it bare - CoreOS ▶ Overlay with etcd/fleet/libswarm
  • 82. Epilogue Future work ▶ Fault injection ♦ Memory - Poisoned memory ♦ Disk - libeatmydata - Opposite: laggard! - ENOSPC
  • 83. Epilogue Fault injection ▶ CPU - NUMA? - Hotplug ▶ More network - corruption, duplication, reordering, rate-limit - Better distribution - Other shaping
  • 85. Epilogue Future work ▶ Disturb cluster more! - Membership changes * Manual eviction * Pull the cord! - Corrupt nodes ▶ Consistency voting
  • 86. Epilogue Further Reading ▶ Byzantine fault tolerance - Reaching agreement in presence of faults ▶ The Network is Reliable ▶ NetEm ▶ Latency: The New Web Performance Bottleneck ▶ Galera ▶ Auto eviction code ▶ Don’t Settle for Eventual Consistency
  • 87. Epilogue About ▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona. ▶ Slides will be at slideshare and owncloud ▶ Keybase.io: rdprabhu ▶ About.me: raghavendra.prabhu ▶ Presentation under CC BY-SA 4.0
  • 88. Epilogue Image Credits ▶ http://galeracluster.com/documentation-webpages/ ▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/ ▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png ▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354 ▶ https://flic.kr/p/9J6GNu ▶ https://secure.flickr.com/photos/brewbooks/7780990192 ▶ https://www.flickr.com/photos/kwerfeldein/2649294869 ▶ https://secure.flickr.com/photos/mindmob/51951632 ▶ https://secure.flickr.com/photos/arenamontanus/2227769907 ▶ https://www.flickr.com/photos/markop/477199204 ▶ http://galeracluster.com/wp-content/uploads/2013/10/galera_replication1.png ▶ https://www.flickr.com/photos/gcwest/281385801 ▶ https://www.flickr.com/photos/opethdamna/360934079 ▶ http://digital-amphetamine.deviantart.com/art/Sky-82555664 ▶ http://highload.co/i/logo.png ▶ https://flic.kr/p/xTT8n ▶ https://www.flickr.com/photos/29233640@N07/13466208953 ▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/