Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm

This talk is about partition tolerance and chaos testing of a Galera cluster with Docker containers and NetEm.

Raghavendra Prabhu (Percona), 20 February, 2015
raghavendra.d.prabhu@gmail.com | raghavendra.prabhu@percona.com
randomsurfer | wnohang.net | rdprabhu | ronin13

The Title

Split Brain?

Split brain

Introduction
Seed quotes
“’Network is reliable’ - a fallacy of the distributed system.”
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” - Leslie Lamport
“Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor
“Never attribute to Byzantine failure that which can be explained by ill node(s).” - Me

20000 feet view

Introduction
Actors
▶ Database - WSREP/PXC
▶ Plugin - Galera
▶ Traffic control
  ♦ Traffic Control - tc
  ♦ NetEm

Introduction
Actors
▶ Containers - Docker
▶ Load
  ♦ Generators - Sysbench, RQG
▶ Network
  ♦ Dnsmasq
  ♦ nsenter

Introduction
Actors
▶ Jenkins
  ♦ Build flow and CI
▶ Storage
  ♦ Why

Distributed Systems Testing
A Kobayashi Maru
Cheat on CAP!

Details
Rationale
▶ The ‘P’ in CAP
▶ WAN scalability
▶ Real reason - fun!
▶ Tolerance to latency variance

Details
Rationale
▶ Failures in warehouses
▶ Not quorum, but consensus
▶ Real-world networks and synchronous replication
  - Delay
  - Partition
  - Non-graceful exits

Galera

Details
Galera
▶ Data-centric approach
▶ Extended Virtual Synchrony
▶ Causality and synchrony
▶ Flow control and temporal synchrony

Details
Galera
▶ Latency
  - Global ordering
  - Certification, not apply
  - Communication overhead
▶ Layers
  - Replication
  - Certification
  - Group communication
▶ Isolation
  - REPEATABLE-READ
  - SNAPSHOT-ISOLATION
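
Flow control can be pictured as a hysteresis on each node's receive queue. Replication pauses when the queue grows past `gcs.fc_limit`. It resumes once the queue drains below `gcs.fc_limit * gcs.fc_factor`. The Python sketch below models only that rule, for a single node. `FlowControl` and `on_queue_length` are hypothetical names, and only the two `gcs.*` options are real Galera settings.

```python
from dataclasses import dataclass


@dataclass
class FlowControl:
    """Toy model of Galera-style flow control on a single node.

    fc_limit / fc_factor play the role of the gcs.fc_limit and gcs.fc_factor
    provider options; the class itself is illustrative, not Galera code.
    """
    fc_limit: int
    fc_factor: float
    paused: bool = False

    def on_queue_length(self, recv_queue: int) -> bool:
        """Feed the current receive-queue length; return True while paused."""
        if not self.paused and recv_queue > self.fc_limit:
            # Queue too long: this node would ask the cluster to pause replication.
            self.paused = True
        elif self.paused and recv_queue < self.fc_limit * self.fc_factor:
            # Queue drained below the resume threshold: replication continues.
            self.paused = False
        return self.paused
```

Take `fc_limit=16` and `fc_factor=0.5`. Queue lengths 4, 12, 17, 14, 9, 7 then yield False, False, True, True, True, False. The gap between 16 and 8 keeps a node near the limit from flapping between paused and running.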

Where did it start

Details
Where did it start
▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192
▶ Loss of PC
▶ Crash
▶ HAT

One can bring the whole down

Details
Tests
▶ Chaos testing
▶ Flow control with sysbench
▶ Network loss
▶ Future

There is no higher menace than distributed systems testing

Details
NetEm
▶ Initial setup
  - Bridge
  - Egress only
  - IFB
  - Present state
▶ NetEm
  - tc qdisc buckets
  - Packet loss, delay, corruption, duplication, reordering
  - nsenter
▶ Future
  - Docker exec
  - Rocket ACI
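
In the present-state approach, a netem qdisc is attached from the host inside a container's network namespace with nsenter. The Python sketch below shows roughly how that could look. It only builds argument lists for the real `docker inspect`, `nsenter` and `tc` CLIs. The interface name `eth0` and all helper names are assumptions, and running the commands needs root. A root qdisc shapes egress only, which matches the "Egress only" note above.

```python
import subprocess
from typing import List, Optional


def container_pid_cmd(container: str) -> List[str]:
    """Command that prints the host PID of a container's init process."""
    return ["docker", "inspect", "-f", "{{.State.Pid}}", container]


def netem_cmd(pid: int, dev: str = "eth0", *,
              delay_ms: Optional[int] = None,
              loss_pct: Optional[float] = None,
              corrupt_pct: Optional[float] = None,
              duplicate_pct: Optional[float] = None,
              reorder_pct: Optional[float] = None) -> List[str]:
    """Attach a netem root qdisc to `dev` inside the network namespace of `pid`.

    `dev` defaults to eth0, which is an assumption about the container setup.
    """
    if reorder_pct is not None and delay_ms is None:
        raise ValueError("netem reordering needs a delay to act on")
    cmd = ["nsenter", "-t", str(pid), "-n",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
    for keyword, pct in (("loss", loss_pct), ("corrupt", corrupt_pct),
                         ("duplicate", duplicate_pct), ("reorder", reorder_pct)):
        if pct is not None:
            cmd += [keyword, f"{pct}%"]
    return cmd


def detach_cmd(pid: int, dev: str = "eth0") -> List[str]:
    """Remove the root qdisc again (the 'detach' variant)."""
    return ["nsenter", "-t", str(pid), "-n", "tc", "qdisc", "del", "dev", dev, "root"]


def run(cmd: List[str]) -> str:
    """Run a command (nsenter/tc need root) and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()
```

A possible use, with a hypothetical container name: `pid = int(run(container_pid_cmd("node2")))`, then `run(netem_cmd(pid, delay_ms=150, loss_pct=20))`, and later `run(detach_cmd(pid))`.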

Details
Tests: Chaos testing
▶ Nodes killed at random around sysbench runs
▶ Fewer than half of the nodes are chosen
▶ docker inspect && SIGKILL
▶ Configurable sleep && retry
  ♦ Snapshot/Incremental State Transfer
    - Composability of transactional databases
▶ docker restart && repeat
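
The victim selection and kill/restart loop on this slide might be sketched in Python as follows. Only `docker inspect`, `docker kill --signal=KILL` and `docker restart` are real CLI commands. The helper names are hypothetical. Executing the commands (for example with `subprocess.run`) and the sleep/retry wait for state transfer are left out.

```python
import random
from typing import List, Sequence


def pick_victims(nodes: Sequence[str], rng: random.Random) -> List[str]:
    """Choose a random, non-empty set of nodes that is strictly fewer than half.

    Leaving a strict majority alive means the survivors should still be able
    to form a primary component.
    """
    max_victims = (len(nodes) - 1) // 2
    if max_victims < 1:
        raise ValueError("need at least 3 nodes to kill fewer than half")
    return rng.sample(list(nodes), rng.randint(1, max_victims))


def running_cmd(container: str) -> List[str]:
    # Prints "true" or "false"; can be used to skip containers already down.
    return ["docker", "inspect", "-f", "{{.State.Running}}", container]


def kill_cmd(container: str) -> List[str]:
    return ["docker", "kill", "--signal=KILL", container]


def restart_cmd(container: str) -> List[str]:
    return ["docker", "restart", container]


def chaos_round(nodes: Sequence[str], rng: random.Random) -> List[List[str]]:
    """Commands for one round: kill the chosen victims, then restart them."""
    victims = pick_victims(nodes, rng)
    return [kill_cmd(v) for v in victims] + [restart_cmd(v) for v in victims]
```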

Details
Tests: Network loss
▶ Loss nodes
▶ Detach/Keep qdisc
▶ Reconciliation
▶ Sanity checks
▶ Formation of PC || time to recover

The Flow

Details
Basic Flow
▶ Jenkins
▶ Build images
▶ Start Dnsmasq
▶ Bootstrap
▶ Load/Sysbench
▶ SST/Others
▶ Pre-sanity
▶ nsenter/netem

Details
Basic Flow
▶ RR sysbench
▶ Detach/Keep
▶ Sanity check
▶ Reconciliation
▶ Post sanity
▶ Core trace
▶ Cleanup
▶ Collect logs
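
A driver for the flow above could treat "Collect logs" and "Cleanup" as teardown that always runs, even when a sanity check fails. The minimal Python sketch below takes its step names from the slides. `run_flow` and the step callables are hypothetical stand-ins for the Jenkins job's shell steps. Collecting logs before cleanup is an assumption here, since removing containers would discard their logs.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_flow(steps: Sequence[Step], teardown: Sequence[Step], log: List[str]) -> None:
    """Run `steps` in order, then always run `teardown`.

    Names are appended to `log` as each step starts. A failing main step
    stops the run, teardown still happens, and the original error is
    re-raised so the CI job is marked as failed.
    """
    try:
        for name, fn in steps:
            log.append(name)
            fn()
    finally:
        for name, fn in teardown:
            log.append(name)
            try:
                fn()
            except Exception:
                # Best effort: a broken cleanup must not mask the real failure.
                log.append(f"{name} failed")
```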

Details
Parameters
▶ Sysbench
▶ Segment
▶ Reconciliation period
▶ Loss nodes

Plumbing the pressure

Details
Parameters
▶ NetEm
▶ Qdisc detach
▶ fsync
▶ Shutdown
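
The two parameter slides together describe the knobs of one network-loss run. The sketch below collects them into a validated Python config object. Every field name and default is hypothetical, the `netem` string is only an example value, and sysbench options are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetLossRun:
    """Hypothetical knobs for one network-loss run (names are illustrative)."""
    nodes: int                   # cluster size
    loss_nodes: int              # nodes that get a netem qdisc
    segments: int = 1            # WAN segments (gmcast.segment values)
    reconcile_secs: int = 60     # reconciliation period after the load
    netem: str = "loss 20%"      # netem options; example value only
    detach_qdisc: bool = True    # True: Method I (detach), False: Method II (keep)
    fsync: bool = True           # False: run mysqld under libeatmydata
    shutdown: str = "kill"       # "kill" (SIGKILL) or "clean" shutdown

    def __post_init__(self) -> None:
        if not 0 < self.loss_nodes < self.nodes:
            raise ValueError("loss_nodes must be between 1 and nodes - 1")
        if not 1 <= self.segments <= self.nodes:
            raise ValueError("segments must be between 1 and nodes")
        if self.shutdown not in ("kill", "clean"):
            raise ValueError("shutdown must be 'kill' or 'clean'")

    @property
    def healthy_majority(self) -> bool:
        """True if the nodes without a qdisc still form a strict majority."""
        return 2 * (self.nodes - self.loss_nodes) > self.nodes
```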

Containers!

Details
Docker
▶ Why not virtualize
  - Occam
  - Namespaces
▶ Simplicity
  ♦ Network
    - Logical scalability
  ♦ One application per node

Details
Docker
▶ Portability
  - Qualitative behavior
▶ Reproducibility
  - Makes it deterministic
▶ Configurable and CI
  - Byproducts

Details
Docker
▶ QEMU vis-à-vis Docker
▶ Scalability
  ♦ Performance
  ♦ Feature
▶ Abstraction of channels

Details
Container Networking
▶ Linking didn’t help
▶ Dnsmasq to the rescue!
  ♦ Hosts file and volumes
  ♦ SIGHUP and refresh
▶ Potential issues
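
The hosts-file-plus-SIGHUP approach can be sketched in Python as below. The code assumes three things: dnsmasq reads the file, either as `/etc/hosts` or through `--addn-hosts`; the file's directory is shared with the dnsmasq container as a volume; and the caller knows dnsmasq's PID. The function names and example addresses are hypothetical. The SIGHUP behaviour (clear the cache, re-read hosts files) is documented dnsmasq behaviour.

```python
import os
import signal
from typing import Mapping


def render_hosts(nodes: Mapping[str, str]) -> str:
    """One 'IP hostname' line per container, in /etc/hosts format."""
    return "".join(f"{ip} {name}\n" for name, ip in sorted(nodes.items()))


def write_hosts(path: str, nodes: Mapping[str, str]) -> None:
    # Write to a temporary file and rename it into place, so dnsmasq never
    # reads a half-written file. Assumes the *directory* is a shared volume:
    # a single bind-mounted file would keep pointing at the old inode.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(render_hosts(nodes))
    os.replace(tmp, path)


def refresh_dnsmasq(path: str, nodes: Mapping[str, str], dnsmasq_pid: int) -> None:
    write_hosts(path, nodes)
    # On SIGHUP, dnsmasq clears its cache and re-reads its hosts files.
    os.kill(dnsmasq_pid, signal.SIGHUP)
```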

Testing methods

Details
Overview
▶ Transient noise
▶ Lasting ’sickness’
▶ Sick nodes
▶ Dead members

Details
Method I
▶ Qdisc is detached after load
▶ Objective
  - Time to recovery of the full cluster
▶ Done with a larger subset
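
Method I's objective, time to recovery of the full cluster, could be measured by polling each node's `wsrep_cluster_status` and `wsrep_cluster_size` status variables. Polling stops once every node reports a primary component of full size. The Python sketch assumes a hypothetical `probe` callable that gathers those two values per node, for example via `SHOW GLOBAL STATUS`. The clock and sleep are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical probe: node name -> (wsrep_cluster_status, wsrep_cluster_size).
Probe = Callable[[], Mapping[str, Tuple[str, int]]]


def time_to_recover(probe: Probe, expected_size: int, timeout: float,
                    interval: float = 1.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Seconds until every node reports a full primary component, or None on timeout."""
    start = clock()
    while True:
        status = probe()
        recovered = len(status) == expected_size and all(
            state == "Primary" and size == expected_size
            for state, size in status.values())
        if recovered:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```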

Details
Method II
▶ Qdisc is kept until the end
▶ Objective
  - Formation of primary component
▶ Comparatively smaller set
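
Method II's objective, formation of a primary component, comes down to Galera's weighted quorum. In the Galera documentation, a component is primary when its members' total `pc.weight` exceeds half the weight of the last primary component, after subtracting nodes known to have left gracefully. The Python sketch below is a simplified reading of that rule, not Galera's code. In particular, it takes weights only from the previous primary component, so newcomers count as zero.

```python
from typing import Mapping, Set


def is_primary(prev_pc: Mapping[str, int], left_gracefully: Set[str],
               component: Set[str]) -> bool:
    """Weighted-quorum check (simplified).

    prev_pc maps each member of the last primary component to its pc.weight.
    Nodes that are not in prev_pc contribute weight 0 here.
    """
    remaining = sum(w for node, w in prev_pc.items() if node not in left_gracefully)
    current = sum(prev_pc.get(node, 0) for node in component)
    # current > remaining / 2, kept in integers to avoid float comparisons.
    return 2 * current > remaining
```

An even split of four equal-weight nodes leaves neither half primary. If the other two nodes left gracefully, though, the remaining pair keeps quorum.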

Details
Observations
▶ Post sanity types
  - Why
▶ Which method is more pertinent
▶ State transfer issues
  - Beginning
  - During re-emergence

Details
Observations
▶ Direct load to affected nodes
▶ Partition external to system
▶ Logs
  - journalctl
  - Streaming?

Details
Other noises
▶ Aim
▶ Fsync
  - libeatmydata
  - Variance
▶ Correlation with network
▶ How with Docker
  - LD_PRELOAD
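
Getting libeatmydata into a container with LD_PRELOAD could be as simple as passing an environment variable to `docker run`. The Python sketch below only builds the command. The library path is an assumption, since it differs between distributions. It also assumes the image's entrypoint passes the environment through to mysqld. Enabling this on only some nodes is one hypothetical way to get the fsync "Variance" mentioned on the slide.

```python
from typing import List, Sequence

# Assumption: where libeatmydata lives inside the image; check the package
# layout of the distribution the image is built from.
LIBEATMYDATA = "/usr/lib/libeatmydata.so"


def docker_run_cmd(image: str, name: str, *, eatmydata: bool,
                   extra: Sequence[str] = ()) -> List[str]:
    """Build a `docker run` command; with eatmydata=True, libeatmydata is
    preloaded, turning fsync() and related calls into no-ops."""
    cmd = ["docker", "run", "-d", "--name", name]
    if eatmydata:
        cmd += ["-e", f"LD_PRELOAD={LIBEATMYDATA}"]
    return cmd + [image, *extra]
```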

System Load

Details
Load generation
▶ Sysbench
  - Generation
  - Reconnect on partition
▶ Sockets chosen
  - Load on affected nodes
▶ Distribution of load
  - RR with socat
  - Native sysbench support
  - HAProxy?
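
Round-robin load distribution can be illustrated with two pieces: spreading client threads over endpoints, and exposing each container's MySQL port as a local Unix socket with socat. The Python sketch is one possible reading of "RR with socat", not the setup used in the talk. The function names, socket paths and port are assumptions. `UNIX-LISTEN`, `fork` and `TCP:` are standard socat address syntax.

```python
import itertools
from typing import Dict, List, Sequence


def round_robin(clients: int, endpoints: Sequence[str]) -> Dict[str, List[int]]:
    """Assign client ids 0..clients-1 to endpoints in round-robin order."""
    assignment: Dict[str, List[int]] = {e: [] for e in endpoints}
    for client, endpoint in zip(range(clients), itertools.cycle(endpoints)):
        assignment[endpoint].append(client)
    return assignment


def socat_bridge_cmd(socket_path: str, host: str, port: int = 3306) -> List[str]:
    # Listen on a Unix socket and forward each accepted connection
    # (handled in a forked child) to the node's MySQL TCP port.
    return ["socat", f"UNIX-LISTEN:{socket_path},fork", f"TCP:{host}:{port}"]
```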

Details
Load generation
▶ Nature of data/load
  - DDL
▶ RQG in future
  - Fuzz testing

The Fix

Strike Out!

Details
Eviction
▶ STONITH
▶ Permanent eviction
▶ ’N’ strikes & out!
  - Timers - evs parameters
  - wsrep_evs_delayed and wsrep_evs_evict_list
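
The "N strikes & out" idea works like this. Each time a node is seen as delayed, it earns a strike. Strikes are forgotten after it stays responsive long enough. After N strikes it is evicted for good. The toy Python model below tracks this from a single observer's point of view. `auto_evict` and `keep_period` only loosely mirror Galera's `evs.auto_evict` and `evs.delayed_keep_period`. This is an illustration, not Galera's algorithm, and it omits the quorum agreement that the next slide says eviction requires.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StrikeTracker:
    """Toy 'N strikes and out' model (illustrative, not Galera's algorithm)."""
    auto_evict: int          # strikes before eviction; 0 disables eviction
    keep_period: float       # seconds of responsiveness that clear strikes
    strikes: Dict[str, int] = field(default_factory=dict)
    last_delay: Dict[str, float] = field(default_factory=dict)
    evicted: Set[str] = field(default_factory=set)

    def observe(self, node: str, now: float, delayed: bool) -> None:
        if node in self.evicted:
            return  # eviction is permanent
        if delayed:
            self.strikes[node] = self.strikes.get(node, 0) + 1
            self.last_delay[node] = now
            if self.auto_evict > 0 and self.strikes[node] >= self.auto_evict:
                self.evicted.add(node)
        elif node in self.last_delay and now - self.last_delay[node] >= self.keep_period:
            # Responsive for long enough: forget earlier strikes.
            del self.strikes[node]
            del self.last_delay[node]
```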

Details
Eviction
▶ Aim
▶ Quorum required
  - Why? So that nodes do not shoot each other
  - Non-PC nodes also

Details
Coredumps with Docker
▶ Breakdown of abstraction
▶ Lack of isolation
▶ What was done
  - Volumes
  - core_pattern & sysctl
  - suid and ulimit
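
The coredump items map onto host-level settings plus per-container options. The Python sketch builds the commands. `kernel.core_pattern` is global to the host kernel rather than per container, which is the "lack of isolation" above. `fs.suid_dumpable=2` allows dumps from processes that changed credentials. The `/cores` directory is an assumption, and so is the `--ulimit` flag: it comes from Docker releases later than this talk.

```python
from typing import List

CORE_DIR = "/cores"  # assumption: same path on the host and inside containers


def host_sysctl_cmds(core_dir: str = CORE_DIR) -> List[List[str]]:
    # Host-wide settings; every container shares them.
    return [
        # %e = executable name, %p = PID; an absolute path is required
        # when fs.suid_dumpable is 2.
        ["sysctl", "-w", f"kernel.core_pattern={core_dir}/core.%e.%p"],
        # Allow dumps from processes that switched user, e.g. to mysql.
        ["sysctl", "-w", "fs.suid_dumpable=2"],
    ]


def container_core_args(core_dir: str = CORE_DIR) -> List[str]:
    # Bind-mount the core directory so dumps land on the host, and lift
    # the core-size limit (assumes a Docker version with --ulimit).
    return ["-v", f"{core_dir}:{core_dir}", "--ulimit", "core=-1"]
```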

Details
WAN Segments
▶ How they work
▶ Simulates data centers
▶ Random allocation - latency multiplier
▶ Joiner starvation
▶ Donor selection
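
Random segment allocation with a latency multiplier could be sketched as below. `gmcast.segment` is a real Galera provider option (0 to 255). The scheme of scaling a base netem delay by the segment number is a hypothetical "multiplier". Note that an egress netem delay on a node also affects its traffic to peers in the same segment, so this only approximates separate data centers.

```python
import random
from typing import Dict, Sequence


def allocate_segments(nodes: Sequence[str], segments: int,
                      rng: random.Random) -> Dict[str, int]:
    """Randomly spread nodes over segments, making sure every segment is used."""
    if not 1 <= segments <= len(nodes):
        raise ValueError("need 1 <= segments <= len(nodes)")
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    # The first `segments` nodes seed one segment each; the rest land at random.
    alloc = {node: i for i, node in enumerate(shuffled[:segments])}
    for node in shuffled[segments:]:
        alloc[node] = rng.randrange(segments)
    return alloc


def node_settings(segment: int, base_delay_ms: int) -> Dict[str, str]:
    # Hypothetical latency multiplier: segment k gets (k + 1) * base delay.
    return {
        "wsrep_provider_options": f"gmcast.segment={segment}",
        "netem": f"delay {base_delay_ms * (segment + 1)}ms",
    }
```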

Epilogue
The code
▶ GitHub:
  - https://github.com/percona/pxc-docker
  - https://github.com/percona/percona-xtradb-cluster/
  - https://github.com/percona/galera
▶ Jenkins:
  - http://jenkins.percona.com/job/PXC-5.6-netem/
  - http://jenkins.percona.com/job/PXC-5.6-bench/
  - http://jenkins.percona.com/job/PXC-5.6-chaos/
▶ Contributions/testing/bugs welcome!

Epilogue
Code: todo
▶ Docker automated builds
▶ Orchestration
▶ Docker
  ♦ Injection
  ♦ Signal proxying

Epilogue
Code: todo
▶ => Proof of concept to a framework =>
▶ Run it bare - CoreOS, Atomic
▶ Overlay with etcd/fleet/libswarm

Future work

Epilogue
Future work
▶ Fault injection
  ♦ Memory
    - Poisoned memory
  ♦ Disk
    - libeatmydata
    - Opposite
    - ENOSPC

Epilogue
Fault injection
▶ CPU
  - NUMA?
  - Hotplug
▶ More network
  - Corruption, duplication, reordering, rate-limit
  - Better distribution
  - Other shaping

Worst case improves average case

Epilogue
Future work
▶ Disturb cluster more!
  - Membership changes
    * Manual eviction
    * Pull the cord!
  - Corrupt nodes
▶ Introduce inconsistencies
  - Consistency voting
  - Silent corruptions

Epilogue
Eventual consistency
▶ CAP
▶ Latency factor
▶ Is Galera EC? No!
  - ACID only, no BASE
▶ Bounded staleness
  - PBS
▶ ACID and CAP
▶ Instrumentation
▶ Lambda architecture

Epilogue
Further Reading
▶ Resources
▶ Byzantine fault tolerance
  - Reaching agreement in the presence of faults
▶ The Network is Reliable
▶ NetEm
▶ Latency: The New Web Performance Bottleneck
▶ Galera Cluster Documentation
▶ Auto eviction code
▶ Don’t Settle for Eventual Consistency
▶ Extended Virtual Synchrony
▶ Galera Flow Control

Epilogue
Further Reading
▶ Worst-Case Distributed Systems Design
▶ HAT, not CAP: Introducing Highly Available Transactions
▶ Bridging the Gap: Opportunities in Coordination-Avoiding Databases
▶ Linearizability versus Serializability

Epilogue
We are Hiring Too!
▶ Looking for a build engineer - Packaging and Jenkins/CI are your strengths and you are a Linux geek. Bonus points if you are a Linux distro user/contributor/maintainer.
▶ Senior C/C++ developer - if Linux userspace development and databases (and distributed systems) are your thing.
▶ Apply here: http://percona.theresumator.com/.

Conference for Database geeks!
My Talk: Securing databases with systemd for containers and services

Epilogue
About/Contact - HA compliant
▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona.
▶ Slides will be at slideshare.net/slidunder.
▶ About.me: raghavendra.prabhu
▶ Keybase.io: rdprabhu
▶ Presentation under CC BY-SA 4.0

Epilogue
Image Credits
▶ http://galeracluster.com/documentation-webpages/
▶ https://en.wikipedia.org/wiki/Network_theory
▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png
▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354
▶ https://flic.kr/p/9J6GNu
▶ http://schauerte.me/data.html
▶ https://secure.flickr.com/photos/brewbooks/7780990192
▶ https://www.flickr.com/photos/kwerfeldein/2649294869
▶ https://secure.flickr.com/photos/mindmob/51951632
▶ https://secure.flickr.com/photos/arenamontanus/2227769907
▶ https://www.flickr.com/photos/markop/477199204
▶ https://www.flickr.com/photos/gcwest/281385801
▶ https://www.flickr.com/photos/29233640@N07/13466208953
▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/
▶ http://ok-panic.net/art/jeff/dennis.jpg
▶ https://www.facebook.com/sciencedump/photos/a.296290153732762.90161.111815475513565/985102638184840/?type=1
▶ http://upload.wikimedia.org/wikipedia/commons/0/05/Sna_large.png
▶ http://background-kid.com/background-images-light-blue-color.html
