O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Scalability, Availability & Stability Patterns

463.272 visualizações

Publicada em

Overview of scalability, availability and stability patterns, techniques and products.

Publicada em: Tecnologia, Diversão e humor
  • Entre para ver os comentários

Scalability, Availability & Stability Patterns

  1. Scalability,Availability &StabilityPatternsJonas BonérCTO Typesafetwitter: @jboner
  2. Outline
  3. Outline
  4. Outline
  5. Outline
  6. Outline
  7. Introduction
  8. Scalability Patterns
  9. Managing Overload
  10. Scale up vs Scale out?
  11. Generalrecommendations• Immutability as the default• Referential Transparency (FP)• Laziness• Think about your data:• Different data need different guarantees
  12. Scalability Trade-offs
  13. Trade-offs•Performance vs Scalability•Latency vs Throughput•Availability vs Consistency
  14. PerformancevsScalability
  15. How do I know if I have aperformance problem?
  16. How do I know if I have aperformance problem?If your system isslow for a single user
  17. How do I know if I have ascalability problem?
  18. How do I know if I have ascalability problem?If your system isfast for a single userbut slow under heavy load
  19. LatencyvsThroughput
  20. You should strive formaximal throughputwithacceptable latency
  21. AvailabilityvsConsistency
  22. Brewer’sCAPtheorem
  23. You can only pick 2ConsistencyAvailabilityPartition toleranceAt a given point in time
  24. Centralized system• In a centralized system (RDBMS etc.)we don’t have network partitions, e.g.P in CAP• So you get both:•Availability•Consistency
  25. AtomicConsistentIsolatedDurable
  26. Distributed system• In a distributed system we (will) havenetwork partitions, e.g. P in CAP• So you get to only pick one:•Availability•Consistency
  27. CAP in practice:• ...there are only two types of systems:1. CP2. AP• ...there is only one choice to make. Incase of a network partition, what doyou sacrifice?1. C: Consistency2. A:Availability
  28. Basically AvailableSoft stateEventually consistent
  29. Eventual Consistency...is an interesting trade-off
  30. Eventual Consistency...is an interesting trade-offBut let’s get back to that later
  31. Availability Patterns
  32. •Fail-over•Replication• Master-Slave• Tree replication• Master-Master• Buddy ReplicationAvailability Patterns
  33. What do we mean withAvailability?
  34. Fail-over
  35. Fail-overCopyrightMichael Nygaard
  36. Fail-overBut fail-over is not always this simpleCopyrightMichael Nygaard
  37. Fail-overCopyrightMichael Nygaard
  38. Fail-backCopyrightMichael Nygaard
  39. Network fail-over
  40. Replication
  41. • Active replication - Push• Passive replication - Pull• Data not available, read from peer,then store it locally• Works well with timeout-basedcachesReplication
  42. • Master-Slave replication• Tree Replication• Master-Master replication• Buddy replicationReplication
  43. Master-Slave Replication
  44. Master-Slave Replication
  45. Tree Replication
  46. Master-Master Replication
  47. Buddy Replication
  48. Buddy Replication
  49. Scalability Patterns:State
  50. •Partitioning•HTTP Caching•RDBMS Sharding•NOSQL•Distributed Caching•Data Grids•ConcurrencyScalability Patterns: State
  51. Partitioning
  52. HTTP CachingReverse Proxy• Varnish• Squid• rack-cache• Pound• Nginx• Apache mod_proxy• Traffic Server
  53. HTTP CachingCDN,Akamai
  54. Generate Static ContentPrecompute content• Homegrown + cron or Quartz• Spring Batch• Gearman• Hadoop• Google Data Protocol• Amazon Elastic MapReduce
  55. HTTP CachingFirst request
  56. HTTP CachingSubsequent request
  57. Service of RecordSoR
  58. Service of Record•Relational Databases (RDBMS)•NOSQL Databases
  59. How toscale outRDBMS?
  60. Sharding•Partitioning•Replication
  61. Sharding: Partitioning
  62. Sharding: Replication
  63. ORM + rich domain modelanti-pattern•Attempt:• Read an object from DB•Result:• You sit with your whole database in your lap
  64. Think about your data• When do you need ACID?• When is Eventually Consistent a better fit?• Different kinds of data has different needsThink again
  65. When isa RDBMSnotgood enough?
  66. Scaling readsto a RDBMSis hard
  67. Scaling writesto a RDBMSis impossible
  68. Do wereally needa RDBMS?
  69. Do wereally needa RDBMS?Sometimes...
  70. Do wereally needa RDBMS?
  71. Do wereally needa RDBMS?But many times we don’t
  72. NOSQL(Not Only SQL)
  73. •Key-Value databases•Column databases•Document databases•Graph databases•Datastructure databasesNOSQL
  74. Who’s ACID?• Relational DBs (MySQL, Oracle, Postgres)• Object DBs (Gemstone, db4o)• Clustering products (Coherence,Terracotta)• Most caching products (ehcache)
  75. Who’s BASE?Distributed databases• Cassandra• Riak• Voldemort• Dynomite,• SimpleDB• etc.
  76. • Google: Bigtable• Amazon: Dynamo• Amazon: SimpleDB• Yahoo: HBase• Facebook: Cassandra• LinkedIn: VoldemortNOSQL in the wild
  77. But first some background...
  78. • Distributed Hash Tables (DHT)• Scalable• Partitioned• Fault-tolerant• Decentralized• Peer to peer• Popularized• Node ring• Consistent HashingChord & Pastry
  79. Node ring with Consistent HashingFind data in log(N) jumps
  80. “How can we build a DB on top of GoogleFile System?”• Paper: Bigtable:A distributed storage systemfor structured data, 2006• Rich data-model, structured storage• Clones:HBaseHypertableNeptuneBigtable
  81. “How can we build a distributedhash table for the data center?”• Paper: Dynamo:Amazon’s highly available key-value store, 2007• Focus: partitioning, replication and availability• Eventually Consistent• Clones:VoldemortDynomiteDynamo
  82. Types of NOSQL stores• Key-Value databases (Voldemort, Dynomite)• Column databases (Cassandra,Vertica, Sybase IQ)• Document databases (MongoDB, CouchDB)• Graph databases (Neo4J,AllegroGraph)• Datastructure databases (Redis, Hazelcast)
  83. Distributed Caching
  84. •Write-through•Write-behind•Eviction Policies•Replication•Peer-To-Peer (P2P)Distributed Caching
  85. Write-through
  86. Write-behind
  87. Eviction policies• TTL (time to live)• Bounded FIFO (first in first out)• Bounded LIFO (last in first out)• Explicit cache invalidation
  88. Peer-To-Peer• Decentralized• No “special” or “blessed” nodes• Nodes can join and leave as they please
  89. •EHCache•JBoss Cache•OSCache•memcachedDistributed CachingProducts
  90. memcached• Very fast• Simple• Key-Value (string -­‐>  binary)• Clients for most languages• Distributed• Not replicated - so 1/N chancefor local access in cluster
  91. Data Grids / Clustering
  92. Data Grids/ClusteringParallel data storage• Data replication• Data partitioning• Continuous availability• Data invalidation• Fail-over• C + P in CAP
  93. Data Grids/ClusteringProducts• Coherence• Terracotta• GigaSpaces• GemStone• Tibco Active Matrix• Hazelcast
  94. Concurrency
  95. •Shared-State Concurrency•Message-Passing Concurrency•Dataflow Concurrency•Software Transactional MemoryConcurrency
  96. Shared-StateConcurrency
  97. •Everyone can access anything anytime•Totally indeterministic•Introduce determinism at well-definedplaces...•...using locksShared-State Concurrency
  98. •Problems with locks:• Locks do not compose• Taking too few locks• Taking too many locks• Taking the wrong locks• Taking locks in the wrong order• Error recovery is hardShared-State Concurrency
  99. Please use java.util.concurrent.*• ConcurrentHashMap• BlockingQueue• ConcurrentQueue  • ExecutorService• ReentrantReadWriteLock• CountDownLatch• ParallelArray• and  much  much  more..Shared-State Concurrency
  100. Message-PassingConcurrency
  101. •Originates in a 1973 paper by CarlHewitt•Implemented in Erlang, Occam, Oz•Encapsulates state and behavior•Closer to the definition of OOthan classesActors
  102. Actors• Share NOTHING• Isolated lightweight processes• Communicates through messages• Asynchronous and non-blocking• No shared state… hence, nothing to synchronize.• Each actor has a mailbox (message queue)
  103. • Easier to reason about• Raised abstraction level• Easier to avoid–Race conditions–Deadlocks–Starvation–Live locksActors
  104. • Akka (Java/Scala)• scalaz actors (Scala)• Lift Actors (Scala)• Scala Actors (Scala)• Kilim (Java)• Jetlang (Java)• Actor’s Guild (Java)• Actorom (Java)• FunctionalJava (Java)• GPars (Groovy)Actor libs for the JVM
  105. DataflowConcurrency
  106. • Declarative• No observable non-determinism• Data-driven – threads block untildata is available• On-demand, lazy• No difference between:• Concurrent &• Sequential code• Limitations: can’t have side-effectsDataflow Concurrency
  107. STM:SoftwareTransactional Memory
  108. STM: overview• See the memory (heap and stack)as a transactional dataset• Similar to a database• begin• commit• abort/rollback•Transactions are retriedautomatically upon collision• Rolls back the memory on abort
  109. • Transactions can nest• Transactions compose (yipee!!)atomic  {              ...              atomic  {                    ...                }        }  STM: overview
  110. All operations in scope ofa transaction:l Need to be idempotentSTM: restrictions
  111. • Akka (Java/Scala)• Multiverse (Java)• Clojure STM (Clojure)• CCSTM (Scala)• Deuce STM (Java)STM libs for the JVM
  112. Scalability Patterns:Behavior
  113. •Event-Driven Architecture•Compute Grids•Load-balancing•Parallel ComputingScalability Patterns:Behavior
  114. Event-DrivenArchitecture“Four years from now,‘mere mortals’ will begin toadopt an event-driven architecture (EDA) for thesort of complex event processing that has beenattempted only by software gurus [until now]”--Roy Schulte (Gartner), 2003
  115. • Domain Events• Event Sourcing• Command and Query ResponsibilitySegregation (CQRS) pattern• Event Stream Processing• Messaging• Enterprise Service Bus• Actors• Enterprise Integration Architecture (EIA)Event-Driven Architecture
  116. Domain Events“Its really become clear to me in the lastcouple of years that we need a new buildingblock and that is the Domain Events”-- Eric Evans, 2009
  117. Domain Events“Domain Events represent the state of entitiesat a given time when an important eventoccurred and decouple subsystems with eventstreams. Domain Events give us clearer, moreexpressive models in those cases.”-- Eric Evans, 2009
  118. Domain Events“State transitions are an important part ofour problem space and should be modeledwithin our domain.”-- GregYoung, 2008
  119. Event Sourcing• Every state change is materialized in an Event• All Events are sent to an EventProcessor• EventProcessor stores all events in an Event Log• System can be reset and Event Log replayed• No need for ORM, just persist the Events• Many different EventListeners can be added toEventProcessor (or listen directly on the Event log)
  120. Event Sourcing
  121. “A single model cannot be appropriatefor reporting, searching andtransactional behavior.”-- GregYoung, 2008Command and QueryResponsibility Segregation(CQRS) pattern
  122. BidirectionalBidirectional
  123. UnidirectionalUnidirectionalUnidirectional
  124. CQRSin a nutshell• All state changes are represented by Domain Events• Aggregate roots receive Commands and publish Events• Reporting (query database) is updated as a result of thepublished Events•All Queries from Presentation go directly to Reportingand the Domain is not involved
  125. CQRSCopyright by Axis Framework
  126. CQRS: Benefits• Fully encapsulated domain that only exposesbehavior• Queries do not use the domain model• No object-relational impedance mismatch• Bullet-proof auditing and historical tracing• Easy integration with external systems• Performance and scalability
  127. Event Stream Processingselect  *  from  Withdrawal(amount>=200).win:length(5)
  128. Event Stream ProcessingProducts• Esper (Open Source)• StreamBase• RuleCast
  129. Messaging• Publish-Subscribe• Point-to-Point• Store-forward• Request-Reply
  130. Publish-Subscribe
  131. Point-to-Point
  132. Store-ForwardDurability, event log, auditing etc.
  133. Request-ReplyF.e.AMQP’s ‘replyTo’ header
  134. Messaging• Standards:• AMQP• JMS• Products:• RabbitMQ (AMQP)• ActiveMQ (JMS)• Tibco• MQSeries• etc
  135. ESB
  136. ESB products• ServiceMix (Open Source)• Mule (Open Source)• Open ESB (Open Source)• Sonic ESB• WebSphere ESB• Oracle ESB• Tibco• BizTalk Server
  137. Actors• Fire-forget• Async send• Fire-And-Receive-Eventually• Async send + wait on Future for reply
  138. Enterprise IntegrationPatterns
  139. Enterprise IntegrationPatternsApache Camel• More than 80 endpoints• XML (Spring) DSL• Scala DSL
  140. Compute Grids
  141. Compute GridsParallel execution• Divide and conquer1. Split up job in independent tasks2. Execute tasks in parallel3. Aggregate and return result• MapReduce - Master/Worker
  142. Compute GridsParallel execution• Automatic provisioning• Load balancing• Fail-over• Topology resolution
  143. Compute GridsProducts• Platform• DataSynapse• Google MapReduce• Hadoop• GigaSpaces• GridGain
  144. Load balancing
  145. • Random allocation• Round robin allocation• Weighted allocation• Dynamic load balancing• Least connections• Least server CPU• etc.Load balancing
  146. Load balancing• DNS Round Robin (simplest)• Ask DNS for IP for host• Get a new IP every time• Reverse Proxy (better)• Hardware Load Balancing
  147. Load balancing products• Reverse Proxies:• Apache mod_proxy (OSS)• HAProxy (OSS)• Squid (OSS)• Nginx (OSS)• Hardware Load Balancers:• BIG-IP• Cisco
  148. Parallel Computing
  149. • UE: Unit of Execution• Process• Thread• Coroutine• ActorParallel Computing• SPMD Pattern• Master/Worker Pattern• Loop Parallelism Pattern• Fork/Join Pattern• MapReduce Pattern
  150. SPMD Pattern• Single Program Multiple Data• Very generic pattern, used in manyother patterns• Use a single program for all the UEs• Use the UE’s ID to select differentpathways through the program. F.e:• Branching on ID• Use ID in loop index to split loops• Keep interactions between UEs explicit
  151. Master/Worker
  152. Master/Worker• Good scalability• Automatic load-balancing• How to detect termination?• Bag of tasks is empty• Poison pill• If we bottleneck on single queue?• Use multiple work queues• Work stealing• What about fault tolerance?• Use “in-progress” queue
  153. Loop Parallelism•Workflow1.Find the loops that are bottlenecks2.Eliminate coupling between loop iterations3.Parallelize the loop•If too few iterations to pull its weight• Merge loops• Coalesce nested loops•OpenMP• omp  parallel  for
  154. What if task creation can’t be handled by:• parallelizing loops (Loop Parallelism)• putting them on work queues (Master/Worker)
  155. What if task creation can’t be handled by:• parallelizing loops (Loop Parallelism)• putting them on work queues (Master/Worker)EnterFork/Join
  156. •Use when relationship between tasksis simple•Good for recursive data processing•Can use work-stealing1. Fork:Tasks are dynamically created2. Join:Tasks are later terminated anddata aggregatedFork/Join
  157. Fork/Join•Direct task/UE mapping• 1-1 mapping between Task/UE• Problem: Dynamic UE creation is expensive•Indirect task/UE mapping• Pool the UE• Control (constrain) the resource allocation• Automatic load balancing
  158. Java 7 ParallelArray (Fork/Join DSL)Fork/Join
  159. Java 7 ParallelArray (Fork/Join DSL)ParallelArray  students  =      new  ParallelArray(fjPool,  data);double  bestGpa  =  students.withFilter(isSenior)                                                    .withMapping(selectGpa)                                                    .max();Fork/Join
  160. • Origin from Google paper 2004• Used internally @ Google• Variation of Fork/Join• Work divided upfront not dynamically• Usually distributed• Normally used for massive data crunchingMapReduce
  161. • Hadoop (OSS), used @Yahoo• Amazon Elastic MapReduce• Many NOSQL DBs utilizes itfor searching/queryingMapReduceProducts
  162. MapReduce
  163. Parallel Computingproducts• MPI• OpenMP• JSR166 Fork/Join• java.util.concurrent• ExecutorService, BlockingQueue etc.• ProActive Parallel Suite• CommonJ WorkManager (JEE)
  164. Stability Patterns
  165. •Timeouts•Circuit Breaker•Let-it-crash•Fail fast•Bulkheads•Steady State•ThrottlingStability Patterns
  166. TimeoutsAlways use timeouts (if possible):• Thread.wait(timeout)• reentrantLock.tryLock• blockingQueue.poll(timeout,  timeUnit)/offer(..)• futureTask.get(timeout,  timeUnit)• socket.setSoTimeOut(timeout)• etc.
  167. Circuit Breaker
  168. Let it crash• Embrace failure as a natural state inthe life-cycle of the application• Instead of trying to prevent it;manage it• Process supervision• Supervisor hierarchies (from Erlang)
  169. Restart StrategyOneForOne
  170. Restart StrategyOneForOne
  171. Restart StrategyOneForOne
  172. Restart StrategyAllForOne
  173. Restart StrategyAllForOne
  174. Restart StrategyAllForOne
  175. Restart StrategyAllForOne
  176. Supervisor Hierarchies
  177. Supervisor Hierarchies
  178. Supervisor Hierarchies
  179. Supervisor Hierarchies
  180. Fail fast• Avoid “slow responses”• Separate:• SystemError - resources not available• ApplicationError - bad user input etc• Verify resource availability beforestarting expensive task• Input validation immediately
  181. Bulkheads
  182. Bulkheads• Partition and toleratefailure in one part• Redundancy• Applies to threads as well:• One pool for admin tasksto be able to perform taskseven though all threads areblocked
  183. Steady State• Clean up after you• Logging:• RollingFileAppender (log4j)• logrotate (Unix)• Scribe - server for aggregating streaming log data• Always put logs on separate disk
  184. Throttling• Maintain a steady pace• Count requests• If limit reached, back-off (drop, raise error)• Queue requests• Used in for example Staged Event-DrivenArchitecture (SEDA)
  185. ?
  186. thanksfor listening
  187. Extra material
  188. Client-side consistency• Strong consistency• Weak consistency• Eventually consistent• Never consistent
  189. Client-sideEventual Consistency levels• Casual consistency• Read-your-writes consistency (important)• Session consistency• Monotonic read consistency (important)• Monotonic write consistency
  190. Server-side consistencyN = the number of nodes that store replicas ofthe dataW = the number of replicas that need toacknowledge the receipt of the update before theupdate completesR = the number of replicas that are contactedwhen a data object is accessed through a read operation
  191. Server-side consistencyW + R > N strong consistencyW + R <= N eventual consistency