
Crossroads of Asynchrony and Graceful Degradation

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1VmbI3t.

Nitesh Kant describes how embracing asynchrony in the Netflix applications, from networking to business processing, creates gracefully degrading and highly resilient applications. Filmed at qconsf.com.

Nitesh Kant is an engineer in Netflix’s Edge Gateway team, working on Netflix’s asynchronous Inter Process Communication stack. He is the author of RxNetty which forms the core of this stack and is currently moving Zuul to this new architecture.

Published in: Technology

Crossroads of Asynchrony and Graceful Degradation

  1. 1. Crossroads of asynchrony and graceful degradation. Nitesh Kant, Software Engineer, Netflix Edge Engineering. @NiteshKant
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/netflix-asynchronous-apps
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  4. 4. Nitesh Kant. Who Am I? ❖ Engineer, Edge Engineering, Netflix. ❖ Core contributor, RxNetty* ❖ Contributor, Zuul** * https://github.com/ReactiveX/RxNetty ** https://github.com/Netflix/zuul @NiteshKant
  5. 5. How do systems fail?
  6. 6. A simple example. Showing a movie on Netflix.
  7. 7. Video Metadata
  8. 8. Video Bookmark
  9. 9. Video Rating
  10. 10. public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); } Disclaimer: This is an example and not an exact representation of the processing
  11. 11. Synchronicity public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); } Disclaimer: This is an example and not an exact representation of the processing
  12. 12. The bigger picture Price of being synchronous? public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); } Disclaimer: This is an example and not an exact representation of the processing
  13. 13. In a microservices world Edge Service Ratings Service Video Metadata Service Bookmarks Service Disclaimer: This is an example and not an exact representation of the processing
  14. 14. In a microservices world Edge Service Server threadpool Thread Thread Thread Thread Thread getMovieMetadata(movieId) Disclaimer: This is an example and not an exact representation of the processing
  15. 15. In a microservices world Edge Service Server threadpool Thread Thread Thread Thread Thread getMovieMetadata(movieId) getBookmark(movieId, userId) Disclaimer: This is an example and not an exact representation of the processing
  16. 16. In a microservices world Edge Service Server threadpool Thread Thread Thread Thread Thread getMovieMetadata(movieId) getBookmark(movieId, userId) getRatings(movieId) Disclaimer: This is an example and not an exact representation of the processing
  17. 17. Busy thread time = Sum of the time taken to make all 3 service calls
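The arithmetic on slide 17 can be sketched in a few lines. This is only an illustration: `SyncCost` and the latencies below are made-up assumptions, not figures from the talk.

```java
// Hypothetical model of the thread cost described on slide 17; not Netflix code.
final class SyncCost {
    // Sequential blocking calls hold the request thread for the sum of all latencies.
    static long sequentialBusyMillis(long... latenciesMillis) {
        long total = 0;
        for (long latency : latenciesMillis) {
            total += latency;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed latencies for the metadata, bookmark and ratings calls.
        System.out.println(sequentialBusyMillis(20, 30, 50));   // healthy: 100
        System.out.println(sequentialBusyMillis(20, 30, 2000)); // slow ratings call: 2050
    }
}
```

With these assumed numbers, one slow dependency raises the time a server thread is held from 100 ms to 2050 ms, which is why latency dominates in the synchronous model.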
  18. 18. How do systems fail? 1. Latency Latency is your worst enemy in a synchronous world.
  19. 19. Edge Service Server threadpool Thread Thread Thread Thread Thread getRatings(movieId) Disclaimer: This is an example and not an exact representation of the processing
  20. 20. Disclaimer: This is an example and not an exact representation of the processing Ratings Service Edge Service Server threadpool Thread Thread Thread Thread Thread getRatings(movieId)
  21. 21. Disclaimer: This is an example and not an exact representation of the processing Edge Service getRatings(movieId) Server threadpool Thread Thread Thread Thread Thread
  22. 22. Edge Service Disclaimer: This is an example and not an exact representation of the processing getRatings(movieId) Server threadpool Thread Thread Thread Thread Thread
  23. 23. Edge Service Server threadpool Thread Thread Thread Thread Thread Disclaimer: This is an example and not an exact representation of the processing Client Threadpool Thread Thread Thread Thread Thread getRatings(movieId)
  24. 24. Edge Service Server threadpool Thread Thread Thread Thread Thread Disclaimer: This is an example and not an exact representation of the processing Client Threadpool Thread Thread Thread Thread Thread getRatings(movieId)
  25. 25. Edge Service Server threadpool Thread Thread Thread Thread Thread Disclaimer: This is an example and not an exact representation of the processing Client Threadpool Thread Thread Thread Thread Thread getRatings(movieId)
  26. 26. Managing client thread pools Disclaimer: This is an example and not an exact representation of the processing Client Threadpool Thread Thread Thread Thread Thread
  27. 27. Managing client thread pools
  28. 28. Managing client thread pools
  29. 29. Managing client thread pools
  30. 30. Managing client thread pools
  31. 31. Clients have become our babies
  32. 32. Clients have become our babies Edge Service Server threadpool Thread Thread Thread Thread Thread getMovieMetadata(movieId) Disclaimer: This is an example and not an exact representation of the processing Client Threadpool Thread Thread Thread Thread Thread Client Threadpool Thread Thread Thread Thread Thread Client Threadpool Thread Thread Thread Thread Thread getBookmark(movieId, userId) getRatings(movieId)
  33. 33. Clients have become our babies
  34. 34. Untuned/Wrongly tuned clients cause many outages.
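Slides 23 to 34 describe giving each client its own thread pool so that a slow dependency cannot exhaust the server threads. A minimal sketch of that idea follows, using only `java.util.concurrent`. The class name `BoundedClient`, the pool size, the queue size and the timeout are all assumptions chosen for illustration, and they are exactly the kind of parameters the talk says must be tuned per client.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical per-client pool: bounded threads, bounded queue, thread timeout.
final class BoundedClient {
    private final ThreadPoolExecutor pool;
    private final long timeoutMillis;

    BoundedClient(int threads, long timeoutMillis) {
        // Default rejection policy throws RejectedExecutionException when saturated.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads));
        this.timeoutMillis = timeoutMillis;
    }

    // Runs the remote call on the client pool; returns the fallback on
    // saturation, timeout or failure so the calling thread is released.
    <T> T call(Callable<T> remoteCall, T fallback) {
        Future<T> future;
        try {
            future = pool.submit(remoteCall);
        } catch (RejectedExecutionException e) {
            return fallback; // pool and queue are full
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        BoundedClient ratings = new BoundedClient(1, 50);
        System.out.println(ratings.call(() -> "4 stars", "no rating"));
        System.out.println(ratings.call(() -> {
            Thread.sleep(5_000); // simulated slow Ratings Service
            return "late";
        }, "no rating"));
        ratings.shutdown();
    }
}
```

Every value passed to the constructor is a tuning decision, and a wrong choice either wastes threads or rejects healthy traffic.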
  35. 35. Have we exchanged a bigger problem for a smaller one?
  36. 36. How do systems fail? 2. Overload Abusive clients, recovery spikes, special events ….
  37. 37. We did a load test… https://github.com/Netflix-Skunkworks/WSPerfLab Hello Netflix!
  38. 38. Detailed analysis available online: https://github.com/Netflix-Skunkworks/WSPerfLab/blob/master/test-results/RxNetty_vs_Tomcat_April2015.pdf
  39. 39. Graceful This isn’t graceful degradation!
  40. 40. This happens at high CPU usage.
  41. 41. This happens at high CPU usage. So, don’t let the system reach that limit…
  42. 42. This happens at high CPU usage. So, don’t let the system reach that limit… a.k.a. throttling.
  43. 43. Fairness? One abusive request type can penalize other request paths.
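One way to keep a single abusive request type from penalizing the others is to throttle per request path rather than globally. The sketch below assumes a fixed in-flight limit per path. `PerPathThrottle` and its limit are hypothetical and are not described in the talk.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical per-path concurrency limiter.
final class PerPathThrottle {
    private final int maxInFlightPerPath;
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    PerPathThrottle(int maxInFlightPerPath) {
        this.maxInFlightPerPath = maxInFlightPerPath;
    }

    // Returns false when this path already has the maximum number of requests in flight.
    boolean tryAcquire(String path) {
        return permits
                .computeIfAbsent(path, p -> new Semaphore(maxInFlightPerPath))
                .tryAcquire();
    }

    // Must be called once for every successful tryAcquire on the same path.
    void release(String path) {
        Semaphore semaphore = permits.get(path);
        if (semaphore != null) {
            semaphore.release();
        }
    }

    public static void main(String[] args) {
        PerPathThrottle throttle = new PerPathThrottle(2);
        System.out.println(throttle.tryAcquire("/search")); // true
        System.out.println(throttle.tryAcquire("/search")); // true
        System.out.println(throttle.tryAcquire("/search")); // false, /search is saturated
        System.out.println(throttle.tryAcquire("/movie"));  // true, /movie is unaffected
    }
}
```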
  44. 44. How do systems fail? 3. Thundering herds The failure after recovery….
  45. 45. Retries Edge Service Video Metadata Service Disclaimer: This is an example and not an exact representation of the processing
  46. 46. Retries Edge Service Disclaimer: This is an example and not an exact representation of the processing Video Metadata Service Cluster
  47. 47. Retries Edge Service Disclaimer: This is an example and not an exact representation of the processing Video Metadata Service Cluster
  48. 48. Retries Edge Service Disclaimer: This is an example and not an exact representation of the processing Video Metadata Service Cluster
  49. 49. Retries are useful in steady state…. …but…
  50. 50. Retries Edge Service Disclaimer: This is an example and not an exact representation of the processing Video Metadata Service Cluster
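The retry slides suggest that retries are cheap in steady state but multiply load when a dependency is failing. A rough model, assuming each attempt fails independently with the same probability, makes this concrete. The model and the name `RetryAmplification` are assumptions for illustration and are not taken from the talk.

```java
// Hypothetical model: expected attempts per request under a retry policy.
final class RetryAmplification {
    // Attempt k is made only if all previous attempts failed, so the expected
    // number of attempts is 1 + p + p^2 + ... + p^maxRetries.
    static double expectedAttempts(double failureProbability, int maxRetries) {
        double attempts = 0.0;
        double reachProbability = 1.0; // probability that attempt k happens at all
        for (int k = 0; k <= maxRetries; k++) {
            attempts += reachProbability;
            reachProbability *= failureProbability;
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(expectedAttempts(0.0, 3)); // healthy dependency: 1.0
        System.out.println(expectedAttempts(1.0, 3)); // dependency fully down: 4.0
    }
}
```

Under this model a fully failing dependency receives four times the normal request volume with three retries, just when it is least able to absorb it.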
  51. 51. Our systems are missing empathy.
  52. 52. Because they lack knowledge about the peers.
  53. 53. Knowledge comes from various signals.
  54. 54. Ability to adapt to those signals is important.
  55. 55. This cannot adapt… public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); } Disclaimer: This is an example and not an exact representation of the processing
  56. 56. Asynchrony It is the key to success.
  57. 57. What should be async?
  58. 58. What should be async? Edge Service Video Metadata Service
  59. 59. What should be async? Edge Service Video Metadata Service getMovieMetadata(movieId) getBookmark(movieId, userId) getRatings(movieId) Application logic
  60. 60. What should be async? Edge Service Video Metadata Service getMovieMetadata(movieId) getBookmark(movieId, userId) getRatings(movieId) I/O I/O I/O Application logic I/O
  61. 61. What should be async? Edge Service Video Metadata Service getMovieMetadata(movieId) getBookmark(movieId, userId) getRatings(movieId) I/O I/O I/O Application logic I/O Network protocol
  62. 62. Key aspects of being async.
  63. 63. Key aspects of being async. 1. Lifecycle control
  64. 64. Lifecycle control Start processing Stop processing
  65. 65. Key aspects of being async. 2. Flow control
  66. 66. Flow control When How much
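Lifecycle control (start and stop) and flow control (when and how much) map onto the `request(n)` and `cancel()` methods of a Reactive Streams subscription. The sketch below uses the JDK's `java.util.concurrent.Flow` interfaces (Java 9+) rather than RxJava. `RangePublisher` and `Collector` are hypothetical, single-threaded, and deliberately simplified, so they do not satisfy every rule of the Reactive Streams specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

final class FlowControlDemo {
    // Emits 1..count, but only as many items as the subscriber has requested.
    static final class RangePublisher implements Flow.Publisher<Integer> {
        private final int count;

        RangePublisher(int count) {
            this.count = count;
        }

        @Override
        public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 1;
                private boolean done;

                @Override
                public void request(long n) { // flow control: "how much"
                    for (long i = 0; i < n && !done; i++) {
                        if (next > count) {
                            done = true;
                            subscriber.onComplete();
                            return;
                        }
                        subscriber.onNext(next++);
                    }
                }

                @Override
                public void cancel() { // lifecycle control: "stop processing"
                    done = true;
                }
            });
        }
    }

    // Records what it receives; the caller decides when to request or cancel.
    static final class Collector implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; }
        @Override public void onNext(Integer item) { received.add(item); }
        @Override public void onError(Throwable t) { }
        @Override public void onComplete() { }
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        new RangePublisher(10).subscribe(collector); // nothing is emitted yet
        collector.subscription.request(3);           // start: ask for three items
        collector.subscription.cancel();             // stop: no further items
        collector.subscription.request(5);           // ignored after cancel
        System.out.println(collector.received);      // [1, 2, 3]
    }
}
```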
  67. 67. Key aspects of being async. 3. Function composition
  68. 68. Function composition public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); }
  69. 69. Function composition Composing the processing of a method into a single control point. public Observable<Movie> getMovie(String movieId) { return Observable.zip(getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId), (meta,bmark,rating)->new Movie(meta,bmark,rating)); }
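The composition on slide 69 uses RxJava's `Observable.zip`. To keep the example runnable with only the JDK, the sketch below swaps in `CompletableFuture.thenCombine`, which combines results in a similar way but is not the API shown in the talk. The three service stubs, the `String` payloads and the explicit `userId` parameter are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;

final class MovieComposition {
    static final class Movie {
        final String metadata;
        final String bookmark;
        final String rating;

        Movie(String metadata, String bookmark, String rating) {
            this.metadata = metadata;
            this.bookmark = bookmark;
            this.rating = rating;
        }
    }

    // Stub service calls; real ones would perform non-blocking I/O.
    static CompletableFuture<String> getMovieMetadata(String movieId) {
        return CompletableFuture.supplyAsync(() -> "metadata:" + movieId);
    }

    static CompletableFuture<String> getBookmark(String movieId, String userId) {
        return CompletableFuture.supplyAsync(() -> "bookmark:" + movieId + ":" + userId);
    }

    static CompletableFuture<String> getRatings(String movieId) {
        return CompletableFuture.supplyAsync(() -> "rating:" + movieId);
    }

    // All three calls start immediately; the result is a single handle
    // (one control point) for the whole method.
    static CompletableFuture<Movie> getMovie(String movieId, String userId) {
        CompletableFuture<String> meta = getMovieMetadata(movieId);
        CompletableFuture<String> bmark = getBookmark(movieId, userId);
        CompletableFuture<String> rating = getRatings(movieId);
        return meta
                .thenCombine(bmark, (m, b) -> new String[] {m, b})
                .thenCombine(rating, (mb, r) -> new Movie(mb[0], mb[1], r));
    }

    public static void main(String[] args) {
        Movie movie = getMovie("42", "u1").join();
        System.out.println(movie.metadata + " " + movie.bookmark + " " + movie.rating);
    }
}
```

Unlike an `Observable`, a `CompletableFuture` starts work eagerly and does not propagate cancellation to its sources, so this shows only the composition aspect.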
  70. 70. Composing the processing of a method into a single control point. Flow & Lifecycle Control with RxJava
  71. 71. What should be async? Edge Service Video Metadata Service getMovieMetadata(movieId) getBookmark(movieId, userId) getRatings(movieId) I/O I/O I/O Application logic I/O Network protocol
  72. 72. I/O Edge Service Server threadpool Thread Thread Thread Thread Thread Client Threadpool Thread Thread Thread Thread Thread getRatings(movieId)
  73. 73. I/O Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection Eventloops = f (Number of cores)
  74. 74. I/O Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection Connections multiplexed on a single eventloop.
  75. 75. I/O Disclaimer: This is an example and not an exact representation of the processing Edge Service Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection
  76. 76. I/O Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection Clients share the eventloops with the server.
  77. 77. I/O Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection All clients share the same eventloop.
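The threading model on slides 73 to 77 can be approximated with one single-threaded executor per core and each connection pinned to one of them. This sketch models only the thread assignment. A real event loop, as in Netty, also performs non-blocking socket I/O through a selector, which is not shown here. `EventLoopGroup` below is a hypothetical class, not Netty's type of the same name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model: a fixed set of single-threaded loops shared by all connections.
final class EventLoopGroup {
    private final ExecutorService[] loops;

    EventLoopGroup(int loopCount) {
        loops = new ExecutorService[loopCount];
        for (int i = 0; i < loopCount; i++) {
            loops[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Eventloops = f(number of cores); here simply one loop per core.
    static EventLoopGroup forAvailableCores() {
        return new EventLoopGroup(Runtime.getRuntime().availableProcessors());
    }

    // A connection always maps to the same loop, so its events never run concurrently.
    int loopIndexFor(int connectionId) {
        return Math.floorMod(connectionId, loops.length);
    }

    CompletableFuture<Void> execute(int connectionId, Runnable task) {
        return CompletableFuture.runAsync(task, loops[loopIndexFor(connectionId)]);
    }

    void shutdown() {
        for (ExecutorService loop : loops) {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        EventLoopGroup group = forAvailableCores();
        for (int connection = 0; connection < 5; connection++) {
            final int id = connection;
            group.execute(id, () -> System.out.println(
                    "connection " + id + " on " + Thread.currentThread().getName())).join();
        }
        group.shutdown();
    }
}
```

Because tasks must never block a loop thread, this model only works when both the I/O and the application code are non-blocking.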
  78. 78. Composing the processing of a service into a single control point. Flow & Lifecycle Control with RxNetty
  79. 79. What should be async? Edge Service Video Metadata Service getMovieMetadata(movieId) getBookmark(movieId, userId) getRatings(movieId) I/O I/O I/O Application logic I/O Network protocol
  80. 80. Network Protocol HTTP/1.1?
  81. 81. HTTP/1.1 GET /movie?id=1 HTTP/1.1
  82. 82. HTTP/1.1 GET /movie?id=2 HTTP/1.1 GET /movie?id=1 HTTP/1.1
  83. 83. HTTP/1.1 GET /movie?id=3 HTTP/1.1 GET /movie?id=2 HTTP/1.1 GET /movie?id=1 HTTP/1.1
  84. 84. HTTP/1.1 GET /movie?id=3 HTTP/1.1 GET /movie?id=2 HTTP/1.1 GET /movie?id=1 HTTP/1.1 HTTP/1.1 200 OK ID: 1 …
  85. 85. HTTP/1.1 GET /movie?id=3 HTTP/1.1 GET /movie?id=2 HTTP/1.1 GET /movie?id=1 HTTP/1.1 HTTP/1.1 200 OK ID: 1 … HTTP/1.1 200 OK ID: 2 …
  86. 86. HTTP/1.1 GET /movie?id=3 HTTP/1.1 GET /movie?id=2 HTTP/1.1 GET /movie?id=1 HTTP/1.1 HTTP/1.1 200 OK ID: 1 … HTTP/1.1 200 OK ID: 2 … HTTP/1.1 200 OK ID: 3 …
  87. 87. HTTP/1.1 GET /movie?id=3 GET /movie?id=2 GET /movie?id=1 HTTP/1.1 200 ID: 1 … HTTP/1.1 200 ID: 2 … HTTP/1.1 200 ID: 3 … Head Of Line Blocking => Synchronous
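Head-of-line blocking follows from the HTTP/1.1 rule that pipelined responses on a connection must be sent in request order. The sketch below compares delivery times under that rule with a multiplexed protocol that can deliver each response as soon as it is ready. The ready times are made-up numbers and `HeadOfLine` is a hypothetical helper.

```java
import java.util.Arrays;

final class HeadOfLine {
    // In-order delivery: response i cannot be delivered before response i-1.
    static long[] inOrderDelivery(long[] readyAtMillis) {
        long[] deliveredAt = new long[readyAtMillis.length];
        long previous = 0;
        for (int i = 0; i < readyAtMillis.length; i++) {
            previous = Math.max(previous, readyAtMillis[i]);
            deliveredAt[i] = previous;
        }
        return deliveredAt;
    }

    // Multiplexed delivery: each response is delivered when it is ready.
    static long[] multiplexedDelivery(long[] readyAtMillis) {
        return readyAtMillis.clone();
    }

    public static void main(String[] args) {
        long[] readyAt = {500, 10, 20}; // the first response is slow
        System.out.println(Arrays.toString(inOrderDelivery(readyAt)));     // [500, 500, 500]
        System.out.println(Arrays.toString(multiplexedDelivery(readyAt))); // [500, 10, 20]
    }
}
```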
  88. 88. Network Protocol HTTP/1.1?
  89. 89. Network Protocol We need a multiplexed bi-directional protocol
  90. 90. Multiplexed GET /movie?id=1 HTTP/1.1 GET /movie?id=2 HTTP/1.1 GET /movie?id=3 HTTP/1.1
  91. 91. Bi-directional GET /movie?id=1 HTTP/1.1 GET /movie?id=2 HTTP/1.1 GET /movie?id=3 HTTP/1.1 CANCEL
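A multiplexed, bi-directional protocol needs some way to tell concurrent requests apart and to let either side send control messages such as CANCEL. The sketch below assumes each request gets a numeric stream id. The class `StreamMultiplexer` and this framing are hypothetical and do not represent the ReactiveSocket wire format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bookkeeping for requests that share one connection.
final class StreamMultiplexer {
    private final Map<Integer, String> inFlight = new LinkedHashMap<>();
    private int nextStreamId = 1;

    // Sends a request on the shared connection and returns its stream id.
    int request(String path) {
        int streamId = nextStreamId++;
        inFlight.put(streamId, path);
        return streamId;
    }

    // A CANCEL from the requester: the response is no longer wanted.
    boolean cancel(int streamId) {
        return inFlight.remove(streamId) != null;
    }

    // A response may arrive in any order; returns the path it answers,
    // or null when the stream is unknown or was already cancelled.
    String onResponse(int streamId) {
        return inFlight.remove(streamId);
    }

    int inFlightCount() {
        return inFlight.size();
    }

    public static void main(String[] args) {
        StreamMultiplexer connection = new StreamMultiplexer();
        int first = connection.request("/movie?id=1");
        int second = connection.request("/movie?id=2");
        int third = connection.request("/movie?id=3");
        System.out.println(connection.onResponse(third)); // /movie?id=3, answered first
        System.out.println(connection.cancel(second));    // true
        System.out.println(connection.inFlightCount());   // 1
    }
}
```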
  92. 92. Composing the processing of the entire application into a single control point. Flow & Lifecycle Control with ReactiveSocket
  93. 93. Edge Service Video Metadata Service Rating service C* store C* store /movie?id=123 Disclaimer: This is an example and not an exact representation of the processing
  94. 94. Edge Service Video Metadata Service Rating service C* store C* store /movie?id=123 Disclaimer: This is an example and not an exact representation of the processing Observable<Movie>
  95. 95. Observable<Movie> Composing the processing of the entire application into a single control point.
  96. 96. Revisiting the failure modes
  97. 97. Latency Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection
  98. 98. Latency Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection Impact is localized to the connection.
  99. 99. Latency Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection Impact is localized to the connection. An outstanding request has little cost.
  100. 100. An outstanding request has little cost. GET /movie?id=1 HTTP/1.1 HTTP/1.1 200 OK … Any stored state between request and response is costly.
  101. 101. Outstanding requests have a low cost, so latency is a lesser evil in asynchronous systems.
  102. 102. Overload & Thundering Herds Edge Service Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection Disclaimer: This is an example and not an exact representation of the processing Reduce work done when overloaded
  103. 103. Overload & Thundering Herds Edge Service Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection Disclaimer: This is an example and not an exact representation of the processing Reduce work done when overloaded Stop accepting new requests.
  104. 104. Stop accepting new requests Non-blocking I/O gives better control
  105. 105. Stop accepting new requests But … we are still “throttling”
  106. 106. Stop accepting new requests Are we being empathetic?
  107. 107. Request-leasing http://reactivesocket.io/
  108. 108. Request-leasing Peer 1 Peer 2 Network connection
  109. 109. Request-leasing Peer 1 Peer 2 “Lease” 5 requests for 1 minute. Network connection
  110. 110. Request-leasing Peer 1 Peer 2 “Lease” 5 requests for 1 minute. GET /movie?id=1 HTTP/1.1 GET /movie?id=2 HTTP/1.1 Network connection
  111. 111. Server Client 1 Client 2 Client 8
  112. 112. Server Capacity: 100 RPM Client 1 Client 2 Client 8 “Lease” 10 requests for 1 minute.
  113. 113. Server Capacity: 100 RPM Client 1 Client 2 Client 8 “Lease” 10 requests for 1 minute. “Lease” 10 requests for 1 minute. “Lease” 10 requests for 1 minute.
  114. 114. Server Capacity: 100 RPM Client 1 Client 2 Client 8 “Lease” 10 requests for 1 minute. “Lease” 10 requests for 1 minute. “Lease” 10 requests for 1 minute. Reserve Capacity: 20 RPM
  115. 115. Time bound lease. “Lease” 10 requests for 1 minute.
  116. 116. Time bound lease. No extra work for cancelling leases. “Lease” 10 requests for 1 minute.
  117. 117. Time bound lease. No extra work for cancelling leases. Receiver controls the flow of requests “Lease” 10 requests for 1 minute.
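A time-bound lease can be modelled as a request budget with an expiry, so that it lapses on its own and needs no explicit cancellation. The sketch below also reproduces the numbers on slides 112 to 114 under the assumption that the server splits its non-reserved capacity evenly: (100 - 20) / 8 = 10 requests per client. The even split, the class `RequestLease` and the explicit clock arguments are assumptions, not the ReactiveSocket API.

```java
// Hypothetical client-side view of a lease granted by the receiver.
final class RequestLease {
    private final long expiresAtMillis;
    private int remaining;

    RequestLease(int allowedRequests, long ttlMillis, long nowMillis) {
        this.remaining = allowedRequests;
        this.expiresAtMillis = nowMillis + ttlMillis;
    }

    // The sender may issue a request only while the lease is valid and not used up.
    boolean tryUse(long nowMillis) {
        if (nowMillis >= expiresAtMillis || remaining <= 0) {
            return false;
        }
        remaining--;
        return true;
    }

    // Assumed allocation rule: keep a reserve, split the rest evenly.
    static int perClientLease(int capacityRpm, int reserveRpm, int clients) {
        return (capacityRpm - reserveRpm) / clients;
    }

    public static void main(String[] args) {
        System.out.println(perClientLease(100, 20, 8)); // 10

        RequestLease lease = new RequestLease(2, 60_000, 0);
        System.out.println(lease.tryUse(1_000)); // true
        System.out.println(lease.tryUse(2_000)); // true
        System.out.println(lease.tryUse(3_000)); // false, budget used up
    }
}
```

Passing the clock in explicitly keeps the example deterministic; a real implementation would read the current time itself.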
  118. 118. When things go south
  119. 119. Server Capacity: 20 RPM Client 1 Client 2 Client 8 “Lease” 5 requests for 1 minute. “Lease” 2 requests for 1 minute. X No more “Lease”
  120. 120. Server Capacity: 20 RPM Client 1 Client 2 Client 8 “Lease” 5 requests for 1 minute. “Lease” 2 requests for 1 minute. X No more “Lease” Prioritization
  121. 121. Managing client configs?
  122. 122. Threadpools? Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection I/O is non-blocking.
  123. 123. Threadpools? Edge Service Disclaimer: This is an example and not an exact representation of the processing Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection Application code is non-blocking.
  124. 124. Threadpools? Disclaimer: This is an example and not an exact representation of the processing Edge Service Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection No blocking/Waiting => Only CPU work
  125. 125. Threadpools? Disclaimer: This is an example and not an exact representation of the processing Edge Service Eventloop (Inbound) Connection Connection Connection Connection Connection getMovieMetadata(movieId) Eventloop (Outbound) Connection Connection Connection Connection Connection No blocking/Waiting => Only CPU work So, Eventloops = # of cores
  126. 126. Tuning parameters?
  127. 127. Tuning parameters? X
  128. 128. Tuning parameters? X X
  129. 129. Tuning parameters? X X? ?
  130. 130. Case for timeouts?
  131. 131. Case for timeouts? Read Timeouts Thread Timeouts ✤ Useful in unblocking threads on socket reads. ✤ Business level SLA. ✤ Unblock the calling thread.
  132. 132. Case for timeouts? Read Timeouts Thread Timeouts ✤ Useful in unblocking threads on socket reads. ✤ Business level SLA. ✤ Unblock the calling thread. X X As there are no blocking calls. X
  133. 133. Case for timeouts? Read Timeouts Thread Timeouts ✤ Useful in unblocking threads on socket reads. ✤ Business level SLA. ✤ Unblock the calling thread.
  134. 134. Business level SLA Edge Service Video Metadata Service Disclaimer: This is an example and not an exact representation of the processing
  135. 135. Business level SLA Edge Service Video Metadata Service Disclaimer: This is an example and not an exact representation of the processing Rating service C* store C* store
  136. 136. Business level SLA Edge Service Video Metadata Service Disclaimer: This is an example and not an exact representation of the processing Rating service C* store C* store
  137. 137. Business level SLA Edge Service Video Metadata Service Disclaimer: This is an example and not an exact representation of the processing Rating service C* store C* store Thread timeouts are pretty invasive at every level
  138. 138. Business level SLA Edge Service Video Metadata Service Disclaimer: This is an example and not an exact representation of the processing Rating service C* store C* store Thread timeouts are pretty invasive at every level Do we need them at every step?
  139. 139. Edge Service Video Metadata Service Rating service C* store C* store /movie?id=123 Disclaimer: This is an example and not an exact representation of the processing
  140. 140. Edge Service Video Metadata Service Rating service C* store C* store /movie?id=123 Business timeouts are for a client request. Disclaimer: This is an example and not an exact representation of the processing
  141. 141. Edge Service Video Metadata Service Rating service C* store C* store /movie?id=123 Disclaimer: This is an example and not an exact representation of the processing
  142. 142. Edge Service Video Metadata Service Rating service C* store C* store /movie?id=123 X Disclaimer: This is an example and not an exact representation of the processing
  143. 143. Edge Service Video Metadata Service Rating service C* store C* store /movie?id=123 X X X X X Disclaimer: This is an example and not an exact representation of the processing
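Slides 139 to 143 argue for one business-level timeout on the client request at the edge instead of a timeout at every hop. With a composed asynchronous result this can be a single call. The sketch uses the JDK's `CompletableFuture.completeOnTimeout` (Java 9+) as a stand-in for the RxJava pipeline in the talk. `EdgeTimeout`, the fallback value and the durations are assumptions, and the downstream cancellation shown on the slides is not modelled here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class EdgeTimeout {
    // Applies one SLA to the whole composed request; on expiry the future is
    // completed with the fallback instead of blocking or failing a thread.
    static <T> CompletableFuture<T> withBusinessTimeout(
            CompletableFuture<T> composed, long timeoutMillis, T fallback) {
        return composed.completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // A request whose downstream calls never answer.
        CompletableFuture<String> stuck = new CompletableFuture<>();
        System.out.println(withBusinessTimeout(stuck, 50, "fallback movie").join());

        // A request that completes within the SLA.
        CompletableFuture<String> healthy = CompletableFuture.completedFuture("movie 123");
        System.out.println(withBusinessTimeout(healthy, 50, "fallback movie").join());
    }
}
```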
  144. 144. Tuning parameters? X X X X
  145. 145. Less tuning
  146. 146. Edge Service Video Metadata Service Rating service C* store C* store Disclaimer: This is an example and not an exact representation of the processing Request Leases Cancellations Observable<Movie>
  147. 147. Edge Service Video Metadata Service Rating service C* store C* store Disclaimer: This is an example and not an exact representation of the processing Request Leases Cancellations Observable<Movie>
  148. 148. public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); } public Observable<Movie> getMovie(String movieId) { return Observable.zip(getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId), (meta,bmark,rating)->new Movie(meta,bmark,rating)); }
  149. 149. Resources Asynchronous Function composition: https://github.com/ReactiveX/RxJava I/O: https://github.com/ReactiveX/RxNetty Network Protocol: http://reactivesocket.io/
  150. 150. Nitesh Kant, Engineer, Netflix Edge Gateway @NiteshKant
  151. 151. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/netflix-asynchronous-apps
