The Netflix API for a global service
Katharina Probst
Engineering Manager, API
DevNexus, February 2016

At Netflix, we provide a Java-based API that supports the content discovery, sign-up, and playback experience on thousands of device types that millions use around the world every day. As our user base and traffic have grown by leaps and bounds, we are continuously evolving this API to enable the best user experience. In this talk, I will give an overview of how and why the Netflix API has evolved to where it is today and where we plan to take it in the future. I will discuss how we make our system resilient against failures using tools such as Hystrix and FIT, while keeping it flexible and nimble enough to support continuous A/B testing.
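
The fallback idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the Hystrix API; it uses only the JDK's CompletableFuture (Java 9 or later), and the class name, method name, and sample values are hypothetical. Hystrix additionally provides thread-pool isolation and circuit breaking, which this sketch omits.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration of "fall back and gracefully degrade".
// NOT the Hystrix API: no thread-pool isolation, no circuit breaker.
final class FallbackSketch {

    // Runs the dependency call asynchronously. If it throws or does not
    // finish within timeoutMillis, the caller receives the fallback value
    // instead of an error. (The slow task itself is not cancelled.)
    static <T> T callWithFallback(Supplier<T> dependencyCall, T fallback, long timeoutMillis) {
        return CompletableFuture.supplyAsync(dependencyCall)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback)
                .join();
    }

    public static void main(String[] args) {
        // Assumed sample data: a generic list served when personalization fails.
        List<String> generic = List.of("popular-1", "popular-2");

        List<String> healthy = callWithFallback(() -> List.of("for-you-1"), generic, 200);
        List<String> degraded = callWithFallback(() -> {
            throw new IllegalStateException("personalization unreachable");
        }, generic, 200);

        System.out.println(healthy);
        System.out.println(degraded);
    }
}
```

The point of the design is that a failing dependency degrades one part of the response rather than failing the whole request.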

  1. The Netflix API for a global service. Katharina Probst, Engineering Manager, API. DevNexus, February 2016
  2. What is Netflix? Stream TV shows and movies anywhere, any time.
  3. Global! (except China and where we can’t operate for legal reasons)
  4. Netflix Originals
  5. Scale ❏ Netflix’s share of peak downstream traffic in the US is 37%; upstream, almost 7%. ❏ 75 million subscribers worldwide and growing. Source: http://www.sandvine.com/news/global_broadband_trends.asp
  6. Netflix API ❏ Architecture ❏ Resiliency ❏ Developer velocity ❏ Tooling and DevOps ❏ Current and future directions
  7. Netflix API ❏ Architecture ❏ Resiliency ❏ Developer velocity ❏ Tooling and DevOps ❏ Current and future directions
  8. [Diagram: ELB, Zuul (gateway), API, Personalization Engine, User Info, Ratings, Similar Movies, A/B Test Engine, ….]
  9. [Diagram: Client Application (UI Teams), Internet, /tv/home, Server, RxJava, Hystrix, Java Service Layer (API Team), Mid-tier Services (Service Teams)]
  10. What is the API used for? Examples: ❏ Discovery ❏ Recommendations ❏ Movie metadata ❏ Ratings ❏ Sign-up and Profiles ❏ Playback ❏ Bookmarks ❏ DRM ❏ A/B testing
  11. Direct dependencies on other services
  12. Netflix API ❏ Architecture ❏ Resiliency ❏ Developer velocity ❏ Tooling and DevOps ❏ Current and future directions
  13. Hystrix Primer ❏ Protection from and control over latency and failure from dependencies ❏ Stop cascading failures in a complex distributed system ❏ Fall back and gracefully degrade ❏ Fail fast and rapidly recover https://github.com/Netflix/Hystrix
  14. [Diagram: API, Personalization Engine, User Info, Ratings, Similar Movies, A/B Test Engine, ….]
  15. Don’t let this happen. [Diagram: API, Personalization Engine, User Info, Ratings, Similar Movies, A/B Test Engine, ….]
  16. Don’t let this happen. [Diagram: API, Personalization Engine, User Info, Ratings, Similar Movies, A/B Test Engine, ….]
  17. Do this instead: Fallback Response. [Diagram: API, Personalization Engine, User Info, Ratings, Similar Movies, A/B Test Engine, ….]
  18. Failure Injection Testing (FIT) Goal: Study how the system behaves when failures occur (e.g., backend service unreachable).
  19. More automated failure testing. Goal: Find groups of service calls that are needed for success. http://techblog.netflix.com/2016/01/automated-failure-testing.html
  20. Autoscaling & Capacity Management http://nflx.it/1LvqLUi
  21. Autoscaling & Capacity Management ❏ Red: traffic for current week (x-axis) ❏ Black: traffic for previous week for comparison ❏ What happened on February 7? Super Bowl!
  22. AWS Controls: reactive, does not scale up fast enough
  23. Fine-grained Control with Scryer: complements AWS Controls ❏ Faster scale-up, improved cost ❏ Use reactive policy for organic scale down
  24. Netflix API ❏ Architecture ❏ Resiliency ❏ Developer velocity ❏ Tooling and DevOps ❏ Current and future directions
  25. Lots of devices, lots of variety
  26. Different interaction models
  27. And just to make things a little more interesting…. ❏ A/B tests ❏ profiles ❏ localization
  28. Add server-side scripting capability ❏ Reduce network chattiness ❏ Support device optimizations ❏ Enable faster development for internal users
  29. Discrete HTTP requests pay network tax repeatedly
  30. Single, optimized request; pay network tax once. Client data assembly logic pushed to server
  31. Remote API: GET /users/{user_id}/lists. Local Method: getLists(userId)
  32. Impact on velocity and collaboration ❏ UI (script) changes can happen independently ❏ Script changes can be pushed to running servers, so decoupled from API push schedule ❏ Decoupling leads to greater developer velocity
  33. Netflix API ❏ Architecture ❏ Resiliency ❏ Developer velocity ❏ Tooling and DevOps ❏ Current and future directions
  34. Run 1% of your traffic on the new code and see how it does
  35. So you’ve run a canary. Now what? Compare Control and Canary: ❏ Errors: 2xx, 4xx, 5xx ❏ latency ❏ network ❏ busy threads ❏ load, memory consumption ❏ ...
  36. Successful canary: red/black push
  37. Continuous Delivery with Spinnaker http://techblog.netflix.com/2015/09/moving-from-asgard-to-spinnaker.html
  38. Quickly see status of all clusters http://techblog.netflix.com/2015/09/moving-from-asgard-to-spinnaker.html
  39. Prod is a little different….
  40. The things you can do … with server groups … with instances
  41. Script Management
  42. Operations
  43. Operations
  44. Operations
  45. Real-time analysis: Submit a query, see requests in real time. http://www.slideshare.net/g9yuayon/qcon-talk-on-netflix-mantis-a-stream-processing-system
  46. Netflix API ❏ Architecture ❏ Resiliency ❏ Developer velocity ❏ Tooling and DevOps ❏ Current and future directions
  47. What we’ve grown to ● > 900 active endpoints ● ~60 direct dependencies ● 78 thread pools ● 1000+ threads ● high memory usage
  48. Script isolation & node ❏ Groovy scripts run as part of API process ❏ UI teams would like to use other languages (in particular node.js). Example: var response = model.get("todos[0..2] ['name','done']"); [Diagram: UI/device scripts (node), Falcor, API remote service layer, Client libs, Services]
  49. Thin client libraries ❏ Fat client libraries have business logic and multiple dependencies ❏ Move business logic and dependencies to services [Diagram: UI/device scripts (node), Falcor, API remote service layer, Thin client libs, Services]
  50. Remove metadata from API servers ❏ Metadata takes up significant memory in API servers ❏ Challenge: reduce chattiness to metadata [Diagram: UI/device scripts (node), Falcor, API remote service layer, Thin client libs, Metadata Service, Services]
  51. In the beginning...
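
Slides 28 to 31 describe pushing client data assembly logic to the server, so that a device pays the network cost once and the script calls local methods such as getLists(userId) instead of remote endpoints. A minimal sketch of that idea follows. The deck's actual scripts are Groovy running inside the API process; this plain-Java version is an assumption for illustration, and every name except getLists(userId) (getRatings, homeScreen, the sample data) is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of server-side data assembly; not Netflix code.
final class HomeScreenSketch {

    // Stand-in for a local call into the Java service layer, replacing a
    // remote call such as GET /users/{user_id}/lists.
    static List<String> getLists(String userId) {
        return List.of("Continue Watching", "Top Picks");
    }

    // Hypothetical second local call, included to show aggregation.
    static Map<String, Integer> getRatings(String userId) {
        return Map.of("movie-1", 4);
    }

    // Assembles one response on the server, so the device issues a single
    // request instead of one request per resource.
    static Map<String, Object> homeScreen(String userId) {
        CompletableFuture<List<String>> lists =
                CompletableFuture.supplyAsync(() -> getLists(userId));
        CompletableFuture<Map<String, Integer>> ratings =
                CompletableFuture.supplyAsync(() -> getRatings(userId));

        Map<String, Object> response = new LinkedHashMap<>();
        response.put("lists", lists.join());
        response.put("ratings", ratings.join());
        return response;
    }

    public static void main(String[] args) {
        System.out.println(homeScreen("user-42"));
    }
}
```

The two lookups run concurrently and are joined before the response is returned, which mirrors the "single, optimized request" shape from slide 30.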
