The document outlines the agenda for Season 3 Episode 1 of the Netflix OSS podcast, which includes lightning talks on eight new projects: Atlas, Prana, Raigad, Genie 2, Inviso, Dynomite, Nicobar, and MSL. Representatives from Netflix, IBM Watson, Nike Digital, and Pivotal then each give a 3-5 minute presentation on their featured project. The presentations describe the motivation, features, and benefits of each project, covering observability, integration with the Netflix ecosystem, automation of Elasticsearch deployments, job scheduling, dynamic scripting for Java, message security, and developing microservices.
3. Agenda
● One new way to eval
○ Zero To Docker
● Three community users
○ IBM Watson
○ Nike Digital
○ Pivotal
Agenda - Lightning Talks
● Eight new projects
○ Atlas
○ Prana
○ Raigad
○ Genie 2
○ Inviso
○ Dynomite
○ Nicobar
○ MSL
13. Motivations
● The Netflix Platform stack is JVM based
● Platform features are provided to developers via client libraries
○ Service Discovery
○ Client Side Load Balancing
○ Monitoring and alerting client libraries
15. Meet Prana
● Prana provides the same set of features to non-JVM or non-Netflix-platform-based software
● It allows applications to integrate with the Netflix ecosystem
16. Prana Features
● Easy-to-use HTTP-based API
○ Load Balancing via Ribbon
○ Service discovery via Eureka client
○ Monitoring via Atlas Client
● Extensible via a plugin framework
● Highly Configurable
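The sidecar pattern above means any process can use Eureka and Ribbon through plain HTTP calls to localhost. The sketch below illustrates the idea; the port (8078) and the `/hosts/{app}` path are assumptions for illustration, not Prana's documented API:

```python
import json

# Hypothetical sidecar port and endpoint; consult the Prana README for
# the real API paths.
PRANA_PORT = 8078

def hosts_url(app_name, port=PRANA_PORT):
    """Build the local sidecar URL used to discover hosts for an app."""
    return f"http://localhost:{port}/hosts/{app_name}"

def parse_hosts(payload):
    """Extract host:port pairs from a discovery response body."""
    data = json.loads(payload)
    return [f"{h['host']}:{h['port']}" for h in data]

# Example response body (illustrative only):
sample = '[{"host": "10.0.0.1", "port": 7001}, {"host": "10.0.0.2", "port": 7001}]'
print(hosts_url("movie-service"))
print(parse_hosts(sample))
```

Because the contract is just HTTP plus JSON, the same flow works from Python, Node, Go, or any other non-JVM runtime.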
18. Raigad - Motivation
● Elasticsearch sidecar - a co-process that runs alongside the ES process
● Helps automate ES deployments
○ ~50 Clusters in test -- ~180TB data
○ ~45 Clusters in prod -- ~780TB data
● Node Discovery and Tracking
● Automatic Index Management
● Scheduled Backup and Restore
● Geared towards running in an AWS environment
19. Auto ES Deployments
● Tunes the elasticsearch.yml file based on configuration parameters
● Multi-region support
● Currently follows a dedicated Master-Data-Search deployment based on ASG names
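Deriving the node role from the ASG name can be sketched as below; the naming convention (a `-master`/`-data`/`-search` suffix) and the emitted settings are assumptions for illustration, not Raigad's actual logic:

```python
# Sketch: map an ASG name to an Elasticsearch node role and the
# corresponding elasticsearch.yml settings (hypothetical convention).
def role_from_asg(asg_name):
    suffix = asg_name.rsplit("-", 1)[-1]
    if suffix not in {"master", "data", "search"}:
        raise ValueError(f"unknown role suffix in {asg_name!r}")
    return suffix

def yml_settings(asg_name):
    role = role_from_asg(asg_name)
    # search-only nodes hold no data and are not master-eligible
    return {
        "node.master": role == "master",
        "node.data": role == "data",
    }

print(yml_settings("es-cluster1-master"))
```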
20. Node Discovery and Tracking
● Sample implementation using Cassandra
● C* keeps track of metadata information of ES clusters
● An ES instance reads C* to discover other nodes during bootstrap
● Storing metadata in C* helps with multi-region deployments
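The bootstrap flow can be sketched as follows, with a dict standing in for the Cassandra metadata table; the table shape and function names are invented for illustration:

```python
# Each node registers itself in a shared metadata store and reads the
# other entries to build its unicast host list at bootstrap.
metadata = {}  # stand-in for a C* table keyed by (cluster, instance_id)

def register(cluster, instance_id, ip):
    metadata[(cluster, instance_id)] = ip

def unicast_hosts(cluster, self_id):
    """IPs of all other nodes in the same cluster."""
    return sorted(ip for (c, i), ip in metadata.items()
                  if c == cluster and i != self_id)

register("logs", "i-1", "10.0.0.1")
register("logs", "i-2", "10.0.0.2")
register("other", "i-9", "10.9.9.9")
print(unicast_hosts("logs", "i-2"))  # other nodes this instance should contact
```

Because the store is replicated across regions, the same lookup also works for multi-region discovery.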
21. Auto Index Management
● Provides configuration properties for auto index management
● Based on a specific index date suffix (YYYYMMDD), old indices are cleaned up and new indices are created
● (Screenshot: indices before running the Index Manager)
22. Auto Index Management … continued
● (Screenshot: indices after running the Index Manager)
● The Index Manager job can be scheduled
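The date-suffix retention policy described above can be sketched as a pure function; the `logs-` prefix and retention window are illustrative, not Raigad's configuration keys:

```python
from datetime import date, timedelta

# Given today's date and a retention window, decide which
# YYYYMMDD-suffixed indices to delete and which index to create.
def manage_indices(existing, today, retention_days, prefix="logs-"):
    cutoff = today - timedelta(days=retention_days)
    to_delete = [
        name for name in existing
        if name.startswith(prefix)
        and date(int(name[-8:-4]), int(name[-4:-2]), int(name[-2:])) < cutoff
    ]
    to_create = f"{prefix}{today:%Y%m%d}"
    return to_delete, to_create

existing = ["logs-20141201", "logs-20141210", "logs-20141214"]
print(manage_indices(existing, date(2014, 12, 15), retention_days=7))
# → (['logs-20141201'], 'logs-20141215')
```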
23. Running in AWS
● Automatic updates to Security Groups when new nodes are added or removed
● Supports IAM credentials
● Scheduled snapshot backup to S3 - uses the elasticsearch-cloud-aws plugin
● Publishes ES metrics to Servo, the centralized monitoring system
26. Goals For Genie 2
● Develop a generic data model, which would let jobs run on any multi-tenant distributed processing cluster.
● Implement a flexible cluster and command selection algorithm for running a job.
● Provide richer API support.
● Implement a more flexible, extensible and robust codebase.
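A tag-based selection algorithm in the spirit of the second goal can be sketched as below; the data model, tag strings, and IDs are invented for illustration and are not Genie 2's actual schema:

```python
# A job names the tags it needs; the first cluster/command whose tag set
# is a superset of the required tags is selected.
clusters = [
    {"id": "prod-hadoop", "tags": {"type:yarn", "sched:prod"}},
    {"id": "adhoc-hadoop", "tags": {"type:yarn", "sched:adhoc"}},
]
commands = [
    {"id": "hive-0.13", "tags": {"type:hive", "ver:0.13"}},
    {"id": "pig-0.11", "tags": {"type:pig"}},
]

def select(candidates, required_tags):
    for c in candidates:
        if required_tags <= c["tags"]:  # set-superset match
            return c["id"]
    raise LookupError(f"no candidate matches {required_tags}")

job = {"cluster_tags": {"type:yarn", "sched:adhoc"}, "command_tags": {"type:hive"}}
print(select(clusters, job["cluster_tags"]), select(commands, job["command_tags"]))
```

Keeping selection declarative (tags rather than hard-coded cluster IDs) is what lets the same job run on any matching multi-tenant cluster.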
28. Our Current Deployment
● 19 i2.2xlarge nodes in prod cluster
○ Configured to allow room to scale up as needed
● 34 max jobs per node
● ~17,000 Jobs Per Day
34. What is Dynomite?
● A Dynamo layer on top of a non-distributed system (Redis/Memcached)
○ Peer-to-peer
○ Replication
○ Sharding
○ Gossiping
○ Multi-datacenter and rack awareness
○ Encryption
○ Linear scaling
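The sharding bullet above refers to Dynamo-style token-based partitioning. A minimal consistent-hashing sketch follows; this illustrates the general technique, not Dynomite's actual token assignment:

```python
import bisect
import hashlib

# Each node owns a token on a ring; a key is routed to the node with the
# smallest token >= hash(key), wrapping around at the end of the ring.
def token(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class Ring:
    def __init__(self, nodes):
        self.tokens = sorted((token(n), n) for n in nodes)

    def owner(self, key):
        keys = [t for t, _ in self.tokens]
        i = bisect.bisect_right(keys, token(key)) % len(self.tokens)
        return self.tokens[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))
```

Because each key deterministically maps to one owner, adding nodes only moves the keys in the affected token ranges, which is what gives the layer its linear scaling.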
37. Operation features
● Florida - a sidecar application to manage Dynomite clusters (like Priam for Cassandra)
● Data backup (Redis only)
● Node replacement
● Data warm-up (Redis only)
● Client failover strategy - Dyno (our Java client)
● Atlas/Servo integration for operational metrics
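The client failover strategy can be sketched as below: prefer the local rack, fall back to remote racks on failure. The classes are invented stand-ins, not Dyno's API:

```python
# Hypothetical connection objects; a real client would wrap sockets.
class Connection:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy

    def get(self, key):
        if not self.healthy:
            raise ConnectionError(self.name)
        return f"{key}@{self.name}"

def failover_get(key, local, remotes):
    """Try the local-rack connection first, then each remote rack."""
    for conn in [local, *remotes]:
        try:
            return conn.get(key)
        except ConnectionError:
            continue  # try the next rack
    raise ConnectionError("all racks failed")

local = Connection("rack-1a", healthy=False)
print(failover_get("user:42", local, [Connection("rack-1b")]))
```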
38. Incoming features
● Higher read/write consistencies
● Data reconciliation / data repair
● Support for data stores other than Redis and Memcached
● A better, more generic warm-up method
● Spark driver integration
● and others
39. Performance
● AWS:
○ 126 nodes total in us-east-1, us-west-2, eu-west-1
○ r3.xlarge instances
○ 1K data payload
● 250K Write RPS, 250K Read RPS
● Client observed latencies
○ average less than 1ms
○ 99th at ~1.5ms
○ 99.5th at ~2.5ms
46. What is MSL?
MSL = Message Security Layer
MSL is a modern security protocol which
enables arbitrary application protocols to be
secured over arbitrary transport protocols.
47. Performance
● sub-second playback start
● MSL messaging stacked with the app protocol
○ a request can carry: device authentication, user authentication, key exchange & an application message
○ a response can carry: key exchange, authentication renewal, an application message
● Netflix streaming should start faster than changing channels on your cable box!
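The "stacking" described above - carrying security material and the application payload in one message - can be sketched as a single JSON envelope. Field names and schemes here are invented; MSL's real wire format is defined in its specification:

```python
import json

# Build one request that carries device auth, user auth, a key-exchange
# request, and the application payload together (sections may be omitted).
def build_request(device_auth, user_auth, key_request, app_payload):
    msg = {
        "deviceauth": device_auth,
        "userauth": user_auth,
        "keyrequest": key_request,
        "payload": app_payload,
    }
    # drop sections not needed on this particular request
    return json.dumps({k: v for k, v in msg.items() if v is not None})

req = build_request({"scheme": "model-id"}, {"scheme": "email-password"},
                    {"scheme": "diffie-hellman"}, {"op": "startPlayback"})
print(req)
```

Stacking everything into one round trip is what makes sub-second playback start possible: there is no separate handshake phase before the first application message.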
48. Reliability
● We need 4-5 “9s” of reliability
● MSL has automatic error recovery
● We had to remove reliance on 3rd party PKI
● Client time: not needed by MSL
49. Modern Protocol Design
● Human-readable JSON vs. a complex binary format
○ ASN.1 security issues go away
● Multiple implementations: Java, JS, C#, …
○ JS: updatable in the field
51. Deployment Models
● Trusted Services Network
○ All servers share a common master key, allowing the same level of trust across the network
● Peer-to-Peer
○ Every pair of entities shares connection-specific keys & credentials
54. Zero To Docker
● Up and running in minutes
● Before: documented technology that we expected you to assemble
● Now: running technology that we assembled and validated
● Not production-ready: examples only, not run this way at Netflix (security, HA, monitoring, etc.)
Netflix OSS on your laptop:
(Diagram: a Docker Host, e.g. VirtualBox on OS X, runs Ubuntu 14.04 on a single kernel; each container - Eureka, Zuul, and others - is its own filesystem plus process.)
55. Trusted and Transparent Builds
● Start with the Docker Hub registry
○ Pull images that you know were built securely
○ All you need if you just want to run them
● Inspect the linked GitHub Dockerfile
○ Want to know how NetflixOSS was configured?
○ Want to know how the NetflixOSS code was built?
○ All code and configuration explicitly documented
56. What is available?
From https://hub.docker.com/u/netflixoss/
● asgard
● eureka
● edda
● sketchy
● security monkey
● exhibitor
● sample karyon application
● zuul
● atlas
58. Where we started...
> Datacenter to Cloud (AWS)
> Cloud native architecture using microservices
> Defined a Cloud Blueprint
> Pioneered a REST application bootstrap
> Maintain a boilerplate to define transitive dependencies.
59. How do we do metrics across billions (or 100s) of microservices?
62. How are we going to store all of these metrics?
63. Cyanite
https://github.com/pyr/cyanite
● Graphite Carbon compliant
● Cassandra metric storage
● Elasticsearch metric search
● C* and ES cross-region replication enable a global view of the metrics
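"Carbon compliant" means Cyanite accepts Graphite's plaintext protocol: one newline-delimited `path value timestamp` line per metric. A minimal sketch of serializing a metric (the metric path is made up):

```python
import time

# Serialize one metric in the Graphite/Carbon plaintext line format.
def carbon_line(path, value, timestamp=None):
    ts = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {ts}\n"

line = carbon_line("servers.api1.requests.count", 42, timestamp=1418600000)
print(line, end="")
# → servers.api1.requests.count 42 1418600000
```

Anything that can open a TCP socket and write such lines can therefore feed metrics into the C*/ES backend.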
64. How do we make these tools Blueprint compliant?
74. Kevin Haverlock kbh@us.ibm.com
Aroop Pandya apandya@us.ibm.com
Kelly Abuelsaad kna@us.ibm.com
Susan Diamond lsyang@us.ibm.com
IBM Watson Developer Cloud
77. Visual Recognition
Image/video recognition and classification service to provide an assessment of a user from their images
User Modeling
Improved understanding of people's preferences to help engage users on their own terms
Language Identification
Machine Translation
Concept Expansion
Message Resonance
Question and Answer
Relationship Extraction
Extracts information from text: people, organizations, locations, events, and the relationships between them
Text to Speech
The conversion of text to an output audio stream
Speech to Text
Converts speech into text
Tradeoff Analytics
Helps people make better choices while taking into account multiple, often conflicting, goals that matter
Concept Analytics
Links documents that you provide with a pre-existing graph of concepts