The document outlines the agenda for Season 3 Episode 1 of the Netflix OSS podcast, which includes lightning talks on eight new projects: Atlas, Prana, Raigad, Genie 2, Inviso, Dynomite, Nicobar, and MSL. Representatives from Netflix, IBM Watson, Nike Digital, and Pivotal then each give a 3-5 minute presentation on their featured project. The presentations describe the motivation, features, and benefits of each project, covering observability, integration with the Netflix ecosystem, automation of Elasticsearch deployments, job scheduling, dynamic scripting for Java, message security, and developing microservices.
3. Agenda
● One new way to eval
○ Zero To Docker
● Three community users
○ IBM Watson
○ Nike Digital
○ Pivotal
Agenda - Lightning Talks
● Eight new projects
○ Atlas
○ Prana
○ Raigad
○ Genie 2
○ Inviso
○ Dynomite
○ Nicobar
○ MSL
13. Motivations
● The Netflix Platform stack is JVM based
● Platform features are provided to developers via client libraries
○ Service Discovery
○ Client Side Load Balancing
○ Monitoring and alerting client libraries
15. Meet Prana
● Prana provides the same set of features to non-JVM or non-Netflix-platform-based software
● It allows applications to integrate with the Netflix ecosystem
16. Prana Features
● Easy-to-use HTTP-based API
○ Load Balancing via Ribbon
○ Service discovery via Eureka client
○ Monitoring via Atlas Client
● Extensible via a plugin framework
● Highly Configurable
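The sidecar pattern above means any process can use Eureka and Ribbon through plain HTTP calls to localhost. The sketch below illustrates the idea; the port (8078) and the `/hosts/{app}` path are assumptions for illustration, not Prana's documented API:

```python
import json

# Hypothetical sidecar port and endpoint; consult the Prana README for
# the real API paths.
PRANA_PORT = 8078

def hosts_url(app_name, port=PRANA_PORT):
    """Build the local sidecar URL used to discover hosts for an app."""
    return f"http://localhost:{port}/hosts/{app_name}"

def parse_hosts(payload):
    """Extract host:port pairs from a discovery response body."""
    data = json.loads(payload)
    return [f"{h['host']}:{h['port']}" for h in data]

# Example response body (illustrative only):
sample = '[{"host": "10.0.0.1", "port": 7001}, {"host": "10.0.0.2", "port": 7001}]'
print(hosts_url("movie-service"))
print(parse_hosts(sample))
```

Because the contract is just HTTP plus JSON, the same flow works from Python, Node, Go, or any other non-JVM runtime.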
18. Raigad - Motivation
● Elasticsearch sidecar - a co-process that runs alongside the ES process
● Helps automate ES deployments
○ ~50 Clusters in test -- ~180TB data
○ ~45 Clusters in prod -- ~780TB data
● Node Discovery and Tracking
● Automatic Index Management
● Scheduled Backup and Restore
● Geared towards running in an AWS environment
19. Auto ES Deployments
● Tunes the elasticsearch.yml file based on configuration parameters
● Multi-region support
● Currently follows a dedicated Master-Data-Search deployment based on ASG names
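Deriving the node role from the ASG name can be sketched as below; the naming convention (a `-master`/`-data`/`-search` suffix) and the emitted settings are assumptions for illustration, not Raigad's actual logic:

```python
# Sketch: map an ASG name to an Elasticsearch node role and the
# corresponding elasticsearch.yml settings (hypothetical convention).
def role_from_asg(asg_name):
    suffix = asg_name.rsplit("-", 1)[-1]
    if suffix not in {"master", "data", "search"}:
        raise ValueError(f"unknown role suffix in {asg_name!r}")
    return suffix

def yml_settings(asg_name):
    role = role_from_asg(asg_name)
    # search-only nodes hold no data and are not master-eligible
    return {
        "node.master": role == "master",
        "node.data": role == "data",
    }

print(yml_settings("es-cluster1-master"))
```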
20. Node Discovery and Tracking
● Sample implementation using Cassandra
● C* keeps track of metadata information of ES clusters
● An ES instance reads C* to discover other nodes during bootstrap
● Storing metadata in C* helps with multi-region deployments
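The bootstrap flow can be sketched as follows, with a dict standing in for the Cassandra metadata table; the table shape and function names are invented for illustration:

```python
# Each node registers itself in a shared metadata store and reads the
# other entries to build its unicast host list at bootstrap.
metadata = {}  # stand-in for a C* table keyed by (cluster, instance_id)

def register(cluster, instance_id, ip):
    metadata[(cluster, instance_id)] = ip

def unicast_hosts(cluster, self_id):
    """IPs of all other nodes in the same cluster."""
    return sorted(ip for (c, i), ip in metadata.items()
                  if c == cluster and i != self_id)

register("logs", "i-1", "10.0.0.1")
register("logs", "i-2", "10.0.0.2")
register("other", "i-9", "10.9.9.9")
print(unicast_hosts("logs", "i-2"))  # other nodes this instance should contact
```

Because the store is replicated across regions, the same lookup also works for multi-region discovery.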
21. Auto Index Management
● Provides configuration properties for auto index management
● Based on a specific index date suffix (YYYYMMDD), old indices are cleaned up and new indices are created
● (Screenshot: indices before running the Index Manager)
22. Auto Index Management … continued
● (Screenshot: indices after running the Index Manager)
● The Index Manager job can be scheduled
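The date-suffix retention policy described above can be sketched as a pure function; the `logs-` prefix and retention window are illustrative, not Raigad's configuration keys:

```python
from datetime import date, timedelta

# Given today's date and a retention window, decide which
# YYYYMMDD-suffixed indices to delete and which index to create.
def manage_indices(existing, today, retention_days, prefix="logs-"):
    cutoff = today - timedelta(days=retention_days)
    to_delete = [
        name for name in existing
        if name.startswith(prefix)
        and date(int(name[-8:-4]), int(name[-4:-2]), int(name[-2:])) < cutoff
    ]
    to_create = f"{prefix}{today:%Y%m%d}"
    return to_delete, to_create

existing = ["logs-20141201", "logs-20141210", "logs-20141214"]
print(manage_indices(existing, date(2014, 12, 15), retention_days=7))
# → (['logs-20141201'], 'logs-20141215')
```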
23. Running in AWS
● Automatic updates to Security Groups when new nodes are added or removed
● Supports IAM credentials
● Scheduled snapshot backup to S3 - uses the elasticsearch-cloud-aws plugin
● Publishes ES metrics to Servo, the centralized monitoring system
26. Goals For Genie 2
● Develop a generic data model, which would let jobs run on any multi-tenant distributed processing cluster.
● Implement a flexible cluster and command selection algorithm for running a job.
● Provide richer API support.
● Implement a more flexible, extensible and robust codebase.
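A tag-based selection algorithm in the spirit of the second goal can be sketched as below; the data model, tag strings, and IDs are invented for illustration and are not Genie 2's actual schema:

```python
# A job names the tags it needs; the first cluster/command whose tag set
# is a superset of the required tags is selected.
clusters = [
    {"id": "prod-hadoop", "tags": {"type:yarn", "sched:prod"}},
    {"id": "adhoc-hadoop", "tags": {"type:yarn", "sched:adhoc"}},
]
commands = [
    {"id": "hive-0.13", "tags": {"type:hive", "ver:0.13"}},
    {"id": "pig-0.11", "tags": {"type:pig"}},
]

def select(candidates, required_tags):
    for c in candidates:
        if required_tags <= c["tags"]:  # set-superset match
            return c["id"]
    raise LookupError(f"no candidate matches {required_tags}")

job = {"cluster_tags": {"type:yarn", "sched:adhoc"}, "command_tags": {"type:hive"}}
print(select(clusters, job["cluster_tags"]), select(commands, job["command_tags"]))
```

Keeping selection declarative (tags rather than hard-coded cluster IDs) is what lets the same job run on any matching multi-tenant cluster.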
28. Our Current Deployment
● 19 i2.2xlarge nodes in prod cluster
○ Configured to allow room to scale up as needed
● 34 max jobs per node
● ~17,000 Jobs Per Day
34. What is Dynomite?
● A Dynamo layer on top of a non-distributed system (Redis/Memcached)
○ Peer-to-peer
○ Replication
○ Sharding
○ Gossiping
○ Multi-datacenter and rack awareness
○ Encryption
○ Linear scaling
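The sharding bullet above refers to Dynamo-style token-based partitioning. A minimal consistent-hashing sketch follows; this illustrates the general technique, not Dynomite's actual token assignment:

```python
import bisect
import hashlib

# Each node owns a token on a ring; a key is routed to the node with the
# smallest token >= hash(key), wrapping around at the end of the ring.
def token(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class Ring:
    def __init__(self, nodes):
        self.tokens = sorted((token(n), n) for n in nodes)

    def owner(self, key):
        keys = [t for t, _ in self.tokens]
        i = bisect.bisect_right(keys, token(key)) % len(self.tokens)
        return self.tokens[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))
```

Because each key deterministically maps to one owner, adding nodes only moves the keys in the affected token ranges, which is what gives the layer its linear scaling.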
37. Operation features
● Florida - a sidecar application to manage Dynomite clusters (like Priam for Cassandra)
● Data backup (Redis only)
● Node replacement
● Data warm-up (Redis only)
● Client failover strategy - Dyno (our Java client)
● Atlas/Servo integration for operational metrics
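The client failover strategy can be sketched as below: prefer the local rack, fall back to remote racks on failure. The classes are invented stand-ins, not Dyno's API:

```python
# Hypothetical connection objects; a real client would wrap sockets.
class Connection:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy

    def get(self, key):
        if not self.healthy:
            raise ConnectionError(self.name)
        return f"{key}@{self.name}"

def failover_get(key, local, remotes):
    """Try the local-rack connection first, then each remote rack."""
    for conn in [local, *remotes]:
        try:
            return conn.get(key)
        except ConnectionError:
            continue  # try the next rack
    raise ConnectionError("all racks failed")

local = Connection("rack-1a", healthy=False)
print(failover_get("user:42", local, [Connection("rack-1b")]))
```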
38. Incoming features
● Higher read/write consistencies
● Data reconciliation / data repair
● Support for data stores other than Redis and Memcached
● A better, more generic warm-up method
● Spark driver integration
● and others
39. Performance
● AWS:
○ 126 nodes total in us-east-1, us-west-2, eu-west-1
○ r3.xlarge instances
○ 1K data payload
● 250K Write RPS, 250K Read RPS
● Client observed latencies
○ average less than 1ms
○ 99th at ~1.5ms
○ 99.5th at ~2.5ms
46. What is MSL?
MSL = Message Security Layer
MSL is a modern security protocol which
enables arbitrary application protocols to be
secured over arbitrary transport protocols.
47. Performance
● sub-second playback start
● MSL messaging stacked with the app protocol
○ a request can carry: device authentication, user authentication, key exchange & an application message
○ a response can carry: key exchange, authentication renewal, an application message
● Netflix streaming should start faster than changing channels on your cable box!
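The "stacking" described above - carrying security material and the application payload in one message - can be sketched as a single JSON envelope. Field names and schemes here are invented; MSL's real wire format is defined in its specification:

```python
import json

# Build one request that carries device auth, user auth, a key-exchange
# request, and the application payload together (sections may be omitted).
def build_request(device_auth, user_auth, key_request, app_payload):
    msg = {
        "deviceauth": device_auth,
        "userauth": user_auth,
        "keyrequest": key_request,
        "payload": app_payload,
    }
    # drop sections not needed on this particular request
    return json.dumps({k: v for k, v in msg.items() if v is not None})

req = build_request({"scheme": "model-id"}, {"scheme": "email-password"},
                    {"scheme": "diffie-hellman"}, {"op": "startPlayback"})
print(req)
```

Stacking everything into one round trip is what makes sub-second playback start possible: there is no separate handshake phase before the first application message.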
48. Reliability
● We need 4-5 “9s” of reliability
● MSL has automatic error recovery
● We had to remove reliance on 3rd party PKI
● Client time: not needed by MSL
49. Modern Protocol Design
● Human-readable JSON vs. a complex binary format
○ ASN.1 security issues go away
● Multiple implementations: Java, JS, C#, …
○ JS: updatable in the field
51. Deployment Models
● Trusted Services Network
○ All servers share a common master key, allowing the same level of trust across the network
● Peer-to-Peer
○ Every pair of entities shares connection-specific keys & credentials
54. Zero To Docker
● Up and running in minutes
● Before: documented technology that we expected you to assemble
● Now: running technology that we assembled and validated
● Not production-ready: examples only, not run this way at Netflix (security, HA, monitoring, etc.)
Netflix OSS on your laptop:
(Diagram: a Docker Host, e.g. VirtualBox on OS X, runs Ubuntu 14.04 on a single kernel; each container - Eureka, Zuul, and others - is its own filesystem plus process.)
55. Trusted and Transparent Builds
● Start with the Docker Hub registry
○ Pull images that you know were built securely
○ All you need if you just want to run them
● Inspect the linked GitHub Dockerfile
○ Want to know how NetflixOSS was configured?
○ Want to know how the NetflixOSS code was built?
○ All code and configuration explicitly documented
56. What is available?
From https://hub.docker.com/u/netflixoss/
● asgard
● eureka
● edda
● sketchy
● security monkey
● exhibitor
● sample karyon application
● zuul
● atlas
58. Where we started...
> Datacenter to Cloud (AWS)
> Cloud native architecture using microservices
> Defined a Cloud Blueprint
> Pioneered a REST application bootstrap
> Maintain a boilerplate to define transitive dependencies.
59. How do we do metrics across billions (or 100s) of microservices?
62. How are we going to store all of these metrics?
63. Cyanite
https://github.com/pyr/cyanite
● Graphite Carbon compliant
● Cassandra metric storage
● Elasticsearch metric search
● C* and ES cross-region replication enable a global view of the metrics
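"Carbon compliant" means Cyanite accepts Graphite's plaintext protocol: one newline-delimited `path value timestamp` line per metric. A minimal sketch of serializing a metric (the metric path is made up):

```python
import time

# Serialize one metric in the Graphite/Carbon plaintext line format.
def carbon_line(path, value, timestamp=None):
    ts = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {ts}\n"

line = carbon_line("servers.api1.requests.count", 42, timestamp=1418600000)
print(line, end="")
# → servers.api1.requests.count 42 1418600000
```

Anything that can open a TCP socket and write such lines can therefore feed metrics into the C*/ES backend.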
64. How do we make these tools Blueprint compliant?
74. Kevin Haverlock kbh@us.ibm.com
Aroop Pandya apandya@us.ibm.com
Kelly Abuelsaad kna@us.ibm.com
Susan Diamond lsyang@us.ibm.com
IBM Watson Developer Cloud
77. Visual Recognition
Image/video recognition and classification service to provide an assessment of a user from their images
User Modeling
Improved understanding of people's preferences to help engage users on their own terms
Language Identification
Machine Translation
Concept Expansion
Message Resonance
Question and Answer
Relationship Extraction
Extracts information from text: people, organizations, locations, events, and the relationships between them
Text to Speech
The conversion of text to an output audio stream
Speech to Text
Converts speech into text
Tradeoff Analytics
Helps people make better choices while taking into account multiple, often conflicting, goals that matter
Concept Analytics
Links documents that you provide with a pre-existing graph of concepts