Netflix Built Its Own
Monitoring System
(And You Probably Shouldn’t)
Roy Rapoport
rsr@netflix.com @royrapoport
6 March 2015
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/netflix-monitoring-system
Presented at QCon London
www.qconlondon.com
Not So Much About Telemetry
• I ♥ telemetry
• Architecture track Open Space,
11:30AM, Fleming 3rd Floor
The Knights
Who Say
NIH
Agenda
• Introductions
• On Judgment
• Your Problem
• Your (no, really) Solution
• Mitigation and Anecdotes
• (Not) building your own monitoring system
Introductions: Me
• About 23 years in technology
• Systems engineering, networking, software
development, QA, release management
• Time at Netflix: 2076 days (5y:8m:7d)
• At Netflix:
• Systems Engineering, Service Delivery in IT
• Troubleshooter and Builder of Python Things in Product Engineering
• Now: Engineering Manager, Insight Engineering
Introductions: Netflix
• Optimize speed of innovation
• Constrain availability
• Cost is what it is
• Hire smart people, get out of their way
• Anti-process bias
“Freedom and Responsibility”
Judgment
You Have a Problem
(Your job would likely be boring otherwise)
• Are you the first
• To have it?
• To care?
• Are you sure?
One that looks nice
And not too expensive
You Have a Problem
(Your job would likely be boring otherwise)
• You’re not the first, or only
• Good news!
• Then what?
Adventures in IT-Land
• (import disclaimer)
• Not developers
• Cautious about ongoing support load
• Not well-trusted
Adventures in IT-Land
A Little Bit of …
• Time, courage, knowledge, pride
• Cynicism, hubris, fear
Technical Reasons for Rejection
(Or: It’s Not You, It’s … Actually, It’s You)
• Financial Cost
• Technical incompatibility
Overqualified!
• https://www.flickr.com/photos/54945394@N00
A Moment for Pedantry
Or: Requirements for “Not Invented Here”
The Knights
Who Say
IbPWAU
A Question of Trust
• Technical: I don’t trust your product
• Organizational: I don’t trust you
I Don’t Trust You
To Care About Me as a Customer
• You’re selling me something
• I’m not your only customer
• I’m not an important customer
• You don’t care about your customers
I Don’t Trust You
To build a good product
• Past performance …
• “Good for me”
• Because you said so, that’s why!
I Don’t Trust You
To build it fast enough
• Unpredictable velocity
• When best-case is too slow
• Or maybe ever (OSS)
What Now?
Eventual Consistency
• Fork n’ merge
• THE model for OSS
• Works better for incremental changes
• Requires alignment of goals
Eventual Consistency
No Fork Required
• Start With a New Idea
• Eventually merge concepts
Eventual Consistency Example
[Timeline diagram, built up over six slides. Labels in order of appearance: Mainline Cloud Orchestration, 2011, 2013, Insight Engineering CD Automation, 2014, Mainline CD Automation, 2015.]
Composability
• Want this anyway
• Map scope to options’ scopes
Composability: Example
Netflix’s Atlas Telemetry Platform
[Diagram, built up over five slides. A Global Query Endpoint sits above four Regional Query Endpoints, with a Regional Boundary marked. Backends shown: first Memory, Epic, and Cloudwatch; then Memory and Cloudwatch; finally Memory, Cloudwatch, OpenTSDB, and InfluxDB.]
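The layering in the Atlas diagram above can be sketched in a few lines. This is only an illustration of the composability idea, not Atlas's actual API: the class names (Backend, InMemoryBackend, RegionalEndpoint, GlobalEndpoint), the query method, and the region names in the usage note are all hypothetical.

```python
from typing import Dict, List, Protocol


class Backend(Protocol):
    """Hypothetical interface: anything that can answer a metric query."""

    def query(self, metric: str) -> List[float]: ...


class InMemoryBackend:
    """Toy stand-in for a storage backend; keeps datapoints in a dict."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    def query(self, metric: str) -> List[float]:
        return list(self._data.get(metric, []))


class RegionalEndpoint:
    """Answers a query for one region by asking every plugged-in backend."""

    def __init__(self, backends: List[Backend]) -> None:
        self._backends = backends

    def query(self, metric: str) -> List[float]:
        points: List[float] = []
        for backend in self._backends:
            points.extend(backend.query(metric))
        return points


class GlobalEndpoint:
    """Fans a query out to each regional endpoint, keyed by region name."""

    def __init__(self, regions: Dict[str, RegionalEndpoint]) -> None:
        self._regions = regions

    def query(self, metric: str) -> Dict[str, List[float]]:
        return {name: ep.query(metric) for name, ep in self._regions.items()}
```

Under these assumptions, swapping a backend means changing the list passed to a RegionalEndpoint, for example GlobalEndpoint({"us-east-1": RegionalEndpoint([InMemoryBackend({"rps": [1.0]})])}); callers of the global endpoint do not change.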
Composability: Example
Deployments and Automated Canary Analysis at Netflix
[Diagram, built up over seven slides. The first shows Edge Systems Deployment Automation Platform, Edge Systems Canary Analysis, Mainline Deployment Automation Platform, and two "API" labels. The second adds Insight Engineering Canary Analysis and an "Email" label. Later slides drop, in turn, "Email", the remaining "API" label, Edge Systems Canary Analysis, and Edge Systems Deployment Automation Platform. The last shows only Insight Engineering Canary Analysis and Mainline Deployment Automation Platform.]
One More Reason
“Think of the glory. Think of your reputation. Think how great it'll look on your next resume.”
- Lois McMaster Bujold
Judgment
The Grand Example
Netflix’s Monitoring Platform
• Prior system owned by IT
• No great OSS products
• Ridiculous scale
• Seriously, how hard can it be?
The Grand Example
Netflix’s Monitoring Platform
• Took longer than expected
• Ongoing maintenance
• UI only recent priority
The Grand Example
Netflix’s Monitoring Platform
• Scales efficientlyish
• impedance match with dev lifestyle
• Nicely pluggable*
• Aggressivish OSS efforts
* Ask me about Real-Time Analytics!
The Grand Example
Netflix’s Monitoring Platform
• Still the right solution
• Worried about Sunk Cost Fallacy
• Most shouldn’t do this
Can You Repeat That?
Or: What’s Your Point?
Or: I was Tweeting. Did I miss something?
• What’s important to you?
• Is this a technical decision? Really?
• Honest and non-judgmental
• Any mitigation?
• Don’t build your own monitoring system. Seriously.
Name This Group
• United States
• Europe
• China
• Russia
• India
• Japan
• Blue Origin
• SpaceX
• Virgin Galactic
11:30am Frasier Room (3rd Floor)
@royrapoport
rsr@netflix.com

More Related Content

Viewers also liked

Spring Boot + Netflix Eureka
Spring Boot + Netflix EurekaSpring Boot + Netflix Eureka
Spring Boot + Netflix Eureka心 谷本
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud ArchitectureAdrian Cockcroft
 
AWS Lambda from the trenches
AWS Lambda from the trenchesAWS Lambda from the trenches
AWS Lambda from the trenchesYan Cui
 
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Andreas Grabner
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyAndreas Grabner
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf toolsBrendan Gregg
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisBrendan Gregg
 

Viewers also liked (9)

Spring Boot + Netflix Eureka
Spring Boot + Netflix EurekaSpring Boot + Netflix Eureka
Spring Boot + Netflix Eureka
 
Scalable Real-time analytics using Druid
Scalable Real-time analytics using DruidScalable Real-time analytics using Druid
Scalable Real-time analytics using Druid
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
AWS Lambda from the trenches
AWS Lambda from the trenchesAWS Lambda from the trenches
AWS Lambda from the trenches
 
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance Analysis
 
Culture
CultureCulture
Culture
 

More from C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

More from C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Recently uploaded

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Recently uploaded (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Netflix Built Its Own Monitoring System - and Why You Probably Shouldn't

  • 1. Netflix Built Its Own Monitoring System (And You Probably Shouldn’t) Roy Rapoport rsr@netflix.com @royrapoport 6 March 2015
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /netflix-monitoring-system
  • 3. Presented at QCon London www.qconlondon.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Not So Much About Telemetry • I telemetry • Architecture track Open Space, 11:30AM, Fleming 3rd Floor
  • 6. Agenda • Introductions • On Judgment • Your Problem • Your (no, really) Solution • Mitigation and Anecdotes • (Not) building your own monitoring system
  • 7. Introductions: Me • About 23 years in technology • Systems engineering, networking, software development, QA, release management • Time at Netflix: 2076 days (5y:8m:7d) • At Netflix: • Systems Engineering, Service Delivery in IT • Troubleshooter and Builder of Python Things in Product Engineering • Now: Engineering Manager, Insight Engineering
  • 8. Introductions: Netflix • Optimize speed of innovation • Constrain availability • Cost is what it is • Hire smart people,
 get out of their way • Anti-process bias “Freedom and Responsibility”
  • 10. You Have a Problem (Your job would likely be boring otherwise) • Are you the first • To have it? • To care? • Are you sure? One that looks nice And not too expensive
  • 11. You Have a Problem (Your job would likely be boring otherwise) • You’re not the first, or only • Good news! • Then what?
  • 12. Adventures in IT-Land • (import disclaimer) • Not developers • Cautious about ongoing support load • Not well-trusted
  • 14. A Little Bit of … • Time, courage, knowledge, pride • Cynicism, hubris, fear
  • 15.
  • 16. Technical Reasons for Rejection (Or: It’s Not You, It’s … Actually, It’s You) • Financial Cost • Technical incompatibility
  • 19. A Moment for Pedantry Or: Requirements for “Not Invented Here”
  • 21. A Question of Trust • Technical: I don’t trust your product • Organizational: I don’t trust you
  • 22. I Don’t Trust You To Care About Me as a Customer • You’re selling me something • I’m not your only customer • I’m not an important customer • You don’t care about your customers
  • 23. I Don’t Trust You To build a good product • Past performance … • “Good for me” • Because you said so, that’s why!
  • 24. I Don’t Trust You To build it fast enough • Unpredictable velocity • When best-case is too slow • Or maybe ever (OSS)
  • 26. Eventual Consistency • Fork n’ merge • THE model for OSS • Works better for incremental changes • Requires alignment of goals
  • 27. Eventual Consistency No Fork Required • Start With a New Idea • Eventually merge concepts
  • 30. Eventual Consistency Example Mainline Cloud Orchestration 2011 2013 Insight Engineering CD Automation
  • 31. Eventual Consistency Example Mainline Cloud Orchestration 2011 2013 Insight Engineering CD Automation 2014 Mainline CD Automation
  • 32. Eventual Consistency Example Mainline Cloud Orchestration 2011 2013 Insight Engineering CD Automation 2014 Mainline CD Automation 2015
  • 33. Eventual Consistency Example Mainline Cloud Orchestration 2011 2013 2014 Mainline CD Automation 2015 Insight Engineering CD Automation
  • 34. Composability • Want this anyway • Map scope to options’ scopes
  • 35. Composability: Example Netflix’s Atlas Telemetry Platform Global Query Endpoint
  • 36. Composability: Example Netflix’s Atlas Telemetry Platform Global Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Boundary
  • 37. Composability: Example Netflix’s Atlas Telemetry Platform Global Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Query Endpoint Memory Epic Cloudwatch
  • 38. Composability: Example Netflix’s Atlas Telemetry Platform Global Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Query Endpoint Memory Cloudwatch
  • 39. Composability: Example Netflix’s Atlas Telemetry Platform Global Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Query Endpoint Regional Query Endpoint Memory Cloudwatch OpenTSDB InfluxDB
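The composability slides above show a global query endpoint sitting over regional query endpoints, each of which can draw on different stores (Memory, Cloudwatch, and so on). A minimal sketch of that layering might look like the following. This is an illustration only, not Atlas's actual API: the class names, the `query` method, and the idea of returning plain lists of floats are all assumptions made for the example.

```python
from typing import Dict, List, Protocol


class Backend(Protocol):
    """Anything that can answer a metric query (hypothetical interface)."""

    def query(self, metric: str) -> List[float]: ...


class MemoryBackend:
    """Hypothetical in-memory store standing in for a real data source."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    def query(self, metric: str) -> List[float]:
        # Return a copy so callers cannot mutate the stored points.
        return list(self._data.get(metric, []))


class RegionalEndpoint:
    """Answers queries for one region by combining its pluggable backends."""

    def __init__(self, region: str, backends: List[Backend]) -> None:
        self.region = region
        self.backends = backends

    def query(self, metric: str) -> List[float]:
        points: List[float] = []
        for backend in self.backends:
            points.extend(backend.query(metric))
        return points


class GlobalEndpoint:
    """Fans a query out to every regional endpoint, keyed by region name."""

    def __init__(self, regions: List[RegionalEndpoint]) -> None:
        self.regions = regions

    def query(self, metric: str) -> Dict[str, List[float]]:
        return {r.region: r.query(metric) for r in self.regions}
```

Because each layer depends only on the `query` interface, adding another store in this sketch would mean writing one more backend class rather than changing the endpoints.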
  • 40. Composability: Example Deployments and Automated Canary Analysis at Netflix Edge Systems Deployment Automation Platform Edge Systems Canary Analysis API API Mainline Deployment Automation Platform
  • 41. Composability: Example Deployments and Automated Canary Analysis at Netflix Edge Systems Deployment Automation Platform Edge Systems Canary Analysis API Email Insight Engineering Canary Analysis Mainline Deployment Automation Platform
  • 42. Composability: Example Deployments and Automated Canary Analysis at Netflix Edge Systems Deployment Automation Platform Edge Systems Canary Analysis API Insight Engineering Canary Analysis Mainline Deployment Automation Platform
  • 43. Composability: Example Deployments and Automated Canary Analysis at Netflix Edge Systems Deployment Automation Platform Edge Systems Canary Analysis Insight Engineering Canary Analysis Mainline Deployment Automation Platform
  • 44. Composability: Example Deployments and Automated Canary Analysis at Netflix Edge Systems Deployment Automation Platform Insight Engineering Canary Analysis Mainline Deployment Automation Platform
  • 46. Composability: Example Deployments and Automated Canary Analysis at Netflix Insight Engineering Canary Analysis Mainline Deployment Automation Platform
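One reading of the canary-analysis slides above is that placing an API between the deployment platform and the canary analysis lets either side be replaced independently. The sketch below illustrates that idea under stated assumptions: `DeploymentPlatform`, `ratio_score`, the 10% tolerance, and the promotion threshold are all hypothetical and are not taken from the talk.

```python
from typing import Callable, Dict

# A canary analyzer takes baseline and canary metrics and returns a score in [0, 1].
CanaryAnalyzer = Callable[[Dict[str, float], Dict[str, float]], float]


def ratio_score(baseline: Dict[str, float], canary: Dict[str, float]) -> float:
    """Hypothetical scoring: fraction of baseline metrics the canary matches within 10%."""
    if not baseline:
        return 0.0  # no data, so give no evidence in favour of promotion
    ok = 0
    for name, base_value in baseline.items():
        canary_value = canary.get(name)
        if canary_value is None:
            continue  # a metric missing from the canary counts as a mismatch
        if abs(canary_value - base_value) <= 0.1 * abs(base_value):
            ok += 1
    return ok / len(baseline)


class DeploymentPlatform:
    """Depends only on the analyzer interface, so the analyzer can be swapped."""

    def __init__(self, analyzer: CanaryAnalyzer, threshold: float = 0.9) -> None:
        self.analyzer = analyzer
        self.threshold = threshold

    def should_promote(self, baseline: Dict[str, float], canary: Dict[str, float]) -> bool:
        return self.analyzer(baseline, canary) >= self.threshold
```

Swapping in a different analyzer means passing a different callable to `DeploymentPlatform`, and the platform code itself does not change.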
  • 47. One More Reason “Think of the glory. Think of your reputation. Think how great it'll look on your next resume.” - Lois McMaster Bujold
  • 49. The Grand Example Netflix’s Monitoring Platform • Prior system owned by IT
  • 50. The Grand Example Netflix’s Monitoring Platform • Prior system owned by IT • No great OSS products
  • 51. The Grand Example Netflix’s Monitoring Platform • Prior system owned by IT • No great OSS products • Ridiculous scale
  • 52. The Grand Example Netflix’s Monitoring Platform • Prior system owned by IT • No great OSS products • Ridiculous scale • Seriously, how hard can it be?
  • 53. The Grand Example Netflix’s Monitoring Platform • Took longer than expected • Ongoing maintenance • UI only recent priority
  • 54. The Grand Example Netflix’s Monitoring Platform • Scales efficientlyish • Impedance match with dev lifestyle • Nicely pluggable* • Aggressivish OSS efforts * Ask me about Real-Time Analytics!
  • 55. The Grand Example Netflix’s Monitoring Platform • Still the right solution • Worried about Sunk Cost Fallacy • Most shouldn’t do this
  • 56. Can You Repeat That? Or: What’s Your Point? Or: I was Tweeting. Did I miss something? • What’s important to you? • Is this a technical decision? Really? • Honest and non-judgmental • Any mitigation? • Don’t build your own monitoring system. Seriously.
  • 57. Name This Group • United States • Europe • China • Russia • India • Japan • Blue Origin • SpaceX • Virgin Galactic
  • 58. 11:30am Frasier Room (3rd Floor) @royrapoport rsr@netflix.com
  • 59. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/netflix-monitoring-system