Better management of large-scale,
heterogeneous networks
toward a programmable management plane
Joshua George, Anees Shaikh
Google Network Operations
www.openconfig.net
Agenda
1. Management plane challenges
2. Rethinking telemetry -- efficient, large-scale monitoring
3. OpenConfig -- community-driven API development
Management Plane Challenges
Challenges of managing a large-scale network
● 20+ network device roles
● more than half a dozen vendors, multiple platforms
● 4M lines of configuration files
● up to ~30K configuration changes per month
● more than 8M OIDs collected every 5 minutes
● more than 20K CLI commands issued and scraped every 5 minutes
● many tools, and multiple generations of software
Opportunity for significant OPEX savings: reduced outage impact,
simplification of management stack, better scaling
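To put the collection figures on this slide in per-second terms, here is a rough calculation. It is only a sketch: the 30-day month is an assumption, and the slide's numbers are lower bounds ("more than") or upper bounds ("up to"), so the results are approximate.

```python
# Back-of-the-envelope rates implied by the figures on the slide.
POLL_INTERVAL_S = 5 * 60            # "every 5 minutes"
OIDS_PER_INTERVAL = 8_000_000       # "more than 8M OIDs"
CLI_CMDS_PER_INTERVAL = 20_000      # "more than 20K CLI commands"
CONFIG_CHANGES_PER_MONTH = 30_000   # "up to ~30K configuration changes"
DAYS_PER_MONTH = 30                 # assumption, for a rough daily figure


def per_second(count, interval_s=POLL_INTERVAL_S):
    """Average rate needed to finish `count` items within one interval."""
    return count / interval_s


oids_per_s = per_second(OIDS_PER_INTERVAL)
cli_per_s = per_second(CLI_CMDS_PER_INTERVAL)
changes_per_day = CONFIG_CHANGES_PER_MONTH / DAYS_PER_MONTH

print(f"OIDs polled per second:    {oids_per_s:,.0f}")
print(f"CLI commands per second:   {cli_per_s:,.1f}")
print(f"Config changes per day:    {changes_per_day:,.0f}")
```

Sustained polling at tens of thousands of objects per second is the load that the push-based telemetry model later in the deck aims to reduce.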
Management plane is way behind
● proprietary CLIs, lots of scripts
● imperative, incremental configuration
● lack of abstractions
● configuration scraping from devices
● SNMP monitoring -- not always “simple” and not
often scalable
Model-driven network management
Configuration
• describes configuration data structure and content
• common modeling language: YANG
• multiple data encodings: protobuf, XML, JSON, ...
Topology
• describes structure of the network
• common modeling language: multiple
• data encoding: protobuf, ...
Telemetry
• describes monitoring data structure and attributes
• common modeling language: exploring YANG
• data delivery: RPC, protobuf inside UDP
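The point that one model can be carried in several encodings can be illustrated with a small sketch. The field names below are hypothetical and are not taken from any published YANG or OpenConfig model. Only JSON and XML are shown because they need nothing beyond the Python standard library; protobuf is omitted for that reason alone.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, model-shaped configuration data (illustrative names only).
interface_config = {
    "name": "eth0",
    "description": "uplink to spine",
    "mtu": 9000,
    "enabled": True,
}


def to_json(data):
    """Encode the data as JSON with a stable key order."""
    return json.dumps(data, sort_keys=True)


def to_xml(data, root_tag="interface"):
    """Encode the same data as a flat XML element tree."""
    root = ET.Element(root_tag)
    for key in sorted(data):
        child = ET.SubElement(root, key)
        value = data[key]
        # Render booleans in lowercase, as XML encodings commonly do.
        child.text = str(value).lower() if isinstance(value, bool) else str(value)
    return ET.tostring(root, encoding="unicode")


print(to_json(interface_config))
print(to_xml(interface_config))
```

The data structure is defined once; the encoding is chosen at the edge, which is the separation the slide describes.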
Rethinking Network Telemetry
Telemetry solutions today
What do we use? Often SNMP is the default choice.
● legacy implementations -- designed for limited processing and
bandwidth
● expensive discoverability -- re-walk MIBs to discover new elements
● no capability advertisement -- test OIDs to determine support
● rigid structure -- limited extensibility to add new data
● proprietary data -- require vendor-specific mappings and multiple
requests to reassemble data
● protocol stagnation -- no absorption of current data modeling and
transmission techniques
Telemetry challenges
● SNMP object collection growing with each platform generation
○ e.g., 100K objects on current platforms, expected to grow 3x over
next 2 generations
○ similar for object collection frequency
● Future devices continue to grow in density and drive this trend
○ scale limitations in data acquisition at high frequencies
● Near-real-time acquisition and access to monitoring data is a
requirement for <insert buzzword here>
○ traffic management, tight control loops, fast recovery
I get it, you really don’t like SNMP…
but do you have a better idea?
Rethinking telemetry...reverse the flow
○ stream data continuously --
with incremental updates
based on subscriptions
○ observe network state
through a time-series data
stream
○ devices programmed with a
data model describing
desired structure and
content
○ efficient, secure transport
protocols
Telemetry framework requirements
● network elements stream data to collectors (push model)
● data populated based on vendor-neutral models whenever possible
● utilize a publish/subscribe API to select desired data
● scale for next 10 years of density growth with high data freshness
○ other protocols distribute load to hardware, so should telemetry
● utilize modern transport mechanisms with active development
communities
○ gRPC (HTTP/2), Thrift, etc.
○ protocol buffer over UDP
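The push model with a publish/subscribe API can be sketched in a few lines. This is an in-process toy, not a network protocol: the class name, the path strings, and prefix matching as the subscription rule are all assumptions made for illustration.

```python
from collections import defaultdict


class TelemetryPublisher:
    """Toy device-side publisher: collectors subscribe to path prefixes."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # path prefix -> callbacks

    def subscribe(self, path_prefix, callback):
        """Register a collector callback for every path under the prefix."""
        self._subscribers[path_prefix].append(callback)

    def publish(self, path, value, timestamp):
        """Push one update to all matching subscribers; return the count."""
        delivered = 0
        for prefix, callbacks in self._subscribers.items():
            if path.startswith(prefix):
                for callback in callbacks:
                    callback({"path": path, "value": value, "timestamp": timestamp})
                    delivered += 1
        return delivered


received = []  # stands in for a collector's time-series store
device = TelemetryPublisher()
device.subscribe("/interfaces/", received.append)

device.publish("/interfaces/eth0/in-octets", 1200, timestamp=1)
device.publish("/bgp/neighbors/10.0.0.1/state", "ESTABLISHED", timestamp=2)
device.publish("/interfaces/eth0/in-octets", 1350, timestamp=3)

print(received)
```

The collector never polls: it declares interest once and then receives only the updates that match, which is the reversal of flow described above.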
Example telemetry configuration flow
● Structured inventory and set of telemetry capabilities pushed to the
Network Management System (NMS)
● NMS inputs: rules for typical monitoring, tactical overrides from operators
● NMS generates the monitoring configuration
● NMS publishes it to the network element's gRPC endpoint
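The generation step in this flow can be sketched as a merge of default rules and operator overrides, filtered by what the device advertises. The function name, the path strings, and the choice of a sample interval as the only setting are hypothetical; the slide does not specify the configuration format.

```python
def generate_monitoring_config(capabilities, default_rules, overrides=None):
    """Return {path: sample_interval_s} for paths the device supports.

    Operator overrides win over the default rules; paths the device did
    not advertise are dropped from the result.
    """
    overrides = overrides or {}
    config = {}
    for path, interval in default_rules.items():
        if path in capabilities:
            config[path] = interval
    for path, interval in overrides.items():
        if path in capabilities:
            config[path] = interval
    return config


# Capabilities pushed by the device to the NMS (hypothetical paths).
capabilities = {"/interfaces/counters", "/lsps/state"}
# Rules for typical monitoring: sample interval in seconds.
default_rules = {"/interfaces/counters": 10, "/lsps/state": 30, "/optics/power": 60}
# Tactical override from an operator during an investigation.
overrides = {"/lsps/state": 5}

config = generate_monitoring_config(capabilities, default_rules, overrides)
print(config)
```

Because the device advertises its capabilities up front, unsupported paths are filtered out at generation time instead of being discovered by probing.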
Example telemetry data flow
Diagram: gRPC endpoint (network element) and Collection Infrastructure, exchanging:
● Send statistics every X seconds
● Asynchronous event reporting
● Operator requests for ad-hoc data
3 types of telemetry events:
● Bulk time series data
○ All interface stats every 10 seconds.
● Event/edge driven updates
○ LSP A is now down.
● Operator request/response
○ Show me oper state for all interfaces.
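The three kinds of telemetry events can be modeled side by side. This is a sketch under stated assumptions: the `Device` class, its method names, and the path layout are invented for illustration, and a real implementation would carry these updates over a streaming RPC.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SAMPLE = "bulk time series"
    EVENT = "event/edge driven"
    RESPONSE = "operator request/response"


@dataclass(frozen=True)
class Update:
    kind: Kind
    path: str
    value: object
    timestamp: float


class Device:
    """Toy network element holding state as {path: value}."""

    def __init__(self, state):
        self.state = dict(state)

    def sample(self, prefix, now):
        """Periodic bulk export of everything under a path prefix."""
        return [Update(Kind.SAMPLE, p, v, now)
                for p, v in sorted(self.state.items()) if p.startswith(prefix)]

    def set(self, path, value, now):
        """Apply a state change; report it only if the value changed."""
        if self.state.get(path) == value:
            return None
        self.state[path] = value
        return Update(Kind.EVENT, path, value, now)

    def query(self, prefix, leaf, now):
        """Answer an ad-hoc operator request for one leaf under a prefix."""
        return [Update(Kind.RESPONSE, p, v, now)
                for p, v in sorted(self.state.items())
                if p.startswith(prefix) and p.endswith("/" + leaf)]


device = Device({
    "/interfaces/eth0/in-octets": 100,
    "/interfaces/eth0/oper-state": "UP",
    "/interfaces/eth1/oper-state": "DOWN",
    "/lsps/A/oper-state": "UP",
})
samples = device.sample("/interfaces/", now=10.0)             # all interface stats
event = device.set("/lsps/A/oper-state", "DOWN", now=12.5)    # LSP A is now down
repeat = device.set("/lsps/A/oper-state", "DOWN", now=13.0)   # no change, no event
answers = device.query("/interfaces/", "oper-state", now=14.0)
```

Only the sampled path produces traffic on a timer; the edge-driven path stays silent until something changes.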
Practical realization
● Streaming telemetry is beyond an idea stage, but is far from a
final product
● Multiple vendor implementations now available for
experimentation
● Development is ongoing -- now is the time to share your
requirements and make your voice heard!
OpenConfig
OpenConfig
● Informal industry collaboration of network operators
● Focus: define vendor-neutral configuration and operational
state models based on real operations
○ Adopted YANG data modeling language (RFC 6020)
● Participants: Apple, AT&T, BT, Comcast, Cox, Facebook, Google,
Level3, Microsoft, Verizon, Yahoo!
● Primary output is model code, published as open source via
public github repo
● Ongoing interactions with standards and open source
communities (e.g., IETF, ONF, ODL, ONOS)
Example configuration pipeline
● operators call an intent API (e.g., “drain peering link”)
● update topology model
● configuration generation from OC YANG models
● configuration data: vendor-neutral, validated
● gRPC req to the gRPC endpoint on multiple vendor devices
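The pipeline stages can be sketched as three small functions: an intent handler that updates the topology model, a generator that derives device configuration from it, and a validator. Everything here is hypothetical, including the topology layout and the choice to realize "drain" by disabling the BGP neighbor on that link; the slide does not say how a drain is implemented.

```python
def drain_link(topology, link_id):
    """Intent handler: mark a link as drained in a copy of the topology."""
    updated = {key: dict(value) for key, value in topology.items()}
    updated[link_id]["drained"] = True
    return updated


def generate_config(topology):
    """Derive vendor-neutral per-device configuration from the topology.

    Assumption for illustration: a drained link disables its neighbor.
    """
    configs = {}
    for _, link in sorted(topology.items()):
        device_cfg = configs.setdefault(link["device"], {"neighbors": {}})
        device_cfg["neighbors"][link["peer_address"]] = {
            "enabled": not link["drained"],
        }
    return configs


def validate(configs):
    """Reject generated configuration whose leaves have the wrong type."""
    for device, cfg in configs.items():
        for address, neighbor in cfg["neighbors"].items():
            if not isinstance(neighbor.get("enabled"), bool):
                raise ValueError(f"{device}: neighbor {address} lacks boolean 'enabled'")
    return True


topology = {
    "peer-link-1": {"device": "edge1", "peer_address": "192.0.2.1", "drained": False},
    "peer-link-2": {"device": "edge1", "peer_address": "192.0.2.5", "drained": False},
}

new_topology = drain_link(topology, "peer-link-1")
configs = generate_config(new_topology)
validate(configs)
print(configs)
```

The operator states what should happen to the link; the device-level change falls out of regenerating configuration from the updated model.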
Extending OpenConfig models
● base OpenConfig model as a starting point
● vendors can offer augmentations / deviations
● operators can add locally consumed extensions
Diagram: base model, vendor modifications, and local modifications combine
into the extended model
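The layering of a base model with vendor and operator changes can be sketched with plain dictionaries. This is a simplification: a model is reduced to a map of leaf paths to type names, augmentations only add leaves, and deviations only remove them (similar in spirit to YANG's `deviate not-supported`). All paths and the `vendor-x:` and `local:` prefixes are hypothetical.

```python
def extend_model(base, augmentations=(), deviations=()):
    """Return a new model: base plus augmentations, minus deviated leaves."""
    model = dict(base)
    for extra in augmentations:
        model.update(extra)       # augmentation: add new leaves
    for path in deviations:
        model.pop(path, None)     # deviation: leaf not supported here
    return model


base = {
    "/bgp/global/as": "uint32",
    "/bgp/global/router-id": "ipv4-address",
    "/bgp/neighbors/neighbor/hold-time": "decimal64",
}
vendor_augmentation = {"/bgp/global/vendor-x:fast-failover": "boolean"}
vendor_deviations = ["/bgp/neighbors/neighbor/hold-time"]
local_extension = {"/bgp/global/local:owner-team": "string"}

extended = extend_model(
    base,
    augmentations=[vendor_augmentation, local_extension],
    deviations=vendor_deviations,
)
for path in sorted(extended):
    print(path, extended[path])
```

The base model is left untouched, so each vendor or operator layer can be applied or dropped independently.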
OpenConfig releases and roadmap
Data models (configuration and operational state)
● BGP and routing policy
○ multiple vendor implementations in progress
● MPLS / TE consolidated model
○ RSVP / TE and segment routing model as initial focus
● design patterns for operational state and model composition
● tools for translating YANG models to usable code artifacts
○ e.g., pyangbind
Models in progress
● interfaces, system, optical transport, ...
Models must be composed to be useful
● model composition framework is
critical missing piece from existing
model-building efforts
Modeling operational state
Types of operational state data
● derived, negotiated, set by a protocol, etc. (negotiated BGP hold-time)
● operational state data for counters or statistics (interface counters)
● operational state data representing intended configuration (actual vs.
configured)
Clear benefits from using YANG to model both configuration and
operational state in the same data model
● but … YANG focus has primarily been config, NETCONF-centric, lack of
common conventions
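One benefit of keeping configuration and operational state in the same model is that intended and actual values share a path and can be compared directly. A minimal sketch, assuming hypothetical leaf names and values:

```python
def find_mismatches(intended, applied):
    """Return {path: (intended, applied)} for leaves that differ."""
    mismatches = {}
    for path, want in intended.items():
        have = applied.get(path)
        if have != want:
            mismatches[path] = (want, have)
    return mismatches


# Configured values versus what the device reports as in effect.
intended = {"/bgp/neighbor/hold-time": 90, "/interface/mtu": 9000}
applied = {"/bgp/neighbor/hold-time": 30, "/interface/mtu": 9000}

mismatches = find_mismatches(intended, applied)
print(mismatches)
```

Here the hold-time differs because it is negotiated with the peer, which is the "actual vs. configured" case from the list above.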
Summary
● New networking paradigms like SDN focus mostly on control
○ it’s time for the management plane to join the age of SDN
● Core principles:
○ model-driven management
○ streaming telemetry to scale monitoring and improve freshness
○ vendor-neutral, extensible APIs for managing devices
● Architecture and emerging vendor implementations of multi-mode
telemetry solutions
● OpenConfig is a focused effort by operators to develop vendor-neutral
models to define management APIs
Operators: get involved and push your vendors for support on your gear!
Thank you!
gRPC: multi-platform RPC framework
gRPC features
● load-balancing, app-level flow control, call-cancellation
● serialization with protobuf (efficient wire encoding)
● multi-platform, many supported languages
● open source, under active development
gRPC leverages HTTP/2 as its transport layer
● binary framing, header compression
● bidirectional streams, server push support
● connection multiplexing across requests and streams
http://grpc.io
OpenConfig and standards (e.g., IETF)
● primary goal is native implementation of OpenConfig models
○ OpenConfig is not a standards group
● publish models and documentation in IETF to inject operator
perspective into standards process
● adoption of OpenConfig models and ideas into standards can
simplify development efforts for vendors
Additional “observations”
● YANG and NETCONF should be decoupled -- each is
independently useful
● YANG needs to evolve more rapidly at this early phase, stabilize
as real usage increases
● current YANG model versioning is not helpful -- treat models
like software artifacts, not dated documents
● current standard models should be open for revisiting and
revising
● should not rush to standardize more models until they are
deployed and used in production
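One way to read "treat models like software artifacts" is to version them the way libraries are versioned. The sketch below uses semantic versioning as an illustration. That choice is an assumption, not something the slide prescribes, and the compatibility rule is the usual semver convention: same major version, and at least the required minor and patch.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_compatible(required, available):
    """True if a model at `available` can serve a client built for `required`."""
    req, avail = parse_version(required), parse_version(available)
    return avail[0] == req[0] and avail >= req


print(is_compatible("1.2.0", "1.3.1"))  # newer minor release of the same major
print(is_compatible("1.2.0", "2.0.0"))  # major bump signals a breaking change
print(is_compatible("1.2.0", "1.1.9"))  # older than what the client needs
```

A date-based revision tells a client when a model changed; a version of this kind also tells it whether the change is safe to consume.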
these are not necessarily OpenConfig consensus views
Current OpenConfig “process”
● initial models developed
by OpenConfig
● extensive collaboration
with vendors
● leverage existing work
where possible
● publish models and docs
Intent-based configuration flow
Diagram labels (configuration flows from operators down to devices):
● operators express configuration intent through a declarative API
● abstract configuration models: Config Model, Topology Model
● config generation, authoritative config store
● vendor-neutral configuration models (standard models), generated
configuration instances
● config generation, authoritative config store
● device-level configuration
● configuration pusher: NETCONF, RESTCONF, JSON-RPC, ...
Analogous SDN stack: application, NB APIs, Network OS, SB protocols
Telemetry required for a full solution
● Popular network technology topics today focus on more
granular control of network traffic decisions.
○ APIs, RPC frameworks, agents, controllers, overlays…
● Developing an awesome method to control network elements is
only half of the solution.
● Data accuracy and freshness are limiting factors in a control
solution’s optimality.

SDN in the Management Plane: OpenConfig and Streaming Telemetry

  • 1. Better management of large-scale, heterogeneous networks toward a programmable management plane Joshua George, Anees Shaikh Google Network Operations www.openconfig.net
  • 2. Management plane challenges Rethinking telemetry -- efficient, large-scale monitoring OpenConfig -- community-driven API development3 2 1 Agenda 2
  • 4. Challenges of managing a large-scale network ● more than 8M OIDs collected every 5 minutes ● more than 20K CLI commands issued and scraped every 5 minutes ● many tools, and multiple generations of software Opportunity for significant OPEX savings: reduced outage impact, simplification of management stack, better scaling 4 ● 20+ network device roles ● more than half dozen vendors, multiple platforms ● 4M lines of configuration files ● up to ~30K configuration changes per month
  • 5. Management plane is way behind ● proprietary CLIs, lots of scripts ● imperative, incremental configuration ● lack of abstractions ● configuration scraping from devices ● SNMP monitoring -- not always “simple” and not often scalable 5
  • 6. Configuration • describes configuration data structure and content • common modeling language: YANG • multiple data encodings: protobuf, XML, JSON, ... Topology • describes structure of the network • common modeling language: multiple • data encoding: protobuf, ... Telemetry • describes monitoring data structure and attributes • common modeling language: exploring YANG • data delivery: RPC, protobuf inside UDP Model-driven network management 6
  • 8. Telemetry solutions today What do we use? Often SNMP is the default choice. ● legacy implementations -- designed for limited processing and bandwidth ● expensive discoverability -- re-walk MIBs to discover new elements ● no capability advertisement -- test OIDs to determine support ● rigid structure -- limited extensibility to add new data ● proprietary data -- require vendor-specific mappings and multiple requests to reassemble data ● protocol stagnation -- no absorption of current data modeling and transmission techniques 8
  • 9. Telemetry challenges ● SNMP object collection growing with each platform generation ○ e.g., 100K objects on current platforms, expected to grow 3x over next 2 generations ○ similar for object collection frequency ● Future devices continue to grow in density and drive this trend ○ scale limitations in data acquisition at high frequencies ● Near-real-time acquisition and access to monitoring data is a requirement for <insert buzzword here> ○ traffic management, tight control loops, fast recovery 9
  • 10. I get it, you really don’t like SNMP… 10 but do you have a better idea?
  • 11. Rethinking telemetry...reverse the flow ○ stream data continuously -- with incremental updates based on subscriptions ○ observe network state through a time-series data stream ○ devices programmed with a data model describing desired structure and content ○ efficient, secure transport protocols 11
  • 12. Telemetry framework requirements ● network elements stream data to collectors (push model) ● data populated based on vendor-neutral models whenever possible ● utilize a publish/subscribe API to select desired data ● scale for next 10 years of density growth with high data freshness ○ other protocols distribute load to hardware, so should telemetry ● utilize modern transport mechanisms with active development communities ○ gRPC (HTTP/2), Thrift, etc. ○ protocol buffer over UDP 12
  • 13. Example telemetry configuration flow 13 Network Management System Structured inventory and set of telemetry capabilities pushed to the NMS Rules for typical monitoring Tactical overrides from operators Generate monitoring configuration Publish to network element gRPC endpoint
  • 14. Example telemetry data flow 14 Collection Infrastructure Send statistics every X seconds Asynchronous event reporting Operator requests for ad-hoc data. 3 types of telemetry events: ● Bulk time series data ○ All interface stats every 10 seconds. ● Event/edge driven updates ○ LSP A is now down. ● Operator request/response ○ Show me oper state for all interfaces.gRPC endpoint
  • 15. Practical realization ● Streaming telemetry is beyond an idea stage, but is far from a final product ● Multiple vendor implementations now available for experimentation ● Development is ongoing -- now is the time to share your requirements and make your voice heard ! 15
  • 17. OpenConfig ● Informal industry collaboration of network operators ● Focus: define vendor-neutral configuration and operational state models based on real operations ○ Adopted YANG data modeling language (RFC 6020) ● Participants: Apple, AT&T, BT, Comcast, Cox, Facebook, Google, Level3, Microsoft, Verizon, Yahoo! ● Primary output is model code, published as open source via public GitHub repo ● Ongoing interactions with standards and open source communities (e.g., IETF, ONF, ODL, ONOS)
  • 18. Example configuration pipeline [Diagram labels: operators; intent API (“drain peering link”); update topology model; configuration generation; OC YANG models; configuration data (vendor-neutral, validated); gRPC req; gRPC endpoint; multiple vendor devices]
  • 19. Extending OpenConfig models ● base OpenConfig model as a starting point ● vendors can offer augmentations / deviations ● operators can add locally consumed extensions [Diagram: base model X + vendor modifications + local modifications = extended model]
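The layering on slide 19 (base model, then vendor modifications, then local modifications) can be sketched with plain dictionaries standing in for a schema tree. This is a deliberate simplification: real YANG expresses these changes with `augment` and `deviation` statements, while the `extend_model` helper, the leaf names, and the slash-separated paths below are hypothetical.

```python
import copy


def _parent(model: dict, path: str, create: bool):
    """Walk to the container holding the last path element."""
    *parents, leaf = path.split("/")
    node = model
    for name in parents:
        node = node.setdefault(name, {}) if create else node.get(name, {})
    return node, leaf


def extend_model(base: dict, augmentations=(), not_supported=()) -> dict:
    """Return a copy of base with leaves added and unsupported leaves removed."""
    model = copy.deepcopy(base)  # never mutate the shared base model
    for path, leaf_type in augmentations:
        node, leaf = _parent(model, path, create=True)
        node[leaf] = leaf_type
    for path in not_supported:
        node, leaf = _parent(model, path, create=False)
        node.pop(leaf, None)
    return model


# Hypothetical base model: leaf name -> type name.
base = {"bgp": {"neighbor": {"peer-as": "uint32", "hold-time": "uint16"}}}

# Vendor layer: adds one leaf and marks another as not supported.
vendor = extend_model(
    base,
    augmentations=[("bgp/neighbor/vendor-knob", "boolean")],
    not_supported=["bgp/neighbor/hold-time"],
)

# Operator layer: adds a locally consumed extension on top.
extended = extend_model(
    vendor,
    augmentations=[("bgp/neighbor/local-ticket-id", "string")],
)
```

Because each layer works on a copy, the base model stays untouched and every layer can be inspected on its own.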
  • 20. OpenConfig releases and roadmap Data models (configuration and operational state) ● BGP and routing policy ○ multiple vendor implementations in progress ● MPLS / TE consolidated model ○ RSVP / TE and segment routing model as initial focus ● design patterns for operational state and model composition ● tools for translating YANG models to usable code artifacts ○ e.g., pyangbind Models in progress ● interfaces, system, optical transport, ...
  • 21. Models must be composed to be useful ● a model composition framework is the critical piece missing from existing model-building efforts
  • 22. Modeling operational state Types of operational state data ● derived, negotiated, set by a protocol, etc. (negotiated BGP hold-time) ● operational state data for counters or statistics (interface counters) ● operational state data representing intended configuration (actual vs. configured) Clear benefits from using YANG to model both configuration and operational state in the same data model ● but … YANG focus has primarily been config, NETCONF-centric, lack of common conventions
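One benefit of keeping configuration and operational state in the same model is that intended and actual values share a path and can be compared directly. The sketch below assumes a flat path-to-value mapping. The `state_mismatches` function, the path syntax, and the values are hypothetical.

```python
def state_mismatches(intended: dict, operational: dict) -> dict:
    """Return {path: (intended, actual)} where the two values differ."""
    mismatches = {}
    for path, want in intended.items():
        have = operational.get(path)  # None if the device reports nothing
        if have != want:
            mismatches[path] = (want, have)
    return mismatches


# Hypothetical example: hold-time configured as 90 but negotiated down to 30.
intended = {
    "/bgp/neighbor[10.0.0.1]/hold-time": 90,
    "/interfaces/eth0/mtu": 9000,
}
operational = {
    "/bgp/neighbor[10.0.0.1]/hold-time": 30,
    "/interfaces/eth0/mtu": 9000,
    "/interfaces/eth0/in-octets": 1200,  # counter: no configured counterpart
}

mismatches = state_mismatches(intended, operational)
```

Counters such as `in-octets` appear only in the operational data, so they never show up as mismatches. Only leaves with a configured counterpart are compared.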
  • 23. Summary ● New networking paradigms like SDN focus mostly on control ○ it’s time for the management plane to join the age of SDN ● Core principles: ○ model-driven management ○ streaming telemetry to scale monitoring and improve freshness ○ vendor-neutral, extensible APIs for managing devices ● Architecture and emerging vendor implementations of multi-mode telemetry solutions ● OpenConfig is a focused effort by operators to develop vendor-neutral models to define management APIs ● Operators: get involved and push your vendors for support on your gear!
  • 25. gRPC: multi-platform RPC framework (http://grpc.io) gRPC features ● load-balancing, app-level flow control, call-cancellation ● serialization with protobuf (efficient wire encoding) ● multi-platform, many supported languages ● open source, under active development gRPC leverages HTTP/2 as its transport layer ● binary framing, header compression ● bidirectional streams, server push support ● connection multiplexing across requests and streams
  • 26. OpenConfig and standards (e.g., IETF) ● primary goal is native implementation of OpenConfig models ○ OpenConfig is not a standards group ● publish models and documentation in IETF to inject operator perspective into standards process ● adoption of OpenConfig models and ideas into standards can simplify development efforts for vendors
  • 27. Additional “observations” (these are not necessarily OpenConfig consensus views) ● YANG and NETCONF should be decoupled -- each is independently useful ● YANG needs to evolve more rapidly at this early phase, stabilize as real usage increases ● current YANG model versioning is not helpful -- treat models like software artifacts, not dated documents ● current standard models should be open for revisiting and revising ● should not rush to standardize more models until they are deployed and used in production
  • 28. Current OpenConfig “process” ● initial models developed by OpenConfig ● extensive collaboration with vendors ● leverage existing work where possible ● publish models and docs
  • 29. Intent-based configuration flow [Diagram labels: operators; declarative API; configuration intent; abstract configuration models (Config Model, Topology Model); config generation; vendor-neutral configuration models (standard models); generated configuration instances; authoritative config store; device-level configuration; configuration pusher (NETCONF, RESTCONF, JSON-RPC, ...); analogous SDN stack: application, NB APIs, Network OS, SB protocols]
  • 30. Telemetry required for a full solution ● Popular network technology topics today focus on more granular control of network traffic decisions. ○ APIs, RPC frameworks, agents, controllers, overlays… ● Developing an awesome method to control network elements is only half of the solution. ● Data accuracy and freshness are limiting factors in a control solution’s optimality.