The transformation of network softwarization towards 5G requires satisfying requirements across a broad scope of verticals while maintaining the Quality of Service (QoS) and Quality of Experience (QoE) criteria demanded by the various network slice constraints. This session, with a hands-on lab, introduces the three key elements of service assurance – the monitoring, presentation and provisioning layers – and introduces various cloud-native open source frameworks such as collectd, InfluxDB, Grafana, Prometheus, Kafka and the Platform for Network Data Analytics (PNDA).
3. 3
Acknowledgements to
• Tim Verrall
• John Browne
• Damien Power
• Emma Collins
• Jean-Christophe Bouche
• Jim Greene
• Krzysztof Kepka
• Jabir K Kadavathu
• Michal Kobylinski
4. 4
Agenda
• Service Assurance
• Monitoring & Metrics
• OPNFV Barometer
• Integration & Provisioning
• Prometheus
• Kafka
• ONAP & VES
• PNDA
• Fitting Together
5. 5
What is Service Assurance
The application of policies/processes to ensure that services offered over networks meet a pre-defined service quality level for an optimal user or subscriber experience.
SA technologies enable monitoring of FCAPS (Fault, Configuration, Accounting, Performance & Security) attributes on existing network infrastructure.
Figure: Service Assurance mapped to ETSI model
6. 6
Three key elements of a Service Assurance Platform
Monitoring: Enabling deeper management and tracking of specific service levels
– Platform & Network counters to track usage and performance to configured parameters
Presentation: Reporting to enable reaction to service level changes:
– Support for the detection of trending against configured parameters and the enabling of capacity plan
changes based on those trends
Provisioning: Enable configuration of service levels based on workload or service priority:
– Allocate or partition platform resources such as CPU, memory, cache, and network bandwidth
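The provisioning element above is typically realized with mechanisms such as Intel RDT cache allocation; as a rough illustration (not part of the original material), the following Python sketch partitions last-level cache for a workload via the Linux resctrl interface, assuming a kernel with resctrl mounted at /sys/fs/resctrl and RDT/CAT-capable hardware – the group name, cache mask and PID are placeholders:

import os

RESCTRL = "/sys/fs/resctrl"  # assumes the resctrl filesystem is mounted here

def partition_cache(group: str, l3_mask: str, pid: int) -> None:
    """Create a resctrl group, restrict it to a slice of L3 cache, and assign a task."""
    group_dir = os.path.join(RESCTRL, group)
    os.makedirs(group_dir, exist_ok=True)
    # Schemata line for CAT: a bitmask of cache ways per cache domain, e.g. "0=0x0f"
    with open(os.path.join(group_dir, "schemata"), "w") as f:
        f.write(f"L3:{l3_mask}\n")
    # Move the workload's PID into the group so the cache mask applies to it
    with open(os.path.join(group_dir, "tasks"), "w") as f:
        f.write(str(pid))

if __name__ == "__main__":
    partition_cache("priority_vnf", "0=0x0f", pid=1234)  # hypothetical group name and PID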
7. 7
Service Assurance “Phased” Evolution for NFV/SDN
Phase 1 - Equivalence (Virtualized + Interworking with existing management systems)
Phase 2 - Automated by MANO+SDN Controller
Phase 3 - Predict failures and adapt automatically
Phase 1 – Platform Service Assurance – Equivalence
• Platform Service Assurance supporting:
  • Intel RAS Technologies
  • Cache Config & Monitoring
  • BIOS Config & Reporting
  • Fastpath DPDK Interface Reporting
  • Fastpath DPDK Keep Alive
  • Virtual Switch Health
  • Host Health
  • …
Phase 2 – Platform Service Assurance (MANO + SDN Controller)
• VIM and above, support:
  • Enable RAS Technologies
  • Enable Watchdog Metrics
  • Enable DPDK and Keep Alive
  • Enable Host Health
  • Policy Based Provisioning
  • …
Phase 3 – Predictive Platform Service Assurance
• Predict failures and adapt automatically:
  • Automated and adaptive to changes notified in metrics
  • Closed loop and dynamic SA environment
If you can’t measure and control the underlying platform resources, it is hard to measure, monitor and guarantee services running on that infrastructure.
8. Platform Observability & Service Assurance (SA)
• Observability: Ability to expose state of the platform to ensure Service Level
Objectives are met
• Observability Considerations: Logging, Metrics & Tracing
• Communications Service Provider Context:
• Care about overall Service Assurance
• Both Monitoring & Observability are important
• Service Assurance encompasses aspects of Observability
11. 11
Collectd Monitoring Agent
Collectd: Why & What
• Statistics collection daemon
• Uses read plugins to collect metrics and write plugins to send them to an endpoint
• Open source
• Widely adopted
• Configurable Collection Interval
Various Plugin types:
• Input/Output
• Binding Plugins
• Logging Plugins
• Notification Plugins
• Other: Network plugin with both send/receive feature
Figure: Collectd Architecture
https://github.com/collectd/collectd
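To make the read/write plugin model above concrete, here is a minimal sketch of a read plugin written against collectd's Python plugin interface (it assumes the python plugin is loaded in collectd.conf; the plugin name and the metric it reads are purely illustrative):

import collectd  # module provided by collectd's python plugin at runtime

INTERVAL = 10  # seconds – collectd's configurable collection interval

def read_cb(data=None):
    # Read a platform statistic; a trivial illustrative value from /proc/loadavg.
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])

    vals = collectd.Values(type="gauge")
    vals.plugin = "example_platform"   # illustrative plugin name
    vals.type_instance = "load1"
    vals.dispatch(values=[load1])      # hands the metric to whatever write plugins are loaded

collectd.register_read(read_cb, INTERVAL)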
12. 12
Platform Telemetry Exposure & Integration
Figure: Platform telemetry exposure & integration across the NFVI (compute, network, storage; hypervisor with RT/SA KVM4NFV extensions; virtualised compute/network/storage). Platform sources include PMU^ counters, NIC counters, vSwitch counters, hypervisor/container counters, RAS (Intel® Run Sure Technology: MCA*, PCIe AER, Resilient System/Memory Technology, SDDC DDDC+1 mirroring), RAID/NVMe* (Intel® Rapid Storage Technology), IPMI via the Intel® Management Engine, Intel® Node Manager, Intel® RDT (CMT, CAT, MBM, CDP), power, and out-of-band telemetry – collected by collectd plugins and Intel® Infrastructure Management Technologies. Exposure is through common/standard open APIs: SNMP (Perfmon, Enterprise and NFV Platform MIBs), syslog, IPFIX/sFlow/NetFlow collectors, Kafka, Prometheus, the VES plugin, Redfish, and OpenStack* services (Ceilometer, Gnocchi, Aodh, Vitrage, Congress – partly done/integrated, partly in progress). A fast path triggers on events or counters (e.g. VM stall/RT stall detection, working/protect failover, local corrective action), while a slow path is pulled periodically (every 1/15 minutes) by monitoring/analytics systems, container monitoring solutions (Prometheus, …), vendor SA middleware, open platform collectors and the OpenStack* VIM.
PMU^: Performance Monitoring Unit
13. 13
Platform Telemetry Options – Southbound
Plugin – Description
• Intel RDT Plugin: A read plugin that provides last level cache utilization and memory bandwidth utilization.
• Huge Pages Plugin: Allows monitoring of free and used hugepage numbers/bytes/percentages on the platform.
• vSwitch Stats Plugin: A read plugin that retrieves interface/link stats from OVS.
• vSwitch Events Plugin: A read plugin that retrieves events (like link status changes) & liveliness from OVS.
• IPMI Plugin: A read plugin that reports platform thermals, voltages, fan speed, current, flow, power, etc. The plugin also monitors the Intelligent Platform Management Interface (IPMI) System Event Log (SEL) and sends appropriate notifications based on monitored SEL events.
• Virt Plugin (Libvirt): A read plugin that uses the libvirt virtualization API to gather statistics about virtualized guests on a system directly from the hypervisor, without the need to install a collectd instance on the guest.
• DPDK Stats Plugin: A read plugin that retrieves stats from the DPDK extended NIC stats API.
• DPDK Events Plugin: A read plugin that retrieves DPDK link status and DPDK forwarding cores liveliness status (DPDK Keep Alive).
• RAS Memory Plugin: A read plugin that uses mcelog to check for memory Machine Check Exceptions and sends the stats for reported exceptions.
• PCIe AER Plugin: A read plugin that monitors PCIe standard and advanced errors and sends notifications about those errors.
Note: Not an exhaustive list
14. 14
Platform Telemetry Options – Southbound
Plugin – Description
• DPDK Stats Plugin: A read plugin that retrieves stats from the DPDK extended NIC stats API.
• PMU Plugin: A read plugin that collects performance monitoring events supported by Intel Performance Monitoring Units (PMUs). The PMU is hardware built into a processor to measure its performance parameters, such as instruction cycles, cache hits, cache misses, branch misses and many others.
• Log Parser Plugin: A read plugin that uses mcelog to check for CPU, IO, QPI or system Machine Check Exceptions and sends the stats for reported exceptions.
• Redfish Plugin: A read plugin that collects metrics available via Redfish endpoints, e.g. in the RSD architecture.
• Storage (RAID) Plugin: A read plugin responsible for gathering events from RAID arrays that were written to syslog by the mdadm utility.
• SMART Plugin: A read plugin that gathers Self-Monitoring, Analysis and Reporting Technology (SMART) data from block devices, primarily adding support for NVMe devices.
• DataCenter Persistent Memory Plugin: Provides metrics from Intel DataCenter persistent memory.
• Power Plugin Enhancements: Added metrics for the power and frequency plugins:
  • CPU Freq Plugin: number of p-state (CPU frequency) transitions & time spent in each p-state
  • Turbostat Plugin: p-states enabled/disabled, Turbo Boost enabled/disabled, platform Thermal Design Point, uncore bus ratio
Note: Not an exhaustive list
15. 15
Platform Telemetry Options – Northbound
Plugin – Description
• Gnocchi Plugin: A write plugin that pushes the retrieved stats to Gnocchi. It is capable of pushing any stats read through collectd to Gnocchi, not just the DPDK stats.
• Write Kafka Plugin: A write plugin that publishes the metrics to Kafka.
• Write Prometheus Plugin: Exposes data to Prometheus directly, rather than relying on the collectd-exporter.
• Aodh Plugin: A write notification plugin that pushes events to Aodh and creates/updates alarms appropriately.
• SNMP Agent Plugin: A write plugin that acts as an AgentX subagent, receiving and handling queries from the SNMP master agent and returning the data collected by read plugins. The SNMP Agent plugin handles requests only for OIDs specified in the configuration file. Supports SNMP get, getnext and walk requests. An SNMP write plugin is not supported by the platform team.
• AMQP1 Plugin: A plugin that sends metrics and events over the AMQP 1.0 bus.
• Network Plugin: Sends metrics to connected nodes.
• Write Graphite Plugin: A widely used plugin to store metrics in a Graphite database.
Note: Not an exhaustive list
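As a counterpart to the write plugins listed above, a northbound path can also be sketched with collectd's Python plugin interface: the write callback below (illustrative, not one of the plugins in the table) serializes each dispatched value to a JSON record, which is roughly the shape the Kafka/AMQP1 plugins publish:

import json
import collectd  # module provided by collectd's python plugin at runtime

def write_cb(vl, data=None):
    # vl is the collectd Values object handed to every registered write callback
    record = {
        "host": vl.host,
        "plugin": vl.plugin,
        "plugin_instance": vl.plugin_instance,
        "type": vl.type,
        "type_instance": vl.type_instance,
        "time": vl.time,
        "values": list(vl.values),
    }
    # A real northbound plugin would publish this record to Kafka, AMQP, etc.
    collectd.info(json.dumps(record))

collectd.register_write(write_cb)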
16. 16
OPNFV Barometer
Barometer Strategy:
• Ensure platform metrics/events are accessible through open industry-standard interfaces.
• Demonstrate that platform & network technologies can be monitored, consumed and actioned in real time.
One Click Install:
• Easy install/configuration for customers
• One command to install collectd/InfluxDB/Grafana
• Three-container approach for collectd:
  • Stable container: latest stable branch
  • Master container: up to date with master
  • Experimental container: cherry-pick features of interest
17. Collectd & Barometer Microservice
• Easier to deploy
• Standard environment
• Scalability
Reference container images are hosted at https://hub.docker.com/r/opnfv/barometer-collectd/
18. 18
Collectd & Barometer Microservice
Containerisation with Ansible support:
• Installs collectd, InfluxDB, Grafana, Kafka & VES containers
• Easier installation, configuration, collection and visualization of the NFVI metrics
• Supports both HA and non-HA deployments
• Speeds up deployment of collectd by providing golden images
OpenStack Kolla also builds containers based on collectd, configurable through Ansible
Automation:
• OPNFV CI ensures successful Barometer deployment with OPNFV installers
• Supports Apex & will be adding Compass support
Fastest way to Introduce Platform Telemetry to ‘Your’ Infrastructure
19. 19
Early Adoption of IA Features – Upstream & Downstream
• Showcase IA features’ telemetry via OPNFV Barometer upstream
• Three-container approach for collectd:
  • Stable container: latest stable branch (latest stable release)
  • Master container: up to date with master (latest accepted by the community)
  • Experimental container: cherry-pick features of interest (latest & greatest of IA metrics)
• Downstream IA-specific plugins via Red Hat OpenStack Platform
22. 22
NSB providing AI/ML data sets
NSB framework used to run test cases over varying intervals on a commercial EPC or similar use cases.
Barometer used to set up InfluxDB and collectd containers.
Figure: NSB test setup – test case(s) drive a traffic generator against a commercial or sample VNF over a chosen context (bare metal, standalone or OpenStack) spanning compute, storage and network, while NFVI, application and network metrics are gathered; results are published as HTML reports and dashboards.
Collectd pushes the platform metrics to InfluxDB while the test cases are being executed.
The metrics from the VNF, traffic generator, and platform are all converted to CSV and sent to the data scientists.
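As a rough sketch of that last step (the addresses, database and measurement names are assumptions that depend on the Barometer/collectd configuration in use), the metrics stored in InfluxDB can be pulled out as CSV through InfluxDB's 1.x HTTP query API:

import requests

INFLUXDB = "http://localhost:8086"  # assumed InfluxDB address from the Barometer containers
DB = "collectd"                     # assumed database name written to by collectd

# Illustrative measurement name; actual names depend on the enabled collectd plugins
query = "SELECT * FROM cpu_value WHERE time > now() - 1h"

resp = requests.get(
    f"{INFLUXDB}/query",
    params={"db": DB, "q": query},
    headers={"Accept": "application/csv"},  # ask InfluxDB 1.x to return results as CSV
    timeout=10,
)
resp.raise_for_status()

with open("platform_metrics.csv", "w") as f:
    f.write(resp.text)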
23. 23
Prometheus
• Open-source systems monitoring and alerting toolkit
• Pull model; integrated with collectd via:
  • the collectd native plugin, or
  • the Prometheus collectd exporter
• The Red Hat Service Assurance Framework uses AMQP1 to push metrics to Prometheus
(Barometer container: collectd or exporter)
Img src: https://prometheus.io/docs/introduction/overview/
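Because Prometheus pulls the metrics, a quick way to check that collectd data is arriving is to query the Prometheus HTTP API; a minimal sketch follows (the server address and metric name are assumptions – actual series names depend on how write_prometheus or the collectd exporter labels them):

import requests

PROMETHEUS = "http://localhost:9090"  # assumed Prometheus server address
QUERY = "collectd_cpu_percent"        # illustrative metric name

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY}, timeout=5)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    # Each result carries a label set and the latest [timestamp, value] sample
    print(result["metric"], result["value"])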
24. 24
Red Hat Telemetry Framework
The telemetry framework is a dynamic application running atop OpenShift (Kubernetes), using several components such as Prometheus, the Smart Gateway, collectd and the Apache Qpid Dispatch Router.
GitHub: https://github.com/redhat-service-assurance
Source: https://telemetry-framework.readthedocs.io/en/master/overview.html
26. 26
ONAP
• Addresses the need for a common global-scale orchestration & automation platform for telco, cable & cloud operators
• Framework that allows specification of a service in all aspects – policy, control, behaviour, analytics, closed loop, etc.
Img src: https://www.onap.org/wp-content/uploads/sites/20/2018/06/ONAP_CaseSolution_Architecture_0618FNL.pdf
Figure: ONAP Architecture
27. 27
VNF Event Stream (VES)
• VES provides a converged event stream format to simplify closed-loop automation
• Reduces the effort to integrate VNF telemetry
• Integrates platform & VNF telemetry into automated VNF management systems, like DCAE
• Convergence to a common event stream format and collection system
• Feeds the VES collector in DCAE with unified data
Img src: https://wiki.opnfv.org/display/fastpath/VES+plugin+updates
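To illustrate the “common event stream format” idea, here is a hedged sketch of posting a VES-style event to a collector; the collector address, the API version in the path and the field subset follow the general shape of the VES common event header, but must be checked against the VES specification and collector actually deployed:

import time
import requests

# Assumed collector address and API version – adjust to the deployed VES collector
VES_COLLECTOR = "http://ves-collector.example:30235/eventListener/v5"

now_us = int(time.time() * 1e6)
event = {
    "event": {
        "commonEventHeader": {
            # Illustrative subset of the common event header fields
            "domain": "measurementsForVfScaling",
            "eventName": "Measurement_collectd_example",
            "eventId": "example-0001",
            "sourceName": "compute-node-1",
            "reportingEntityName": "collectd-ves-app",
            "priority": "Normal",
            "startEpochMicrosec": now_us,
            "lastEpochMicrosec": now_us,
            "sequence": 0,
            "version": 3,
        }
    }
}

resp = requests.post(VES_COLLECTOR, json=event, timeout=5)
print(resp.status_code, resp.text)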
28. 28
Kafka
It is a:
• Messaging system
• Pub-sub model
• Fault tolerant
Why Kafka:
• Build real-time streaming data pipelines
• Build real-time applications that react to streaming data
Terminology:
• Topic
• Replicas – copies of partitions
• Brokers – maintain the published data (the Kafka servers)
• Zookeeper – manages Kafka brokers & notifies producers/consumers
• Cluster – more than one broker; manages persistence & replication of message data
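A hedged Python sketch of the pub-sub model, using the kafka-python package; the broker address and topic name are assumptions (a Barometer-style setup typically publishes collectd metrics on a topic configured in the write_kafka plugin):

from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

BROKER = "localhost:9092"  # assumed broker address
TOPIC = "collectd"         # assumed topic name from the write_kafka configuration

# Producer: publish a record to the topic
producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send(TOPIC, b'{"host": "compute-node-1", "plugin": "cpu", "values": [12.5]}')
producer.flush()

# Consumer: subscribe to the topic and react to the streaming data
consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER, auto_offset_reset="earliest")
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)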
30. Platform for Network Data Analytics (PNDA.io) Overview
Simple, scalable open data platform
Provides a common set of services for developing analytics applications
Accelerates the process of developing big data analytics applications whilst significantly reducing the TCO
PNDA provides a platform for convergence of network data analytics
Figure: PNDA architecture – plugins (ODL, Logstash, OpenBPM, pmacct, telemetry) feed data through the PNDA producer API into real-time data distribution and a file store. Platform services cover installation, management, security and data privacy, plus app packaging and management. Processing and query layers include stream and batch processing, SQL query, OLAP cube, search/Lucene, NoSQL time series, data exploration and metric/event visualisation. PNDA-managed and unmanaged applications consume data via the PNDA consumer API, with query, visualisation and exploration on top.
32. 42
Apache Avro
Language neutral data serialization system
Provides rich data structures
Stores the data definition in JSON format, making it easy to read and interpret
The data itself is stored in a binary format, making it compact and efficient
Supports schemas for defining data structure
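A small sketch of the JSON-defined schema plus compact binary encoding in Python, using the fastavro package; the schema mirrors the PNDA event schema shown in the presenter's notes, and the field values are illustrative:

import io
import json
import time
from fastavro import parse_schema, schemaless_writer, schemaless_reader  # pip install fastavro

# The data definition is plain JSON – easy to read and interpret
schema = parse_schema({
    "namespace": "pnda.entity",
    "type": "record",
    "name": "event",
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "src", "type": "string"},
        {"name": "host_ip", "type": "string"},
        {"name": "rawdata", "type": "bytes"},
    ],
})

record = {
    "timestamp": int(time.time() * 1000),
    "src": "collectd",
    "host_ip": "10.0.0.1",
    "rawdata": json.dumps({"plugin": "cpu", "value": 12.5}).encode(),
}

# The data itself is written in a compact binary format
buf = io.BytesIO()
schemaless_writer(buf, schema, record)
print(f"{buf.tell()} bytes on the wire")

buf.seek(0)
print(schemaless_reader(buf, schema))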
34. 34
• Can integrate east/west with MANO systems
• collectd data ingestion goes through Kafka topics
Img src: https://wiki.onap.org/display/DW/ONAP+Beijing+Release+Developer+Forum%2C+Dec.+11-13%2C+2017%2C+Santa+Clara%2C+CA+US?preview=/16002054/20874945/Telemetry-Analytics-ONAP-11Dec2017.pdf
38. 38
Closed Loops – Networking Stack
Figure: Closed loops mapped onto the networking stack. The stack spans hardware/disaggregated hardware, the data path, operating systems, network control, cloud & virtual management, orchestration/management/policy, network data analytics and the application layer, grouped into infrastructure, management & control, and services. Closed-loop reaction times range from micro/milliseconds for HW-enabled loops (e.g. RAS) and data path loops (HA etc.) with domain knowledge local to the platform, through seconds/minutes for enforcing local and network-domain policy, up to minutes/hours/days for end-to-end analysis, planning and mapping of deployment policies.
High Speed Control Loops are Close to the Platform
39. Closed Loops – Business Cases
Figure: Business use cases driving closed loops – improved customer experience, cloud optimization & efficiency, edge placement, service healing, differentiated QoS, service optimization, energy optimization, capacity optimization, cloud configurations and security (threat detection, threat response). AI/ML/DL platform(s), analytics and business applications sit above an NFV Orchestrator (NFVO) [e.g. ONAP/OSM] and VNF Manager (VNFM), driving policy-based provisioning and control loops. The platform layer (OpenStack*, Kubernetes*, bare metal) exposes feature, provisioning and telemetry interfaces – collectd, Intel® Infrastructure Management Technologies, Intel® RDT, power, monitoring/storage and Intel® Run Sure Technology – with local policy enforcement agent(s) for local dynamic control.
42. Barometer Links
Barometer Home: https://wiki.opnfv.org/display/fastpath/Barometer+Home
Collectd advantages, etc.: https://wiki.opnfv.org/display/fastpath/Collectd+advantages%2C+disadvantages+and+a+few+asides
Collectd integration with Prometheus: https://wiki.opnfv.org/display/fastpath/Collectd+integration+with+prometheus
Metrics/Events through Barometer (not on Collectd site): https://wiki.opnfv.org/display/fastpath/Collectd+Metrics+and+Events#CollectdMetricsandEvents-Metrics
43. 44
Redfish
• Industry-standard software-defined management for converged, hybrid IT
• REST API / HTTPS / JSON
• Provides, among other things, the ability to collect OOB telemetry
• v1.0 – power, temperature, fan speed
• Last release: 2018.2
• Eventing (Metric Reports)
Src: https://www.dmtf.org/sites/default/files/2017_12_Redfish_Introduction_and_Overview.pdf
44. 41
Redfish collectd plugin
• Read plugin for OOB telemetry
• Configurable via collectd.conf:
  • Queries – list of Redfish path definitions to metric collections
  • Services – list of endpoints to which requests are sent for a chosen set of queries
• Plugin future direction (WIP):
  • Extended telemetry
  • Eventing mechanism (TelemetryService)
  • More dynamic config, autodiscovery, wildcards
Figure: Plugin flow – the configuration/context (queries path definitions; services with endpoint & queries) drives libredfish queues, which issue requests to the Redfish API [PODM, PSME, ...]; the JSON responses are parsed and dispatched as collectd values.
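For context on what the plugin's “queries” boil down to, here is a hedged Python sketch that walks a Redfish service for chassis thermal readings over its standard REST/JSON interface; the endpoint address and credentials are placeholders, and the resource paths follow the common Redfish schema layout, which can vary by implementation:

import requests

BASE = "https://bmc.example.local"  # placeholder Redfish endpoint (e.g. a BMC, PODM or PSME)
AUTH = ("admin", "password")        # placeholder credentials

def get(path):
    resp = requests.get(BASE + path, auth=AUTH, verify=False, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Enumerate chassis, then read each chassis' Thermal resource (temperatures, fans)
for member in get("/redfish/v1/Chassis")["Members"]:
    chassis = get(member["@odata.id"])
    thermal_ref = chassis.get("Thermal", {}).get("@odata.id")
    if not thermal_ref:
        continue
    thermal = get(thermal_ref)
    for temp in thermal.get("Temperatures", []):
        print(chassis["Id"], temp.get("Name"), temp.get("ReadingCelsius"))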
Presenter's Notes
Sunku
At a very high level, Service Assurance includes 3 key elements – Monitoring, Presentation and Provisioning.
Monitoring includes getting the insight into various platform & network counters to correlate against established KPIs
The monitoring interfaces need to be able to integrate with both legacy and next gen management/controller systems while providing open standard interfaces
Presentation is the ability to report the metrics so that relevant action can be taken on service level changes.
Traditionally this is done by human intervention to address failures/violations, but with the amount of data available for processing the goal is to move towards dynamic resolution based on the configured parameters.
Provisioning involves the idea of reconfiguring platform resources to meet the service level objectives in a service level agreement.
Today’s presentation will focus on the first two elements.
Phase 1: Make sure that the relevant telemetry is instrumented where appropriate and that telemetry is exposed through collectd to existing management systems.
Basically the goal of phase 1 is to interwork with existing management systems.
Phase 2: Moving to the NFV deployment world, we want to make sure that the virtual infrastructure as well as the management and orchestration layer can understand the telemetry and events that it is receiving, so that it can take appropriate action based on that telemetry.
Phase 3: is where we want to integrate with ML to allow us to make more intelligent placement decisions and adjustments to scheduling policy.
More importantly, it correlates NFVI failures with VNF performance issues, to be able to predict failures and automatically adapt our environment as required based on the current state of the platform and the telemetry of the different subsystems.
Phase 1 will continue indefinitely, as the ingredient teams will always be upstreaming new features to collectd.
Phase 2 and Phase 3 is what we are currently implementing
John
When we started out we would deploy collectd directly on the platform.
This caused issues as every system is different.
If we were dealing with customers we would have no control over what they had installed on their system.
Within the barometer project we decided to have a SA package.
This included collectd, InfluxDB and Grafana containers.
This allows the customer to deploy with ease.
We then added ansible scripts to make the deployment process a one line command.
In short a new customer should be able to take our package, deploy it, and see platform metrics on a web browser within 10 minutes.
John
When we started out we would deploy collectd directly on the platform.
This caused issues as every system is different.
If we were dealing with customers we would have no control over what they had installed on their system.
At the same time the opnfv community was shifting towards containers.
For this reason we decided to containerise collectd.
This gave us multiple benefits.
We could ensure that collectd would be installed in the same environment.
It’s much quicker to pull the collectd container instead of building and installing it on your system.
None of the software within the container would interfere with what is already installed on the system.
When we containerised collectd we also containerised other necessary components.
Once we get the metrics from collectd we need to save them, for which we used InfluxDB.
In an effort to provide end to end
When we containerised collectd we also decided to containerise other important components.
These included:
Influx which we use as our time series database and
Grafana which we use to graph the metrics within influx
After this when we started talking to customers we found a few issues.
Customers were unfamiliar with containers and needed help setting them up.
Deploying at scale was still slow.
For this reason we created ansible scripts to deploy and configure the containers.
Providing Ansible support has made it easy for us to introduce SA to various customers.
The idea is to provide a one-click install that installs & configures the necessary containers: collectd, InfluxDB, Grafana, Kafka as a message bus, and the VNF Event Stream (VES) containers.
We made it easy to install, configure and visualize various NFVi metrics at scale.
Various customers are currently using them and playing with it.
On the plugins, we have tight collaboration with NFVi BKC that tests the plugins across combinations of various operating systems, platform generations, NICs, etc. This ensures our plugins are up to date with IA platforms.
On the automation front, we have good integration with OPNFV CI Functest to ensure Barometer deploys with OPNFV installers without regression. Barometer currently supports Apex and we will be adding Compass support.
The downside of the collectd community is that there is no established cadence of releases or merging of pull requests.
In order to showcase IA features early through Barometer we came up with a three-container approach:
The Stable container provides the latest stable branch of collectd, with fully tested and validated plugins.
The Master container provides plugins & bug fixes that are on master and not yet in a release.
The Experimental container provides the latest and greatest plugins by cherry-picking the newest pull requests instead of waiting for them to make it onto master.
This way we provide early access to the telemetry feature set.
Downstream, we have a strong engineer-to-engineer partnership with Red Hat on OSP to ensure IA-specific plugins make it into OSP releases.
Sunku
The basic goal of ML is to achieve closed-loop automation using IA metrics.
NSB is used to generate the data for the ML analysis
We have had a few ML teams engaged and they have been able to correlate multiple IA metrics to packet loss on an EPC
We are still in the early stages of ML.
KK
Prometheus is a project originally from SoundCloud; it has since been moved to the Cloud Native Computing Foundation (CNCF) as the second hosted project, right after Kubernetes.
It is an open-source monitoring system which has a few interesting features, like
a dimensional data model
a flexible query language
a built-in efficient time series database
and a modern alerting approach
The integration with collectd is already in place.
We have verified how it can be integrated in two ways:
One is the collectd native plugin, which serves an HTTP endpoint from which the Prometheus server can scrape the latest metrics.
And the second one is the Prometheus collectd exporter, which acts like a proxy: collectd writes data with its network plugin and the Prometheus server can get the data from there.
What is worth highlighting here is that Prometheus works in a pull model, rather than the standard collectd push of metrics, so it is a slightly different architecture.
Prometheus is part of many solutions, like
NGCO, on which Sunku will tell you more in a moment
the RH OSP SA framework
or also a proposal in ONAP to integrate with OOF in the edge area.
And there is also a container coming that will be part of the Barometer collection.
The bottom line is that all the platform stats can be pulled into Prometheus today without any additional development.
KK
Open Network Automation Platform is a project under Linux Foundation Networking governance.
By unifying member resources, ONAP is accelerating the development of a vibrant ecosystem around a globally shared architecture and implementation for network automation, with focus on open standards.
It „provides a comprehensive platform for real-time, policy-driven orchestration and automation of physical and virtual network functions that will enable software, network, IT, cloud providers and developers to rapidly automate new services and support complete lifecycle management.”
Our goal is to enable closed loop automation, currently by engaging on VES and EPA/HPA projects.
KK
So VES, or VNF Event Stream, is a project whose goal is to enable a significant reduction in the effort required to develop and integrate VNF telemetry-related data into automated VNF management systems, by promoting convergence to a common event stream format and collection system.
In the current implementation collectd sends metrics to the Kafka bus; from there the VES application picks them up, unifies them against a given schema and sends them to the VES collector (which is part of DCAE).
It is the chosen solution in the core area.
KK
KK
KK
What is PNDA? „The scalable, open source big data analytics platform for networks and services”
PNDA, similar to OPNFV or ONAP, is a project under Linux Foundation Networking.
In terms of functionality
PNDA aggregates data like logs, metrics and network telemetry
Scales up to consume millions of messages per second
Efficiently distributes data with publish and subscribe model
Processes bulk data in batches, or streaming data in real-time
Manages lifecycle of applications that process and analyse data
Lets developers gain insight and explore data using interactive notebooks
The entry point for collectd is the Kafka bus; there are a few ways to ingest data for consumption by analytics apps:
In raw JSON format directly with the write_kafka plugin
Or with the network plugin through Logstash, with raw JSON or Avro formatting
We have verified that it works properly with Red PNDA, and we are also working with Cisco as a partner on better customer-oriented use cases for analytics.
PNDA can integrate with MANO systems, supporting closed loop automation at the east/west level.
And there is a proposal to integrate PNDA into DCAE (part of ONAP) by replacing CDAP with similar functionality.
KK
AR: change color scheme / bigger font
Direct ingestion with write_kafka could have better performance
But the preferred format in PNDA is Avro, which is a data serialization framework with a given schema in JSON.
(If I remember correctly) the check was done with a generic Avro serializer, not the PNDA one, so additional field mutation may be required, which should not be the case with the pnda-avro codec.
They were the same at the time the check was made, apart from a single extra step of base64 encoding, which was the issue when decoding messages with the given example consumer in Red PNDA.
PNDA AVRO schema:
{
  "namespace": "pnda.entity",
  "type": "record",
  "name": "event",
  "fields": [
    {"name": "timestamp", "type": "long"},   // time when the event was generated/ingested by PNDA
    {"name": "src",       "type": "string"}, // e.g. collectd
    {"name": "host_ip",   "type": "string"}, // host where the data was generated
    {"name": "rawdata",   "type": "bytes"}   // rest of the raw data
  ]
}
We are looking to help with these using IA features – feature exposure, provisioning & telemetry – and to enable/fill in these gaps.