@Iben Rodriguez from @Spirent talks at the SDN World Congress about the importance of and issues with NFV VNF and SDN Testing in the cloud.
#Layer123 Dusseldorf Germany 20141016
1. Performance – Scaling out NFV implementation...
Iben Rodriguez
Principal Architect
Cloud / Virtualization
October 16th 2014 - SDN World Congress - Dusseldorf, Germany
version 03, 10-16-2014
2. Let me tell you a story...
Background – virtualization - SDN (NVo3) - NFV (VNF)
Decision Process
Technology Adoption Lifecycle
Typical use cases for Virtualized Network Functions
Virtualization Impact on the Datacenter
Options for Testing and traffic generation
Importance of Testing Methodologies
Example python script for automation and test case generation
CPU Core Distribution – lessons learned
Continuous Testing – integrating all this into the development and release deployment lifecycle.
2 PROPRIETARY AND CONFIDENTIAL
5. SDN for Service Providers
Must maintain SLAs
Limited bandwidth available in network – adding links is expensive
Increased VoIP/Video applications putting a stress on networks
Network resilience – convergence, failover, protection switching, fast reroute, minimal service disruption
Creation and management of Traffic Engineering service paths
Stringent requirements for fault management & OAM
9. SDN/NFV Timeline
2013 – POC
2014 – Field Trials
2015 – Start of Commercial Deployment
2016 – Widespread Small Commercial Deployment
2017-2020 – The New Normal
10. Performance Testing
Performance benchmarking of VNFs, hypervisors and COTS H/W
Portability & Interoperability
Security & Reliability
Performance isolation
Service continuity
On demand scale testing
Management & Orchestration
VM Migration
Fail-over convergence time
Testing security for resources shared across VNFs
Test VM M&O system for in-lab environment
Topology validation
Seamless integration of test VMs with SP's M&O systems for live & post-deployment environments
Test fault detection capability of M&O systems
13. Virtualization Impact
Layers: Orchestration; Overlay Network; Underlay Network; Network virtualization
Cloud DC: Service Chaining; Elastic Performance; Service Availability; VM Migration; Multi-tenancy; Virtual Infrastructure
SP Access/Edge: vRouter testing; vBRAS testing; PCE/BGP-LS validation; 10/40/100G
SP Mobility: vEPC Capacity; Offload testing; Busy hour call Modeling
14. Testing for Service Provider and Cloud Datacenter VNFs
Access: Wireless (2G, 3G, 4G, Wifi), Residential and Enterprise, over Copper, Fiber and Cable
SP Edge and SP Core: SDN/NFV at Edge and Core; Mobility Testing against the EPC (MME, S-GW, P-GW, …)
Cloud Service Providers: intra-DC networks (SDN/NFV) hosting VMs and VNFs offering Cloud Services; Data Center Interconnect between providers via the Network Service Provider
Test coverage: Layer 2-3 Testing across Access, Edge and Core; vEPC Cloud Testing in the Data Center
15. Focus Areas – Network Testing
Controller platform: Topology / Controller; Config Manager; Stats / Monitoring
Northbound API (REST API): OSS/BSS; OpenStack / CloudStack; Applications; Test tools/Methodologies
Southbound API: OpenFlow; PCEP; NETCONF/YANG; BGP-LS; SNMP
Data plane: Segment Routing; MPLS Switching; Routing, VPNs
VNFs under test: vCE, vFW, vBNG, vRouter
16. 100G Modules – DX2, FX2, MX2
DX2:
Interface: CFP2; CFP4 (adaptor Q4); QSFP-28 (adaptor Q4)
Speed per Interface: 1x100G (Now); 2x40G (Q4); 8x10G (Q4)
Available: Now (100G)
FX2:
Interface: CFP2; CFP4 (adaptor Q4); QSFP-28 (adaptor Q4)
Speed per Interface: 1x100G
Available: Q4 (Dec) 2014
MX2:
Interface: CFP2; CFP4 (adaptor Q4); QSFP-28 (adaptor Q4)
Speed per Interface: 1x100G
Available: Q4 (Nov) 2014
17. 100G Technology Flexibility
A Single Module for Multiple Technologies
Native Interface of CFP2
Pluggable & Mixable Adaptors for:
• CFP4
• QSFP-28
• CXP
Available on all DX2, FX2 & MX2 Modules
18. DHCPv6 Multiple Addresses
New Product BPK-1320
Emulate CPE requesting multiple addresses
Pack multiple IA_NA and IA_PD in a single message sequence
• IA_NA (Identity Association for Non-Temporary Address)
• IA_PD (Identity Association for Prefix Delegation)
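The IA_NA/IA_PD packing described above can be sketched in a few lines of Python. The option codes and the IAID/T1/T2 layout follow RFC 3315 (IA_NA) and RFC 3633 (IA_PD); the timer values and helper names are illustrative:

```python
import struct

OPT_IA_NA, OPT_IA_PD = 3, 25  # DHCPv6 option codes, RFC 3315 / RFC 3633

def ia_option(code, iaid, t1, t2, sub_opts=b""):
    """Encode one IA option: 2-byte code, 2-byte length, then IAID, T1, T2."""
    body = struct.pack("!III", iaid, t1, t2) + sub_opts
    return struct.pack("!HH", code, len(body)) + body

def pack_multiple(iaids):
    """Pack several IA_NA and IA_PD options into a single message body,
    as a CPE requesting multiple addresses and prefixes would."""
    return (b"".join(ia_option(OPT_IA_NA, i, 3600, 5400) for i in iaids) +
            b"".join(ia_option(OPT_IA_PD, i, 3600, 5400) for i in iaids))
```

A real Solicit would carry these options after the message type and transaction ID; the sketch only shows how multiple IAs coexist in one message.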
19. Use case – Validate failure convergence of vRouter
Test setup: STC test orchestrator (Velocity) drives, via REST, an orchestrator (e.g. OpenStack) and an SDN controller (Monitor / Config) over COTS servers hosting the VNFs; primary and backup servers each run vFW, vIDS and vRouter instances connected to STC.
Onboard vRouter, vFW and vIDS instances on the COTS server and connect to the STC chassis as shown
Initiate high-scale control and data plane traffic from STC (e.g. BGP, OSPF) and establish the vRouter's upper limits
Initiate failure from STC (BFD timeout or link failure)
Validate the migration of VNFs to another server and measure convergence times for control plane and traffic
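One of the failure triggers above is a BFD timeout, and its detection time is deterministic (RFC 5880): the session goes down after the detect multiplier's worth of missed control packets. That gives a baseline to compare measured convergence against; a minimal sketch, with the reroute allowance as an assumed input:

```python
def bfd_detection_time_ms(tx_interval_ms, detect_multiplier):
    """A BFD session is declared down after `detect_multiplier` consecutive
    control packets are missed (RFC 5880), so detection takes at most
    tx_interval * multiplier milliseconds."""
    return tx_interval_ms * detect_multiplier

def convergence_budget_ms(tx_interval_ms, detect_multiplier, reroute_ms):
    """Expected outage budget: BFD detection plus whatever reroute /
    migration time the platform needs after detection."""
    return bfd_detection_time_ms(tx_interval_ms, detect_multiplier) + reroute_ms
```

With a 50 ms interval and multiplier 3, measured convergence meaningfully beyond 150 ms plus the reroute allowance points at the data path, not detection.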
20. The Spirent EVCI Solution
Leverage iTest automation to manage the integration between CI and the virtual environment
Automation: initiate iTest automation
Virtualization: manage VMs
Continuous Integration: build artifacts, test artifacts
Source & Artifact Control: iTest projects, iTest automation projects, build artifacts, test artifacts, support files
21. Metrics to evaluate during test iterations
• PCI Bus Utilization
• CPU Wait Time per core
• Memory Utilization per socket
• Power usage - efficiency
• Storage Input Output
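Most collectors expose these metrics as monotonically increasing counters, so per-iteration values come from differencing two snapshots taken before and after the run. A minimal sketch (the counter field names here are hypothetical, not tied to any specific tool):

```python
def deltas(prev, curr):
    """Per-metric increase between two counter snapshots taken before
    and after a test iteration."""
    return {k: curr[k] - prev[k] for k in curr if k in prev}

def cpu_busy_pct(prev, curr):
    """Percent of CPU time spent busy (neither idle nor waiting on I/O)
    between two snapshots of cumulative jiffy counters."""
    total = curr["total"] - prev["total"]
    idle = (curr["idle"] - prev["idle"]) + (curr["iowait"] - prev["iowait"])
    return 100.0 * (total - idle) / total
```

The same differencing applies to PCI bus, memory, power and storage counters; only the field names change.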
32. Customer / On-Premise
Velocity EVCI – Virtual Network Test Beds Orchestration: test suites, topology templates, vDUT image management, results
Customer / On-Premise: customer's servers with Spirent HW/SW; customer's CI orchestration drives Jenkins jobs (instantiate test environment, run test) against virtual test bed instances on bare metal servers
Spirent Hosted Elastic Virtual Private Test Beds (Benchmark-as-a-Service): Spirent VCT LAB – NEPHOSCALE, 10/40/100G test VMs on bare metal servers
Public Clouds – RAVELLO: test VMs on Amazon, Google, Azure
Cloud Under Test (CUT): Spirent Elements – OpenFlow Controller & Switch Emulation; VXLAN/Geneve Switch Emulation; 10/40/100G
33. Spirent Communications
Thank You – Questions?
For this and other exciting testing products for SDN and OpenFlow please see us at booth #28
• Emulate 1000+ OpenFlow 1.3 Switches using pre-canned topologies per port
• Support LLDP Topology Discovery
• High Rate Packet-In testing
http://www.spirent.com/go/sdnshowcase
Ralph.Daniels@spirent.com
• Interactive, multidimensional network topology view
• 360° navigation with context-aware network controls
• Clearly see areas of congestion
http://www.real-status.com/sdn
40. 4 Core CPU balanced across PCI BUS
41. LACP Hot-Standby & Multi Chassis LAG
New Product LAG Emulation BPK-1312
Support for Active/Stand-by ports in an MC-LAG configuration
Support for DUT-configured Min and Max ports in a LAG
DUT is typically the Master (higher System ID)
• DUT determines which ports are Active based on Partner Port ID
• Remaining ports put in (Hot) Standby mode (LACP Out-Of-Sync)
Test: with traffic running between STC and the DUT pair (ICCP between chassis), break one or more links on the Active set and measure Frame Loss Duration for traffic to switch to the Standby ports
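Frame Loss Duration in the step above is usually derived from counters rather than timestamps: at a known constant transmit rate, the number of frames lost during the Active-to-Standby switchover maps directly to outage time. A minimal sketch:

```python
def frame_loss_duration_s(tx_frames, rx_frames, tx_rate_fps):
    """Seconds of blackout implied by frames lost during switchover,
    assuming a constant transmit rate in frames per second."""
    lost = tx_frames - rx_frames
    return lost / tx_rate_fps
```

For example, 1,000 frames lost at 10,000 frames/second implies a 100 ms outage.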
43. DHCP over L2GRE
New Product – Emulation over L2GRE (BPK-1319)
STC emulates the access side (UE, SSID, & AP) and the core side (Wi-Fi Offload Gateway, 3G Core, 4G Core, Data Networks)
The DHCP exchange runs over the GRE tunnel between the Wifi Access Point and the DHCP server behind the Offload Gateway: DHCP Discover, DHCP Offer, DHCP Request, DHCP Ack, followed by data packets
44. Segment Routing w/ IGP (OSPF/ISIS)
MPLS Simplified and Optimized
New Part Number BPK-1317 (OSPFv2) & BPK-1318 (ISIS)
Leverage existing MPLS forwarding and VPN services
Reduced State – LDP & RSVP protocols no longer needed
Scalable – Fewer MPLS Labels to manage
Reliability & Availability – entirely automated 50 msec Fast Reroute or Failover
Topology: STC A, DUT B, STC C, STC D, STC-E, with destination 10.1.1.0/24 and PHP off; Router IDs / Node SIDs: 1.1.1.1/1, 1.1.1.2/2, 1.1.1.10/10, 1.1.1.11/11, 1.1.1.12/12, 1.1.1.99/99
IGP-determined path: packet to IP dest 10.1.1.1 carries Label 99
Explicit path: packet to IP dest 10.1.1.1 carries label stack 1,9002,99
If cost x = y: ECMP (Equal Cost Multi-path) load sharing; if cost x < y: path through Node C preferred
45. SP-SDN – Testing the PCE controller
New Product BPK-1315 (PCC) & BPK-1316 (PCE)
Test setup: the SDN (PCE) Controller (DUT) sits between northbound applications (Capacity planning, Calendaring, Data Analytics – reached via REST and Thrift APIs) and southbound STC nodes; the Stateful PCE holds a Traffic Engineering Database (TED) fed by BGP-LS/BGP-TE Traffic Engineering status reports, and sends Request/Update/Initiate messages for SLA paths (e.g. A to D)
Top Down Design – use existing network infrastructure, only update the head-end/ingress node
In-built High Availability – no need to replicate MPLS and IGP Fast ReRoute (FRR), protection switching mechanisms
Separates Network Path Computation from Topology Determination
Network nodes still have knowledge of the topology and can fast reroute in case of failure
PCE controller – optimizes paths to meet SLAs without using the High Cost Links (Shortest Path)
46. Use case – Validate performance and auto scaling of vBNG
Test setup: STC test orchestrator drives, via REST, an orchestrator (e.g. OpenStack) and an SDN controller (Monitor / Config); the COTS server hosts vBNG and vSTC VMs on a virtualization layer over compute, storage and network resources
Onboard vBNG and vSTC VMs using the vendor orchestrator and/or Spirent plugin
Assign appropriate cores/memory to VNFs and originate/terminate traffic on vSTC
Measure the vBNG's upper limits for control and data plane performance
Validate the auto scaling capability of the BNG by ensuring that additional cores are assigned to the vBNG, or additional vBNGs are spawned, under the following circumstances:
• Data plane scale beyond normal limits
• Control plane scale (increasing PPPoE sessions)
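The auto-scaling check above boils down to a threshold policy the orchestrator is expected to enforce. A sketch of the decision being validated; the thresholds and function names are illustrative, not a real orchestrator API:

```python
def scale_decision(pppoe_sessions, session_limit, dataplane_util_pct,
                   util_limit_pct=80.0):
    """Return the action the orchestrator should take: scale out when
    either control-plane load (PPPoE sessions) or data-plane load
    (utilization percent) crosses its limit."""
    if pppoe_sessions > session_limit or dataplane_util_pct > util_limit_pct:
        return "scale-out"
    return "steady"
```

The test then verifies that when vSTC pushes load past either limit, the platform actually adds cores or spawns a new vBNG within the expected time.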
48. Spirent's Strategic Foundation
Validate high density edge & core routers: FX2/MX2 100G; FX2/MX2 10G, 1G; CFP2/CFP4; 400G
Next-gen protocols & scale testing: Port Grouping; MVPN; LDPv6; Protocol & stream scale
Leader in SDN/NFV testing: Virtual infrastructure testing; VNF testing; Methodologies; PCE, BGP-LS
Improve customer experience: Site surveys; CR reduction; CET
Embed Spirent in millions of devices: Transport vehicles; Home appliances; Monitoring in SDN/NFV environments
Timeframes: Currently Available / In Progress / 1-3 years
Editor's Notes
http://www.layer123.com/sdn-agenda-day2/
version 03, 10-16-2014
We are seeing more actually deployed use cases
Open Daylight is strengthening
Spirent has been involved with ETSI NFV since the beginning – we've submitted test methodologies to the performance and reliability working groups
http://www.spirent.com/White-Papers/Broadband/PAB/Validate_NFV_Environments
Application driven networks
Service Agility
Dynamic provisioning of capacity
Reduced CAPEX & OPEX costs
These numbers are already out of date and the rate of adoption is increasing
Optimize the use of resource
Traffic engineering
Stringent service disruption, failover requirements
Superior Fault Management & OAM capabilities
http://www.ibenit.com/post/82614664996/sdn-nfv-logistics-curve
When researching for this post I found that this is related to the economic theory “Diffusion of Innovations” and the “Logistics Function”.
I wonder - where are you now in this adoption curve? What category is your organization in:
innovator - willing to take risks with financial resources help absorb failure
early adopter - new technology will help them stay competitive
early majority - above average social status yet lack opinion leadership
late majority - typically skeptical about innovation
laggard - aversion to change and focused on tradition
With successive groups of consumers adopting a new technology (shown in blue), its market share (yellow) will eventually reach the saturation level. In mathematics the S curve is known as the logistic function.
A logistic function or logistic curve is a common special case of the more general sigmoid function, with equation:
f(x) = 1 / (1 + e^(-x))
where e is Euler’s number (approximately equal to 2.71828). For values of x in the range of real numbers from −∞ to +∞, the S-curve shown above is obtained.
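The logistic function above is easy to evaluate directly; a minimal sketch of the S-curve it produces:

```python
import math

def logistic(x):
    """Standard logistic (sigmoid) function: f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

# Adoption rises from ~0 toward saturation (1.0) as x sweeps the lifecycle.
curve = [round(logistic(x), 3) for x in range(-6, 7, 3)]
# → [0.002, 0.047, 0.5, 0.953, 0.998]
```

The slow start, steep middle and flattening top of that sequence are exactly the innovator / majority / laggard phases described below.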
The logistic function can be used to illustrate the progress of the diffusion of an innovation through its life cycle. Historically, when new products are introduced there is an intense amount of research and development which leads to dramatic improvements in quality and reductions in cost. This leads to a period of rapid industry growth. Some of the more famous examples are: railroads, incandescent light bulbs, electrification, the Ford Model T, air travel and computers. Eventually, dramatic improvement and cost reduction opportunities are exhausted, the product or process are in widespread use with few remaining potential new customers, and markets become saturated.
Logistic analysis was used in papers by several researchers at the International Institute of Applied Systems Analysis (IIASA). These papers deal with the diffusion of various innovations, infrastructures and energy source substitutions and the role of work in the economy as well as with the long economic cycle. Long economic cycles were investigated by Robert Ayres (1989).[7] Cesare Marchetti published on long economic cycles and on diffusion of innovations.[8][9] Arnulf Grübler’s book (1990) gives a detailed account of the diffusion of infrastructures including canals, railroads, highways and airlines, showing that their diffusion followed logistic shaped curves.[10]
Carlota Perez used a logistic curve to illustrate the long (Kondratiev) business cycle with the following labels: beginning of a technological era as irruption, the ascent as frenzy, the rapid build out as synergy and the completion as maturity.[11]
Everett Rogers’ studies of technology diffusion have a direct application to the examination of Internet use. He describes the time-phased movement of adoption and adaptation in terms of an “S-curve,” which describes a slow initial rise over time, followed by a more rapid acceleration and finally a slowing toward steady state. S curves show the rate of adoption for six technologies in the US, beginning with telephone, followed by radio, television, cable television, VCR, Personal Computers and Internet. Telephone rises slowly. Radio, TV, VCR and Internet rise very steeply. TV seems to have risen fastest, and, like phones and radio, has achieved almost 100% diffusion. (Internet is unlikely to achieve this 100% saturation as rapidly since about half the remaining non-users in the US have declared themselves uninterested in joining the Internet.)
http://www.techknowlogia.org/tkl_active_pages2/CurrentArticles/main.asp?IssueNumber=16&FileType=HTML&ArticleID=398
http://www.ibenit.com/post/82614664996/sdn-nfv-logistics-curve
Security will never be the same again. It’s a losing battle. 40% of SDN adopters paying money for SDN network virtualization are doing it for a security use case: implementing micro-segments on a per-app basis, overcoming the traditional limits of VLANs and hard-wired firewall policies.
One of the main take-aways I liked from Martin Casado’s talk was his “Technology Adoption Curve” showing the five steps for any new data center concept. This is what a typical CIO is now going through when learning about Virtualization, Cloud, SDN, and now NFV on their path to the SDDC.
Orchestration Testing
CRAN: Cloud Radio Access Network; S-GW: Serving Gateway; P-GW: Packet Gateway; PCRF: Policy and Charging Rules Function; MME: Mobility Management Entity; BRAS: Broadband Remote Access Server
Architecting for Software Defined Infrastructure – A Microserver Perspective
Douglas Carrigan – Systems Engineer, DCG Center for Innovation, Intel Corporation
Christian Maciocco – Research Scientist, Intel Labs, Intel Corporation
DATS013
Telecom Service Providers started with NFV and are now moving up the stack. Our target areas for helping them are …
Please check out the NFV POC Zone for demos of how to test for these use cases
These are representative software solutions.
It will all come down to cost for performance including latency and throughput.
There should be an operational component to consider - how will we get these cards installed in our commodity hardware?
Should you go with Netronome, Cavium, Freescale?
Who has the best offload processing hardware for the given technology stack?
Metrics to look at include:
PCI Bus
CPU
Memory
Power
Storage
Each of these already has a well-documented set of counters that a Splunk app can report on for correlation during our test runs.
https://apps.splunk.com/app/725/#/documentation <-- for VMware
https://apps.splunk.com/app/1803/ <-- for Linux (KVM) uses Sysstat
iostat(1) reports CPU statistics and input/output statistics for devices, partitions and network filesystems.
mpstat(1) reports individual or combined processor related statistics.
pidstat(1) reports statistics for Linux tasks (processes) : I/O, CPU, memory, etc.
sar(1) collects, reports and saves system activity information (CPU, memory, disks, interrupts, network interfaces, TTY, kernel tables,etc.)
sadc(8) is the system activity data collector, used as a backend for sar.
sa1(8) collects and stores binary data in the system activity daily data file. It is a front end to sadc designed to be run from cron.
sa2(8) writes a summarized daily activity report. It is a front end to sar designed to be run from cron.
sadf(1) displays data collected by sar in multiple formats (CSV, XML, etc.) This is useful to load performance data into a database, or import them in a spreadsheet to make graphs.
sysstat(5) is just a manual page for sysstat configuration file, giving the meaning of environment variables used by sysstat commands.
nfsiostat-sysstat(1) reports input/output statistics for network filesystems (NFS).
cifsiostat(1) reports CIFS statistics.
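To pull sysstat data into a test-run correlation pipeline programmatically, `sadf -d` emits semicolon-separated records with a `#`-prefixed header line. A small parser sketch; the sample column names below are illustrative, since the actual fields depend on which sar options were collected:

```python
def parse_sadf(text):
    """Parse `sadf -d` style output (semicolon-separated fields, header
    line prefixed with '# ') into a list of dicts keyed by column name."""
    lines = [line for line in text.strip().splitlines() if line]
    header = lines[0].lstrip("# ").split(";")
    return [dict(zip(header, line.split(";"))) for line in lines[1:]]
```

The resulting dicts can be tagged with the test-iteration ID and shipped to Splunk or a database alongside the Spirent results.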
SDN has become a VERY LOOSELY DEFINED term that means different things to different people.
When we talk to customers / partners, it is really important to understand their place in the eco-systems, and what their vested interests are.
So, first, let’s take a look the basic functional architecture of a Data Center network.
There are the…
Compute Infrastructure – this is where the servers, and the hypervisors and the VMs live
vSwitch – plays an important role in the architecture, especially for Overlay network architectures, where the vSwitch handles mapping between VMs and virtual networks and various policies.
Underlay Network – this is where the physical network devices and their control plane is. And OpenFlow is the primary focus of the industry in this layer.
Animation – Click
And we can use these 3 architecture layers to categorize the various vendors / players in the SDN space:
NFV – Network Function Virtualization
NV – Network Virtualization
OpenFlow
http://netronome.com/wp-content/uploads/2013/12/Netronome-SDN-NFV-whitepaper_11-13.pdf
From an architectural model perspective, SDN can be integrated into the NFV architecture framework. The orchestration and the management applications can themselves be VNFs running on one or more VMs. In such cases, the OpenFlow protocol is part of the Nf-Vi interface; or it can be part of the Vn-Nf interface.
EVCI Overview
https://www.youtube.com/watch?v=beMsjoA6zdc
Spirent EVCI (Efficient Virtualized Continuous Integration)
Shorter Release Cycles, Predictable Dates and Higher Quality
To be competitive, businesses need a robust, quick, flexible development, test and release environment to keep pace with demanding, fast-paced customer requirements. Those able to respond with the right solution command the market.
Doing this is easier said than done.
Most struggle with crippling inefficiencies around:
Inability to manage continuous change and configurations due to poor process visibility
Bandwidth and time to test new and existing features means products are slow to market
Too many resources focused on bugs; CapEx expenses for builds and testing are too high
Inadequate test case coverage and too many customer-found defects erode quality confidence
The solution for many has been migrating from the traditional "Waterfall process" to an Agile framework with Continuous Change Management (CCM) tools overseeing rapid-cycle Continuous Integration (CI), Continuous Test (CT) and Continuous Deployment (CD) phases for a common Integration Branch and Release Branch.
Velocity provisions virtual DUTs, STC
Velocity provisions OpenDaylight Controller with Netconf as southbound interface
iTest or Velocity configures the virtual DUTs and STCv
Send L2/L3 traffic from STCv
Bring the link on the primary path down as shown in the test topology above
Measure the convergence time
Repeat the test with various different trigger mechanisms
Velocity provisions virtual DUTs, STCv
Velocity provisions OpenDaylight Controller with Netconf as southbound interface
iTest or Velocity configures the virtual DUTs and STCv
Configure rate limiting on the first VNF in the chain such that overall end-to-end throughput is limited
Move the TX STCv to connect directly to the second VNF
Re-run the throughput test
VNF 1 is found to be the bottleneck for end-to-end throughput
Velocity provisions virtual DUTs, STCv
Velocity provisions OpenDaylight Controller with Netconf as southbound interface
iTest or Velocity configures the virtual DUTs and STCv
Run stateful traffic between the two STCv endpoints
Velocity moves one of the VNFs from Server 1 to Server 2
Measure the VM migration time and the performance impact on QoE of the service provisioned
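The service interruption in that last step can be bounded by the longest gap between consecutively received packets on the stateful flow; a minimal sketch:

```python
def max_gap_s(rx_timestamps):
    """Longest silence (seconds) between consecutive received packets —
    an upper bound on the service interruption caused by the VM move."""
    ts = sorted(rx_timestamps)
    return max((b - a for a, b in zip(ts, ts[1:])), default=0.0)
```

Comparing this gap before and during the migration window separates normal jitter from the migration-induced outage that degrades QoE.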