The Systems Engineering / SRE world shifted its thinking towards intent-driven, holistic configuration management a long time ago, but it feels like the majority of network automation solutions still follow the idea of making incremental changes to routers and switches, which at the same time might also be managed manually by operators typing (or copying) magic spells into a CLI. This makes the device configuration the synchronization point, and we don't really know what this configuration will look like in full without checking back on the device.
I believe we as Network (Automation) Engineers need to follow suit, make the mental shift to the holistic approach, let Perl, Shell and expect scripts be, and bring software engineering methods to network automation. This way we are able to tackle the problems at hand at an abstract level and build solutions which can be reasoned about, tested on their own, and scaled to our needs. For the most daunting problem, configuration management, this means plugging some of those systems together and building a solution which generates and owns the full device configuration.
Dealing with diverging configuration parts across the fleet, carefully cleaning up old approaches to configuring X, making incremental changes, and figuring out how to interact with a platform API, a dialect of NETCONF, YANG, etc. would all be a thing of the past. Wouldn't that be great?
A recording of this talk can be found at https://media.ccc.de/v/froscon2022-2820-this_is_the_way_-_holistic_network_automation
1. This is the way
Holistic (Network) Automation
FrOSCon 2022
Maximilian Wilhelm
1 / 27
2. Agenda
1. A little bit of history
2. Software Engineering Methods
3. Applying SWE Methods to Network Automation
4. Reality check
5. Q&A
3. Who's who Maximilian Wilhelm
Networker
Open Source Hacker
Fanboy of
(Debian) Linux
(Linux) networking
Occupation:
By day: Network Automation Engineer at Cloudflare
By night: Lead Infrastructure Architect, Freifunk Hochstift
In between: Freelance Infrastructure Architect for hire
Contact
@BarbarossaTM
max@sdn.clinic
5. History - When I was a student
First IT job in early 2004
Institute of Mathematics at Paderborn University
More or less separate network
Some hundred clients and servers
Fully automated install + management
Home grown solution
SDeployment
Written in Shell (IIRC)
(Un)install packages + maintain configuration
Owns full configuration files
Helped to find an intruder
Intruder managed to replace the sshd binary
Replacement didn't support Kerberos, so they changed the config file
SDeployment changed the config file back and the service failed to restart
6. History - Birth of new automation tools
Intent driven configuration
Describe the desired state
Packages (un)installed
Presence (+content) or absence of a file
Restart services on changes
...
Solution makes sure to reach/keep that state
Timeline of their birth (according to Wikipedia)
2003 bcfg2*
2005 Puppet
2009 Chef
2011 SaltStack
2012 Ansible
2012 Batou*
* 1st Git commit
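The desired-state idea from this slide can be sketched in a few lines of Python. This is only an illustration and not how any of the listed tools is implemented: `HostState` and `converge` are hypothetical names, and the "host" is an in-memory object rather than a real system. The point it shows is that the operator describes the target state and the tool derives the actions, so running it a second time changes nothing.

```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    """Hypothetical in-memory model of a host: packages and file contents."""
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)


def converge(desired: HostState, actual: HostState) -> list:
    """Bring `actual` to `desired` and return the list of actions taken."""
    actions = []
    # Install everything that is desired but missing.
    for pkg in sorted(desired.packages - actual.packages):
        actual.packages.add(pkg)
        actions.append(f"install {pkg}")
    # The tool owns the package set, so anything not desired is removed.
    for pkg in sorted(actual.packages - desired.packages):
        actual.packages.discard(pkg)
        actions.append(f"remove {pkg}")
    # Managed files are rewritten whenever their content has drifted.
    for path, content in sorted(desired.files.items()):
        if actual.files.get(path) != content:
            actual.files[path] = content
            actions.append(f"write {path}")
    return actions
```

Calling `converge` on a host whose `sshd_config` was changed by hand yields a single `write` action that puts the managed content back, which is essentially what happened in the SDeployment intruder story.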
7. History - State of network configuration today
Broad spectrum
Operators typing or pasting magic spells into a CLI
Deployment helpers called with explicit parameters
Evolution: Expect, Perl, Python scripts
Vendor solutions of different colors and sizes
Up to full vendor lock-in SDN solutions
Home-grown solutions, anywhere on the spectrum
Up to Google-sized, full-magic solutions
8. History - Where does this leave us now?
Wouldn't it be cool ...
To remove all the toil from Network config management!
So Network Engineers can focus on engineering
To have a vendor independent solution?
That can be tested and proven to do the right thing?
That scales well?
That is even Open Source?
But, how would we build that?
10. Software Engineering Methods - Abstraction
Operating Systems
Drivers for hardware components
I/O, Keyboard, Mice, Displays
File systems for data storage
Networking
ISO/OSI or hour glass model
Internet protocols (HTTP, SMTP, ...)
Routing protocols (OSPF, IS-IS, BGP ...)
11. Software Engineering Methods - Testing
Unit tests
Test function/method, class, package with knowledge of the inside
White-box testing
Integration test
Useful for APIs or protocols
Verify BGP implementations work with others
Regression testing
Something broke, we fixed it
Make sure we notice when it breaks again
14. Abstraction
Codify network architecture and processes
Topology + rules
Vendor configuration details
One large config file vs. different smaller ones
Different dialects or even languages
Generate vendor neutral config and translate from there
16. Topology - Example FFHO
[Figure: FFHO topology (schematic). Sites labeled in the figure: BB/DC-POP 1 (PAD1), DC-POP 2 (PAD2), DC-POP 3 (remote), POP 4 (PAD3) [planned], BB-POP (WBBL-only, w/ APs), BB-POP (WBBL + VPN, w/ APs), BB-POP (WBBL-only), Internet. Device and link labels: CR, CSW, BBR, Gateway, KVM, APs, RF, VPN, Dark Fiber, Dark Fiber [planned]. Legend: Router, Layer3 Switch, Switch, WiFi PTP link, Access Point, CWDM MUX, Gateway, KVM Hypervisor.]
17. Abstraction - Topology as a graph
Nodes
Represent devices
Attributes
Status, Role
OS
IPs (on interfaces)
Location (rack ... region)
Edges
Represent links
Attributes
Status, Role
Bandwidth, Distance, Priority
...
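A minimal sketch of the topology-as-a-graph model, assuming plain Python dataclasses. The class names, the attribute selection, and hostnames such as `cr01` are hypothetical and only mirror the attributes listed on the slide; a real model would carry more fields (IPs per interface, distance, priority).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A device in the topology."""
    name: str
    role: str            # e.g. "CR" or "BBR", as in the FFHO schematic
    os: str
    status: str = "active"
    location: str = ""


@dataclass(frozen=True)
class Edge:
    """A link between two devices, referenced by node name."""
    a: str
    b: str
    role: str
    bandwidth_mbps: int
    status: str = "active"


@dataclass
class Topology:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject links to devices the topology does not know about.
        for end in (edge.a, edge.b):
            if end not in self.nodes:
                raise ValueError(f"unknown node: {end}")
        self.edges.append(edge)

    def neighbors(self, name: str) -> list:
        """Names of all nodes reachable over an active link, sorted."""
        found = set()
        for edge in self.edges:
            if edge.status != "active":
                continue
            if edge.a == name:
                found.add(edge.b)
            elif edge.b == name:
                found.add(edge.a)
        return sorted(found)
```

With the topology in this form, rules can be written as ordinary functions over nodes and edges instead of being buried in per-device templates.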
18. Abstraction - Rules
What would an operator have configured manually?*
*If they did the right thing™
Examples, based on FFHO infrastructure
Internal routing protocols (OSPF + iBGP)
Learning of edge prefixes
Automagically generated firewall rules (CoPP)
...
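One such rule, sketched under assumptions: every router in the iBGP domain peers with every other router (a full mesh). The function name `ibgp_full_mesh`, the hostnames, and the documentation-range addresses are hypothetical; this is not the FFHO implementation, just the kind of logic an operator would otherwise apply by hand on each device.

```python
from itertools import combinations


def ibgp_full_mesh(loopbacks: dict) -> dict:
    """Map each router name to the sorted loopback IPs of all its iBGP peers.

    `loopbacks` maps router name -> loopback IP (as a string).
    """
    peers = {name: [] for name in loopbacks}
    # Each unordered pair of routers forms exactly one session,
    # which has to be configured on both ends.
    for a, b in combinations(sorted(loopbacks), 2):
        peers[a].append(loopbacks[b])
        peers[b].append(loopbacks[a])
    return {name: sorted(ips) for name, ips in peers.items()}
```

Adding a router to the input adds the matching neighbor statement on every other router in the next run, with no per-device edits to remember.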
19. Software Engineering Methods - Pipeline
Input
IRM / DCIM + IPAM
Any solution which offers an API, e.g. NetBox or Nautobot
Any local database(s) holding business relevant information
E.g. subscribers / services
Process(es)
Controller which gathers topology information and applies rules
Generate vendor neutral configuration
Translate configuration into required vendor configuration(s)
Apply the config as an atomic operation (if possible)
Micro service approach beneficial
Output
The complete generated configuration
Vendor independent or vendor specific, depending on POV
20. Software Engineering Methods - Pipeline
Controller generates the vendor independent config
Rules could be part of code or textual
Translator generates vendor specific language from that
Multiple config files for Linux routers
Single config file for (e.g. Cisco) switches
Translator also (can) apply config
SaltStack, Ansible, ...
Home grown
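The translator step might look like the following sketch. The structure of `NEUTRAL` and both function names are assumptions made for illustration; the slides do not define a vendor-neutral schema. One translator returns several files for a Linux router (ifupdown-style), the other a single IOS-style text for a switch, both from the same input. Interface names are placeholders and would differ per platform in practice.

```python
import ipaddress

# Hypothetical vendor-neutral config as the controller might emit it.
NEUTRAL = {
    "hostname": "cr01",
    "interfaces": [
        {"name": "eth0", "description": "uplink", "address": "192.0.2.1/31"},
    ],
}


def to_linux_files(cfg: dict) -> dict:
    """Translate into multiple files, keyed by path (ifupdown-style)."""
    stanzas = []
    for iface in cfg["interfaces"]:
        stanzas.append(
            f"# {iface['description']}\n"
            f"auto {iface['name']}\n"
            f"iface {iface['name']} inet static\n"
            f"    address {iface['address']}\n"
        )
    return {
        "/etc/hostname": cfg["hostname"] + "\n",
        "/etc/network/interfaces": "\n".join(stanzas),
    }


def to_ios_style(cfg: dict) -> str:
    """Translate into one IOS-style configuration text."""
    lines = [f"hostname {cfg['hostname']}"]
    for iface in cfg["interfaces"]:
        # This dialect expects address plus dotted netmask, not CIDR.
        net = ipaddress.ip_interface(iface["address"])
        lines += [
            f"interface {iface['name']}",
            f" description {iface['description']}",
            f" ip address {net.ip} {net.netmask}",
        ]
    return "\n".join(lines) + "\n"
```

Because both translators are pure functions of the neutral config, they can be exercised without any device, and applying the result stays a separate step.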
21. Software Engineering Methods - Testing
Unit tests
Controller can be tested without touching production network
Testing Translator can be harder
Integration tests
New controller versions can be tested against live data source
Compare result with currently running production controller
Does it generate the config we expect?
No risk of impacting infrastructure
Translator can be tested offline or in a lab
Input: Static generic config from generator
Apply config to lab device (VM?)
Verify device config against expected result
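Comparing a new controller version against the one running in production can be sketched with the standard library's `difflib`. The function names and the shape of the inputs (a dict of hostname to generated config text) are assumptions; how the two controllers are actually invoked is left out.

```python
import difflib


def config_diff(current: str, candidate: str) -> list:
    """Unified diff between two generated configs; empty list if identical."""
    return list(
        difflib.unified_diff(
            current.splitlines(),
            candidate.splitlines(),
            fromfile="production",
            tofile="candidate",
            lineterm="",
        )
    )


def changed_devices(production: dict, candidate: dict) -> dict:
    """Map hostname -> diff for every device whose generated config differs.

    Devices present on only one side are diffed against an empty config.
    """
    result = {}
    for host in sorted(set(production) | set(candidate)):
        diff = config_diff(production.get(host, ""), candidate.get(host, ""))
        if diff:
            result[host] = diff
    return result
```

An empty result means the new version generates exactly what production does today; a non-empty one is the list of changes to review before anything touches a device.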
24. Reality check - Lessons learned
Data stored in pillar only usable inside Salt
Limits flexibility a lot
Evolution
From input in pillar to NetBox
From logic in Jinja templates to Python modules inside Salt
Abstract NetBox data structures away with NACL
Move more and more logic into NACL (e.g. iBGP mesh computation)
Use Salt as translator instead of controller
26. Questions & answers
Why not generate the config within Salt or Ansible?
How do you test that? Automatically?
Limited to Python and the environment of the solution
Why not buy a vendor solution here?
Because it's not vendor independent
What do you do if it can't do X or is discontinued?
Why not use NETCONF/YANG?
Because it's not vendor independent enough
And it's for iterative config changes
Yes, NETCONF can do a complete config replace, but what's the point?