Global Network Advancement Group Next Generation Network-Integrated Systems

  1. Global Network Advancement Group Next Generation Network-Integrated Systems 4th National Research Platform Workshop: NRP, GRP, GNA-G Partnership Harvey Newman, Caltech, GNA DIS WG Chair February 9, 2023
  2. Higgs Boson Couples to Mass Ten Year Anniversary of the Higgs Boson Discovery: July 4 2022 Search for Higgs Boson Pairs A portrait of the Higgs boson by the CMS experiment ten years after the discovery | Nature
  3. 10th Anniversary of the Higgs Boson Discovery and Beyond; 75 Years of Exploration! Advanced Networks Were Essential to the Higgs Discovery and Every Ph.D. Thesis; They Will Be Essential to All Future Discoveries • NOTE: ~95% of Data Still to be Taken • Greater Intensity: Upgraded detectors for more complex events • To 5X Data Taking Rate in 2029-40. Timeline [photos: Englert and Higgs, 2013 Nobel Prize; a 48-year search, 75 years of exploration]: Theory: 1964 (roots in the 1950s-1970s); LHC + Experiments Concept: 1984; Construction: 2001; Operation: 2009; Run1: Higgs Boson Discovery 2012; Run2 and Going Forward: Precision Measurements and BSM Exploration, 2013-2042
  4. Global Data Flow: A Worldwide System Conceived by Caltech and UCSD; First Tier2 in 1999-2001 [Tiered architecture diagram: Online System and CERN Tier0 Center (100 PBs of Disk; Tape Robot) feeding Tier1 centers (London, Paris, Chicago, Bologna, Taipei, ...) over 10 to 100 Gbps links, Tier2 centers over N X 100 Gbps, and Tier3/Tier4 institute workstations; ~1000 MBytes/sec online, ~PByte/sec physics data cache] 13 Tier1, 170 Tier2 + 300 Tier3 Centers: 1k to N X 10k Cores. Tier0: Real Time Data Processing; Tier1 and Tier2: Data and Simulation Production Processing; Tier2 and Tier3: Data Analysis. Increased Use as a Cloud Resource (Any Job Anywhere); Increased “Elastic” Use of Additional HPC and Cloud Resources. A Global Dynamic System: Fertile Ground for Control with ML
  5. Core of LHC Networking: LHCOPN, LHCONE, GEANT, ESnet, Internet2, CENIC… + NRENs in Europe, Asia, Latin America, Au/NZ; US State Networks. LHCOPN: Simple & Reliable Tier0+1 Ops; LHCONE VRF: 170 Tier2s ++ [maps: GEANT, Internet2, ESnet (with EEX), CENIC and NRP]
  6. CENIC and Pacific Wave: A “Regional” Network for California with Global Reach and Vision, East and West [map; + NRENs in Europe, Asia, Latin America, Au/NZ; US State Networks; LHCOPN: Simple & Reliable Tier0+1 Ops; LHCONE VRF: 170 Tier2s; GEANT, Internet2, ESnet (with EEX), CENIC and NRP]
  7. WLCG Transfers Dashboard: 3/2017 to 3/2022 HL LHC “10%” Data Challenge in October 2021 exceeded 100 Gbytes/sec for >1 Day The “30%” Challenge Scheduled for 2024 will be a real challenge General WLCG Traffic is again at or above the pre-pandemic level https://monit-grafana.cern.ch/d/AfdonIvGk/wlcg-transfers?orgId=20
  8. Towards a Computing Model for the HL LHC Era Challenges: Capacity in the Core and at the Edges  Programs such as the LHC have experienced rapid exponential traffic growth, at the level of 40-60% per year  At the January 2020 LHCONE/LHCOPN meeting at CERN, CMS and ATLAS expressed the need for Terabit/sec links on major routes, by the start of the HL-LHC in ~2029  This is projected to outstrip the affordable capacity  The needs are further specified in “blueprint” Requirements documents by US CMS and US ATLAS, submitted to the ESnet Requirements Review in August 2020, and captured in a comprehensive DOE Requirements Review report for HEP: https://escholarship.org/uc/item/78j3c9v4  Three areas of particular capacity-concern by 2028-9 were identified: (1) Exceeding the capacity across oceans, notably the Atlantic, served by the Advanced North Atlantic (ANA) network consortium (2) Tier2 centers at universities requiring 100G 24 X 7 X 365 average throughput with sustained 400G bursts (a petabyte in a shift), and (3) Terabit/sec links to labs and HPC centers (and edge systems) to support multi-petabyte transactions in hours rather than days 8
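A quick back-of-the-envelope check of the Tier2 figures quoted above, written as a small Python sketch; the 8-hour "shift" length is an assumption, not stated on the slide:

```python
# Check of the Tier2 capacity figures: 400G sustained bursts ("a petabyte in a
# shift") and 100G 24x7x365 average. Assumption: one shift = 8 hours.

SHIFT_HOURS = 8
GBPS = 1e9  # bits per second

def volume_petabytes(rate_gbps: float, hours: float) -> float:
    """Data volume moved at a sustained rate over the given time, in petabytes."""
    bits = rate_gbps * GBPS * hours * 3600
    return bits / 8 / 1e15

print(f"400 Gbps for one shift : {volume_petabytes(400, SHIFT_HOURS):.2f} PB")    # ~1.4 PB
print(f"100 Gbps, 24x7 average : {volume_petabytes(100, 24 * 365):.0f} PB/year")  # ~394 PB/year
```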
  9. Mission: To Support global research and education using the technology, infrastructures and investments of its participants The GNA-G exists to bring together researchers, National Research and Education Networks (NRENs), Global eXchange Point (GXP) operators, regionals and other R&E providers, in developing a common global infrastructure to support the needs. https://www.gna-g.net/
  10. GNA-G Overview and Vision [organization chart]: GREN: Collaboration on the intercontinental transmission layer. Focus areas: Telemetry, Data Intensive Science, AutoGOLE/SENSE, GREN Mapping, GXP Architectures & Services, Connecting Offshore Students, Research & Development, Routing Anomalies, Network Automation, Securing the GREN (Strategic Security Focus Group). Operational link consortia: APR, ANA, AER, AmLight, … GNA Architecture v2.0; GNA-G Leadership Team; GNA-G Executive Liaison; GNA-G participant CEOs; Global NREN CEO Forum
  11. The GNA-G Data Intensive Sciences WG  Principal aims of the GNA-G DIS WG: (1) To meet the needs and address the challenges faced by major data intensive science programs  In a manner consistent and compatible with support for the needs of individuals and smaller groups in the at large A&R communities (2) To provide a forum for discussion, a framework and shared tools for short and longer term developments meeting the program and group needs  To develop a persistent global testbed as a platform, to foster ongoing developments among the science and network communities  While sharing and advancing the (new) concepts, tools & systems needed  Members of the WG partner in joint deployments and/or developments of generally useful tools and systems that help operate and manage R&E networks with limited resources across national and regional boundaries  A special focus of the group is to address the growing demand for  Network-integrated workflows  Comprehensive cross-institution data management  Automation, and  Federated infrastructures encompassing networking, compute, and storage  Working Closely with the AutoGOLE/SENSE WG, NRP and GRP 1 1 Charter: https://www.dropbox.com/s/4my5mjl8xd8a3y9/GNA-G_DataIntensiveSciencesWGCharter.docx?dl=0
  12. The GNA-G Data Intensive Sciences WG  Mission: Meet the challenges of globally distributed data and computation faced by the major science programs  Coordinate provisioning the feasible capacity across a global footprint, and enable best use of the infrastructure:  While meeting the needs of the participating groups, large and small  In a manner Compatible and Consistent with use by the at-large A&R communities  Members: Alberto Santoro, Azher Mughal, Bijan Jabbari, Brian Yang, Buseung Cho, Caio Costa, Carlos Antonio Ruggiero, Carlyn Ann-Lee, Chin Guok, Ciprian Popoviciu, Dale Carder, David Lange, David Wilde, Dima Mishin, Edoardo Martelli, Eduardo Revoredo, Eli Dart, Eoin Kenney, Frank Wuerthwein, Frederic Loui, Harvey Newman, Heidi Morgan, Iara Machado, Inder Monga, Jeferson Souza, Jensen Zhang, Jeonghoon Moon, Jeronimo Bezerra, Jerry Sobieski, Joao Eduardo Ferreira, Joe Mambretti, John Graham, John Hess, John Macauley, Julio Ibarra, Justas Balcas, Kai Gao, Karl Newell, Kaushik De, Kevin Sale, Lars Fischer, Liang Zhang, Mahdi Solemani, Maria Del Carmen Misa Moreira, Marcos Schwarz, Mariam Kiran, Matt Zekauskas, Michael Stanton, Mike Hildreth, Mike Simpson, Ney Lemke, Phil Demar, Raimondas Sirvinskas, Richard Hughes-Jones, Rogerio Iope, Sergio Novaes, Shawn McKee, Siju Mammen, Susanne Naegele- Jackson, Tom de Fanti, Tom Hutton, Tom Lehman, William Johnston, Xi Yang, Y. Richard Yang  Participating Organizations/Projects:  ESnet, Nordunet, SURFnet, AARNet, AmLight, KISTI, SANReN, GEANT, RNP, CERN, Internet2, CENIC/Pacific Wave, StarLight, NetherLight, Southern Light, Pacific Research Platform, FABRIC, RENATER, ATLAS, CMS, VRO, SKAO, OSG, Caltech, UCSD, Yale, FIU, UERJ, GridUNESP, Fermilab, Michigan, UT Arlington, George Mason, East Carolina, KAUST  Working Closely with the AutoGOLE/SENSE WG  Meets Weekly or Bi-weekly; All are welcome to join. 12 Charter: https://www.dropbox.com/s/4my5mjl8xd8a3y9/GNA-G_DataIntensiveSciencesWGCharter.docx?dl=0
  13. AutoGOLE / SENSE Working Group  Worldwide collaboration of open exchange points and R&E networks interconnected to deliver network services end-to-end in a fully automated way. NSI for network connections, SENSE for integration of End Systems and Domain Science Workflow facing APIs.  Key Objective:  The AutoGOLE Infrastructure should be persistent and reliable, to allow most of the time to be spent on experiments and research.  Key Work areas:  Control Plane Monitoring: Prometheus based, deployment now underway  Data Plane Verification and Troubleshooting Service: Study and design group formed  AutoGOLE related software: Ongoing enhancements to facilitate deployment and maintenance (Kubernetes, Docker based systems)  Experiment, Research, Multiple Activity, Use Case support: Including NOTED, Gradient Graph, P4 Topologies, Named Data Networking (NDN), Data Transfer Systems integration and testing.  WG information https://www.gna-g.net/join-working-group/autogole-sense
  14. 14 Towards a Next Generation Network-Integrated System For Data Intensive Sciences The Global Network Advancement Group The AutoGOLE/SENSE and Data Intensive Sciences WGs Partnering with GRP and NRP
  15. GNA-G AutoGOLE/SENSE WG Global Persistent Testbed Software Driven Network OSes; Programmability at the Edges and in the Core
  16. P4 + SONIC Programmable Global Persistent Testbed 19 Active GNA-G/RARE P4, SONIC Testbed Sites • Caltech, Pasadena-US: 4 FreeRtr/P4 + SONIC • CERN, Geneva-CH: FreeRtr/P4 • FIU, Miami-US: FreeRtr/P4 • GEANT, Amsterdam-NL: FreeRtr/P4 • GEANT, Budapest-HU: FreeRtr/P4 • GEANT, Frankfurt-DE: FreeRtr/P4 • GEANT, Paris-FR: FreeRtr/DPDK • GEANT, Poznan-PL: FreeRtr/P4 • GEANT, Prague-CZ: FreeRtr/DPDK • RENATER, Paris-FR: FreeRtr/P4 • SouthernLight (FIU/RedClara/Rednesp/RNP), São Paulo-BR: FreeRtr/P4 • StarLight, Chicago-US: FreeRtr/P4 • SWITCH, Geneva-CH: FreeRtr/P4 • TCD, Dublin-IE: FreeRtr/P4 • Tennessee Tech: FreeRtr/P4 + 9 Expected sites (by SC22): • HEAnet, Dublin-IE: FreeRtr/P4 • JISC, London-UK: FreeRtr/P4 • KAUST, Saudi Arabia: FreeRtr/DPDK • KISTI, South Korea: SONiC/P4 • RNP, Rio de Janeiro-BR: FreeRtr/P4, SONiC/P4 • SC22 Caltech Booth, Dallas-US: FreeRtr/P4 • UCSD, San Diego-US: SONiC/P4 • UFES, Vitória-BR: 2x FreeRtr/P4 • UMd, College Park, Maryland-US: FreeRtr/P4
  17. GEANT (Now Global) P4 Lab and Dataplane of P4 Programmable Switches A new worldwide platform for  New agile and flexible feature development  New use case development corresponding to research programs’ requirements  Multiple research network overlay slices A global playing field for  Next generation network monitoring tools and systems  Development of fully automated network deployment  New network operations paradigm development VISION: Federate and integrate multiple testbeds and toolsets such as AutoGOLE / SENSE Frederic Loui, Csaba Mate, Marcos Schwarz et al.
  18. SC15-22: SDN Next Generation Terabit/sec Ecosystem for Exascale Science SC16+: Consistent Operations with Agile Feedback Major Science Flow Classes Up to High Water Marks supercomputing.caltech.edu Tbps Rings for SC18-22: Caltech, Ciena, Scinet, StarLight + Many HEP, Network, Vendor Partners 45 SDN-driven flow steering, load balancing, site orchestration Over Terabit/sec Global Networks LHC at SC15: Asynchronous Stageout (ASO) with Caltech’s SDN Controller Preview PetaByte Transfers to/ from Site Edges of Exascale Facilities With 100G -1000G DTNs
  19. Global Petascale to Exascale Workflows for Data Intensive Sciences Accelerated by Next Generation Programmable Network Architectures and Machine Learning Applications  A Vast Partnership of Science and Computer Science Teams, R&E Networks and R&D Projects  Convened by the GNA-G Data Intensive Sciences WG  Mission  Meeting the challenges faced by leading edge data intensive experimental programs in high energy physics, astrophysics, genomics and other fields of data intensive science  Clearing the path to the next round of discoveries  Demonstrating a wide range of the latest advances in:  Software defined and Terabit/sec networks  Intelligent global operations and monitoring systems  Workflow optimization methodologies with real time analytics  State of the art long distance data transfer methods and tools, local and metro optical networks and server designs  Emerging technologies and concepts in programmable networks and global-scale distributed system Network Research Exhibition NRE-19. Abstract: https://www.dropbox.com/s/qcm41g7f7etjxvy/SC22_NRE_GlobalPetascaleWorkflows_V8072022.docx?dl=0
  20. Global Petascale to Exascale Workflows Cornerstone system components and concepts (I)  Integrated operations and orchestrated management of resources: interworking with and advancing the site (Site-RM) and network resource managers (Network-RM) developed in the SENSE program.  An ontological model-driven framework with integration of an analytics engine, API and workflow orchestrator extending work in the SENSE project, enhanced by efficient multi-domain resource state abstractions and discovery mechanisms.  Fine-grained end-to-end monitoring, data collection and analytics, supporting applications with path selection + load balancing mechanisms optimized by machine learning.  The integration of Qualcomm Technologies’ GradientGraph (G2) with 5G / OpenALTO / SRv6. These will be used to demonstrate how applications including XR, auto/V2X, and IoT can benefit from the intelligent routing, rate limiting, and service placement decisions computed by G2.  Demonstrating how the integration of G2 with science networks, through the LHC Rucio / FTS data management system, AutoGOLE / SENSE virtual circuit and orchestration services, and OpenALTO monitoring, may be used towards optimizing large-scale data transfers, sharing data between CERN and scientists at sites across the globe.
  21. Global Petascale to Exascale Workflows for Data Intensive Sciences  Advances Embedded and Interoperate within a ‘composable’ architecture of subsystems, components and interfaces, organized into several areas:  Visibility: Monitoring and information tracking and management including IETF ALTO/OpenALTO, BGP-LS, sFlow/NetFlow, Perfsonar, Traceroute, Qualcomm Gradient Graph congestion information, Kubernetes statistics, Prometheus, P4/Inband telemetry  Intelligence: Stateful decisions using composable metrics (policy, priority, network- and site-state, SLA constraints, responses to ‘events’ at sites and in the networks, ...), using NetPredict, Hecate, RL-G2, Yale Bilevel optimization, Coral, Elastiflow/Elastic Stack  Controllability: SENSE/OpenNSA/AutoGOLE, P4/PINS, segment routing with SRv6 and/or PolKA, BGP/PCEP  Network OSes and Tools: GEANT RARE/freeRtr, SONIC, Calico VPP, Bstruct-Mininet environment, ...  Orchestration: SENSE, Kubernetes (+k8s namespace), dedicated code and APIs for interoperation and progressive integration
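The composable visibility / intelligence / controllability split listed above can be pictured as a simple control loop. The sketch below is only illustrative: the class, function and path names are hypothetical and do not correspond to the actual SENSE, ALTO or GradientGraph APIs.

```python
# Minimal sketch of a visibility -> intelligence -> controllability loop.
# Names and numbers are illustrative placeholders only.

from dataclasses import dataclass

@dataclass
class PathState:
    path_id: str
    utilization: float    # fraction of capacity in use (from telemetry)
    priority_load: float  # fraction used by high-priority flows

def collect_visibility() -> list[PathState]:
    # Placeholder for sFlow/Prometheus/ALTO-style collectors.
    return [PathState("transatlantic-A", 0.85, 0.40),
            PathState("transatlantic-B", 0.35, 0.10)]

def choose_action(paths: list[PathState], high_water_mark: float = 0.8) -> dict:
    # "Intelligence": a trivial stateful policy; real deployments would use
    # NetPredict/Hecate/G2-style optimization instead.
    hot = [p for p in paths if p.utilization > high_water_mark]
    cool = sorted(paths, key=lambda p: p.utilization)[0]
    if hot:
        return {"action": "steer", "from": hot[0].path_id, "to": cool.path_id}
    return {"action": "noop"}

def apply_action(action: dict) -> None:
    # "Controllability": in practice this would call SENSE/NSI/SRv6/PolKA services.
    print("issuing:", action)

apply_action(choose_action(collect_visibility()))
```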
  22. Caltech, StarLight, NRL and Partners at SC22 Microcosm: Creating the Future of SCinet and of Networks for Science
  23. GNA-G and its Working Groups Worldwide Partnerships at SC22 and Beyond
  24. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (Cont’d)  400 Gbps Next Generation Networks  High-Performance Routing of Science Network Traffic  Creation of an overlay network with PolKA tunnels forming virtual circuits  5G/Edge Computing Application Performance Optimization  Integration of OpenALTO, SRv6 and Qualcomm Technologies’ GradientGraph  N-DISE: Named Data Networking for Data Intensive Experiments  Coral: Scalable Data Plane Checking via Distributed, On-Device Verification  High performance networking with the Bella Link and the Sao Paulo Backbone  AmLight Express and Protect (AmLight-EXP)  Global Research Platform (GRP)  Software Defined International Open Exchanges (SDXs) For further Information on NRE-19 Elements and Goals: See the Next Slides
  25. NSF CC* Program: Integration and Innovation. N-DISE: NDN for Data Intensive Science Experiments
 CHALLENGES ▪ LHC program in HEP is world’s largest data intensive application: handling One Exabyte by 2019 at hundreds of sites ▪ Global data distribution, processing, access, analysis; large but limited computing, storage, network resources
 APPROACH ▪ Use Named Data Networking (NDN) to redesign LHC HEP network; optimize workflow
 SOLUTIONS + Deliverables ▪ Simultaneously optimize caching (“hot” datasets), forwarding, and congestion control in both the network core and site edges ▪ Development of naming scheme and attributes for fast access and efficient communication in HEP and other fields
 TEAM ▪ Northeastern (PI: Edmund Yeh), Caltech (PI: Harvey Newman), UCLA (PI: Jason Cong), Tennessee Tech (PI: Susmit Shannigrahi) ▪ In partnership with NIST, other LHC sites and the NDN project team
 SCIENTIFIC and BROADER IMPACT ▪ Lay groundwork for an NDN-based data distribution and access system for data-intensive science fields ▪ Benefit user community through lowered costs, faster data access and standardized naming structures ▪ Engage next generation of scientists in emerging concepts of future Internet architectures for data intensive applications ▪ Advance, extend and test the NDN paradigm to encompass the most data intensive research applications of global extent
  26.  Feasibility of NDN-based data distribution system for LHC first demonstrated at SC 18 (Dallas)  SANDIE demo at SC 19 (Denver) showed improved throughput and delay performance  N-DISE demo at SC 21 (done remotely) showed further improvements, ICN ‘22 paper introduced the project to the literature  SC22  Jointly optimized caching and forwarding algorithms developed by Northeastern  High-speed NDN-DPDK forwarder developed by NIST  NDN-based filesystem plugin for XRootD developed by Caltech  FPGA acceleration of NDN-DPDK forwarder by UCLA  Deploying genomics data lakes using K8s over NDN and name-based compute placement to K8 clusters by Tennessee Tech  Goals: (i) rapid and reliable delivery of LHC data over transcontinental layer-2 demo testbed, (ii) demonstration of Kubernetes integration for genomics applications N-DISE
  27. A New Generation Persistent 400G/100G Super-DMZ: CENIC, Pacific Wave, ESnet, Internet2, Caltech, UCSD, StarLight ++ [Network diagram: Caltech Campus / Caltech IMSS Waveserver Ai pair (2 x 200G) to CENIC LA at 818 W 7th (6th and 10th Floors) via NCS/Ciena transponders and riser panels; Cisco NCS 1K4 and Arista 7060 DX4 (400GE & 100GE); Pacific Wave Seattle 1-1 and Sunnyvale 2-1 (SEA, SNVL, LAX, LA, SDSC, 100GE); Internet2 optical and packet at 818 W 7th 10th Floor, to StarLight; ESnet6 / FABRIC at 818 W 7th 10th Floor; 400G to SC22 Dallas; pending connections to CENIC/PacWave Pacific Wave LosA 2-1 and beyond; to Juniper QFX after SC22]
  28. Next Generation Network-Integrated System  Top Line Message: In order to meet the needs, we need to transition to a new dynamic and adaptive software-driven system, which  Coordinates worldwide networks as a first class resource along with computing and storage, across multiple domains  Simultaneously supports the LHC experiments, other major DIS programs and the larger worldwide academic and research community  Systems design approach: A global dynamic fabric that flexibly allocates, balances and conserves the available network resources Negotiating with site systems that aim to accelerate workflow; Use of ML Builds on ongoing R&D projects: from regional caches/data lakes to intelligent control and data planes to ML-based optimization [E.g. SENSE/AutoGOLE, NOTED, ESNet HT, GEANT/RARE, AmLight, Fabric, Bridges; NetPredict, DeepRoute, ALTO, PolKA ...]  A key milestone: integration of SENSE + network services with FTS & Rucio  We are also leveraging the worldwide move towards a fully programmable ecosystem of networks and end-systems (P4, PINS; SRv6; PolKA), plus operations platforms (OSG, NRP; global SENSE and P4 Testbeds)  The GNA-G and its Working Groups, The LHC experiments together with the WLCG and the worldwide R&E network community, are the key players  Together with leaders and teams from: LBNF/DUNE, VRO, SKA
  29. Global Petascale to Exascale Workflows for Data Intensive Sciences  Development Trajectory: Parallel developments + mission-driven progressive interfacing and system-level integration  Architecture: Data Center Analogue  Classes of “Work” (work = transfers, or overall workflow), defined by task parameters and/or priority and policy  Adjust rate of progress in each class to respond to network or site state changes, and “events”  Moderate/balance the rates among the classes to optimize a multivariate objective function with constraints  Overarching Concept: Consistent Network Operations:  Stable load balanced high throughput workflows across optimally chosen network paths  Provided by autonomous site-resident services dynamically interacting with network-resident services  Up to preset or flexible high water marks to accommodate other traffic  Responding to (or negotiating with) site demands from the science programs’ principal data distribution and management systems
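As one illustration of the "moderate/balance the rates among the classes ... up to high water marks" idea, here is a minimal weighted-sharing sketch with assumed demands, weights and link capacity; it is not the production algorithm.

```python
# Illustrative sketch: each work class gets a weighted share of the allowed
# bandwidth, capped by its own demand and by a high-water mark on the link.

def balance_rates(demands_gbps: dict, weights: dict,
                  capacity_gbps: float, high_water: float = 0.8) -> dict:
    """Weighted max-min style allocation: fill classes in proportion to weight,
    never exceeding a class's demand or high_water * capacity in total."""
    limit = capacity_gbps * high_water
    alloc = {c: 0.0 for c in demands_gbps}
    active = [c for c in demands_gbps if demands_gbps[c] > 0]
    budget = limit
    while budget > 1e-9 and active:
        total_w = sum(weights[c] for c in active)
        satisfied = []
        for c in active:
            share = budget * weights[c] / total_w
            if demands_gbps[c] - alloc[c] <= share:
                alloc[c] = demands_gbps[c]          # demand fully met
                satisfied.append(c)
        if not satisfied:                            # spread what is left and stop
            for c in active:
                alloc[c] += budget * weights[c] / total_w
            break
        active = [c for c in active if c not in satisfied]
        budget = limit - sum(alloc.values())
    return alloc

# Hypothetical classes: priority transfers, bulk production, best-effort analysis.
print(balance_rates({"priority": 150, "bulk": 500, "analysis": 300},
                    {"priority": 3, "bulk": 2, "analysis": 1},
                    capacity_gbps=400))
```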
  30. 31 Towards a Next Generation Network-Integrated System For Data Intensive Sciences The Global Network Advancement Group The AutoGOLE/SENSE and Data Intensive Sciences WGs Partnering with GRP and NRP
  31. Extra Slides Follow 32
  32. Steps to Arrive at a Fully Functional System by 2028 the Data Challenge Perspective  Three Types of Challenges 1. Functionality Challenge : Where we establish the functionality we want in our software stack, and do so incrementally over time 2. Software Scalability Challenge: Where we take the products that passed the previous challenge, and exercise them at full scale but not on the final hardware infrastructure 3. End-to-end Systems Challenge: On the actual hardware; can only be done once the actual hardware systems are in place.  Targets are 2022-24 for 1 and 2 (or 2024-25 if not all components are ready earlier); 2026-2027 for 3  Remark: it’s conceivable, maybe even likely that it takes multiple attempts to achieve sustained performance at scale with all of the new software we need, with the functionality we want.  + Scaling Challenges: Demonstrate capability to fill ~50% full bandwidth required in the minimal scenario with production-like traffic: Storage to storage, using third party copy protocols and data management services used in production: 2021: 10%; 2023-4: 30%; 2025-6: 60%; 2028: 100% 33 F. Wuerthwein, UCSD/SDSC
  33. Rednesp and RNP: Expanding Capacity Among Latin America, US, Europe and Africa  Rednesp (formerly ANSP) has 3 links to the USA, connecting Sao Paulo to Florida.  Atlantic 100G link: direct São Paulo-Miami  Pacific 100G link: São Paulo to Santiago (Chile), from there to Panama, San Juan and Miami.  200 Gbps link: São Paulo to Florida through Fortaleza, on the Monet cable (Angola Cables).  RNP also has a 200 Gbps link using the Monet cable:  Total 600 Gbps capacity between Brazil and the USA.  Transatlantic Links: The Bella link between São Paulo and Sines in Portugal, and a link connecting São Paulo, Angola and South Africa.  The deployment of the "Backbone SP" has just started. Its main goal is to interconnect most universities in Sao Paulo (state) with 100 Gbps links. • The first link connects SP4 to UNIFESP (Federal University of São Paulo). Next, links should connect UNESP (São Paulo State University), USP (University of São Paulo) and UNICAMP.
  34. BRIDGES: “Binding Research Infrastructures for the Deployment of Global Experimental Science” Across the Atlantic  George Mason and East Carolina Universities (NSF IRNC Program)  High Level View of the Infrastructure and Collaborators:  Virtualization as an Architectural (systems-level) concept, targeting:  Deterministic, predictable performance  Agile, customizable, integrated virtual resource model  Operates as a “fully virtualized” services environment: Users (and Operations personnel) can interact with the BRIDGES services through an interactive web portal, or software agents may interact via a programmatic API to automate resource provisioning and orchestration.
  35. KREONet2 (with APONet): Closing the Global Ring East and West
  36. BRIDGES: “Binding Research Infrastructures for the Deployment of Global Experimental Science” Across the Atlantic
  37. AER Update - Aug 2022 ● Since the AER MoU, KAUST has been coordinating with REN partners on the deployment and sharing of spare capacity ● KAUST is supporting the following partners by offering point-to-point circuits for submarine cable backup paths: ○ AARnet ○ GÉANT ○ NetherLight ○ NII/SINET ○ SingAREN ● The SC22 NRE Demonstrations (Caltech; Seattle, Los Angeles, Chicago) will also be supported by KAUST, closing the ring from Amsterdam to Singapore and back to the US. KAUST and the Asia-Pacific Oceania Network (APONet): Closing the Global Ring East and West
  38. KAUST SC22 Demonstrations (NRE-19) [Map of the SC22 Caltech NRE demonstration ring: KAUST (Jeddah), SingAREN (Singapore), NII/SINET (Tokyo), KISTI (Daejeon), Hong Kong, Guam, Univ. of Hawaii, TransPAC, CENIC (Los Angeles, Seattle), StarLight (Chicago), NetherLight (Amsterdam), GÉANT Open Exchange (London). The dotted paths in the diagram are the ones to be used for SC22]
  39. SC22 NREs: AutoGOLE / SENSE and NOTED (Draft) Bill Johnston 9/23/22 SC21: Global footprint. Multiple 400G Optical Wide Area Links SC22: Global footprint. Terabit/sec Triangle Starlight – McLean – Dallas; 400G to LA; 4 X 100G to Caltech and UCSD/SDSC; 3 X 400G to the Caltech Booth
  40. Partner NREs at SC22: The NRE demonstrations hosted at or partnering with the Caltech Booth 2820:
 NRE-001 Edmund Yeh (eyeh@ece.neu.edu): N-DISE: NDN for Data Intensive Science Experiments
 NRE-004 Joe Mambretti (j-mambretti@northwestern.edu): 1.2 Tbps WAN Services: Architecture, Technology and Control Systems
 NRE-005 Joe Mambretti (j-mambretti@northwestern.edu): 400 Gbps E2E WAN Services: Architecture, Technology and Control Systems
 NRE-007 Edoardo Martelli (edoardo.martelli@cern.ch): LHC Networking And NOTED
 NRE-008 Joe Mambretti (j-mambretti@northwestern.edu): IRNC Software Defined Exchange (SDX) Multi-Services for Petascale Science
 NRE-009 Jim Chen (jim-chen@northwestern.edu): High Speed Network with International P4 Experimental Networks for The Global Research Platform and Other Research Platforms
 NRE-010 Magnos Martinello (magnos.martinello@ufes.br): Demonstrating PolKA Routing Approach to Support Traffic Engineering for Data-intensive Science
 NRE-011 Qiao Xiang (xiangq27@gmail.com): Coral: Fast Data Plane Verification for Large-Scale Science Networks via Distributed, On-Device Verification
 NRE-013 Tom Lehman (tlehman@es.net): AutoGOLE/SENSE: End-to-End Network Services and Workflow Integration
 NRE-015 Tom Lehman (tlehman@es.net): SENSE and Rucio/FTS/XRootD Interoperation
 NRE-016 Marcos Schwarz (marcos.schwarz@rnp.br): Programmable Networking with P4, GEANT RARE/freeRtr and SONIC/PINS
  41. Global Petascale to Exascale Workflows Cornerstone system components and concepts (II)  Adapting NDN for data intensive sciences including advanced cache design and algorithms and parallel code development and methods for fast and efficient access over a global testbed, leveraging the experience in the NDN for Data Intensive Science Experiments (N-DISE) project  Demonstrating a paragon network at many sites equipped with P4 programmable devices, including Tofino and Tofino2-based switches, Smart NICs and Xilinx FPGA-based network interfaces providing packet-by-packet inspection, agile flow management and state tracking, and real-time decisions ● High throughput platform demonstrations in support of workflows for the science programs, including reference designs of NVMe server systems to match a 400G network core, as well as servers with multi-GPUs and programmable smart NICs with FPGAs. ● Integration of edge-focused extreme telemetry data (from P4 switches, smart edge devices and end hosts) and end facility/application caching stats and other metrics, to facilitate automated decision-making processes. ● Development of dynamic regional caches or “data lakes” that treat nearby sites as a unified data resource, building on the successful petabyte cache currently in operation between Caltech and UCSD and in ESnet based on the XRootD federated access protocol; extension of the cache concept to more remote sites such as Fermilab, Nebraska and Vanderbilt.
  42. Global Petascale to Exascale Workflows Cornerstone system components and concepts (III) ● Creation of an overlay network with PolKA tunnels forming virtual circuits, integrating persistent resources from the GNA-G AutoGOLE/SENSE and GEANT RARE testbeds, to validate the PolKA source routing mechanisms using 100G+ transfers of science data. ● Underlay congestion will be detected by tunnel monitoring and signaled to the overlay so that the traffic is steered away from congested tunnels to other paths. ● Comparisons between segment routing and PolKA regarding controllability and performance metrics, from the application's point of view, are also planned. ● Full PolKA deployment enables the extreme traffic engineering demands of data-intensive sciences to be met, by offering a new range of network functionalities such as: multipath routing, in-network telemetry and proof-of-transit with path attributes to support higher level stateful traffic engineering decisions. ● Network traffic prediction and engineering optimizations using the latest graph neural network and other emerging deep learning methods, developed by ESnet’s Hecate/DeepRoute project.
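For readers unfamiliar with PolKA, the source-routing idea can be illustrated with an integer Chinese Remainder Theorem toy: a single route label from which each core node derives its output port by a modulo operation. Real PolKA uses polynomial (CRC-style) arithmetic over GF(2) in the data plane; the node IDs and ports below are hypothetical.

```python
# Toy integer analogy of PolKA-style source routing: one label, per-node modulo.

from math import prod

def route_label(node_ids: list[int], out_ports: list[int]) -> int:
    """Build a label R with R % node_id == out_port for every hop.
    node_ids must be pairwise coprime and each port < its node_id."""
    M = prod(node_ids)
    label = 0
    for n, p in zip(node_ids, out_ports):
        Mi = M // n
        label += p * Mi * pow(Mi, -1, n)   # CRT reconstruction term
    return label % M

nodes = [7, 11, 13]           # hypothetical per-node IDs (pairwise coprime)
ports = [3, 5, 2]             # desired output port at each node
R = route_label(nodes, ports)
print(R, [R % n for n in nodes])   # each node recovers its own port locally
```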
  43. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (I)  LHC:  End to end workflows for large scale data distribution and analysis in support of the CMS experiment’s LHC workflow among Caltech, UCSD, LBL + Fermilab, Nebraska, Vanderbilt, and GridUNESP (Sao Paulo), including automated flow steering, negotiation and DTN autoconfiguration; bursting of some of these workflows to the NERSC HPC facility and the cloud; use of both edge and in-network caches to increase data access and processing efficiency.  An ontological model-driven framework with integration of an analytics engine, API and workflow orchestrator extending work in the SENSE project, enhanced by efficient multi-domain resource state abstractions and discovery mechanisms.  SENSE/AutoGOLE:  The GNA-G SENSE/AutoGOLE WG demonstration will present key technologies, methods and a system of dynamic Layer 2 and Layer 3 virtual circuit services to meet the challenges and address the requirements of the largest data intensive science programs, including the Large Hadron Collider (LHC), the Vera Rubin Observatory and programs in many other disciplines. The services are designed to support multiple petabyte transactions across a global footprint, represented by a persistent testbed spanning the US, Europe, Asia Pacific and Latin American regions.  Global Ring and KAUST: These demonstrations will also showcase the power of collaboration in the global research and education network (REN) community. The demo will have KAUST as a collaborator in the Asia-Pacific Global Ring (AER [*]), closing the global ring by interconnecting Amsterdam to Singapore, and then on to Los Angeles with support from partners including SingAREN, JGN-X and APOnet.  [*] See Fast and stable Network Ring between Asia-Pacific and Europe | AARNet https://www.aarnet.edu.au/fast-and-stable-network-ring-between-asia-pacific-and-europe/
  44. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (II)  400 Gbps Next Generation Networks  Science research collaborations are continuing to prepare for increased network capacity, e.g., transitioning from 100 Gbps paths to 400 Gbps, 800 Gbps and beyond. An international collaboration led by the StarLight/Caltech/UCSD/NRP/GRP/SCinet consortium, with many partners, will demonstrate the architecture, services and techniques to enable efficient management of high volume science data streams within multi-hundred Gbps channels.  Global Research Platform (GRP):  An international collaboration will demonstrate services and capabilities of the prototype Global Research Platform, an international Science DMZ, optimized for large scale world-wide science projects, especially those based on instruments producing high volumes of data.  Software Defined Exchanges (SDXs):  Several international open exchanges that are developing capabilities for Software Defined Exchanges (SDXs) will showcase these facilities' highly programmable capabilities for international science data transport.  AmLight Express and Protect (AmLight-EXP):  Support for the SENSE/AutoGOLE demonstrations and LHC-related use cases will be shown, in association with high-throughput, low latency experiments, and demonstrations of auto-recovery from network events, using optical spectrum on the Monet submarine cable and its 100G ring network that interconnects the research and education communities in the U.S. and South America.
  45. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (III)  5G/Edge Computing Application Performance Optimization and High-Performance Routing of Science Network Traffic  UCSD, the Pacific Research Platform, Caltech, Yale and Qualcomm Technologies will attempt to demonstrate the first end-to-end integrated traffic optimization framework based on bottleneck structure analysis for 5G applications. This intended demonstration will focus on showing the integration of Qualcomm Technologies’ GradientGraph with the IETF ALTO open standard, to support the optimization of edge computing applications such as XR, holographic telepresence, and vehicle networks. Holodecks at UCSD and NYU will be interconnected across the CENIC and NYSERNet regional networks via a transcontinental link.  Integration of OpenALTO, SRv6 and Qualcomm Technologies’ GradientGraph  We intend to demonstrate the integration of OpenALTO, SRv6 and Qualcomm Technologies’ GradientGraph, showing how applications including XR, Auto/V2X, and holographic telepresence can benefit from the intelligent routing, rate limiting, and service placement decisions computed by the GradientGraph platform. We also intend to demonstrate the integration of OpenALTO and GradientGraph with science networks, Rucio, FTS, AutoGOLE and SENSE, towards optimizing large-scale data transfers from the Large Hadron Collider at CERN to scientists across the globe.
  46. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (IV)  SENSE: The Software-defined network for End-to-end Networked Science at Exascale  The SENSE project is building smart network services to accelerate scientific discovery in the era of ‘big data’ driven by Exascale, cloud computing, machine learning and AI. The SENSE SC22 demonstration showcases a comprehensive approach to request and provision end- to- end network services across domains that combines deployment of infrastructure across multiple labs/campuses, SC booths and WAN with a focus on usability, performance and resilience through:  Priority QoS for SENSE enabled flows  Multi-point and point-to-point services  Auto-provisioning of network devices and Data Transfer Nodes (DTNs);  Intent-based, interactive, real time application interfaces providing intuitive access to intelligent SDN services for Virtual Organization (VO) services and managers;  Policy-guided end-to-end orchestration of network resources, coordinated with the science programs' systems, to enable real time orchestration of computing and storage resources.  Real time network measurement, analytics and feedback to provide the foundation for full lifecycle status, problem resolution, resilience and coordination between the SENSE intelligent network services, and the science programs' system services.
  47. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (V)  N-DISE: Named Data Networking for Data Intensive Science Experiments  The NDN for Data Intensive Science Experiments (N-DISE) project aims to accelerate the pace of breakthroughs and innovations in data-intensive science fields such as the Large Hadron Collider (LHC) high energy physics program and the BioGenome and human genome projects. Based on Named Data Networking (NDN), a data-centric future Internet architecture, N-DISE will deploy and commission a highly efficient and field-tested petascale data distribution, caching, access and analysis system serving major science programs. The N-DISE project will build on recently developed high-throughput NDN caching and forwarding methods, containerization techniques, leverage the integration of NDN and SDN systems concepts and algorithms with the mainstream data distribution, processing, and management systems of CMS, as well as the integration with Field Programmable Gate Arrays (FPGA) acceleration subsystems, to produce a system capable of delivering LHC and genomic data over a wide area network at throughputs approaching 100 Gbits per second, while dramatically decreasing download times. N-DISE will leverage existing infrastructure and build an enhanced testbed with high performance NDN data cache servers at participating institutions.  The N-DISE demonstration is designed to exhibit improved performance of the N-DISE system for workflow acceleration within large-scale data-intensive programs such as the LHC high energy physics, BioGenome and human genome programs. To achieve high performance, the demonstration will leverage the following key components: (1) the transparent integration of NDN with the current CMS software stack via an NDN based XRootD Open Storage System plugin, (2) joint caching and multipath forwarding capabilities of the VIP algorithm, (3) integration with FPGA acceleration subsystems, (4) SDN support for NDN through the work of the Global Network Advancement Group (GNA-G) and its AutoGOLE/SENSE and Data Intensive Sciences Working Group.
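As a minimal illustration of the name-based caching idea underlying N-DISE (not the actual VIP algorithm, the XRootD plugin, or the NDN-DPDK forwarder), a content store keyed by hierarchical names with LRU eviction might look like the sketch below; the example name is hypothetical.

```python
# Minimal name-based content store in the NDN spirit: objects are requested by
# hierarchical name, served from the cache when present, and evicted LRU-style.

from collections import OrderedDict

class NameCache:
    def __init__(self, capacity_objects: int):
        self.capacity = capacity_objects
        self.store = OrderedDict()               # name -> data, in LRU order

    def get(self, name: str, fetch_upstream) -> bytes:
        """Return the data for a name, fetching and caching it on a miss."""
        if name in self.store:
            self.store.move_to_end(name)         # refresh LRU position
            return self.store[name]
        data = fetch_upstream(name)              # e.g. forward the Interest upstream
        self.store[name] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)       # evict least recently used
        return data

cache = NameCache(capacity_objects=2)
fetch = lambda name: f"payload-for-{name}".encode()
cache.get("/ndn/cms/store/data/Run2022/AOD/0001", fetch)   # hypothetical name
```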
  48. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (VI)  High performance networking with the Bella Link and the Sao Paulo Backbone  With the recent availability of an L2 circuit linking São Paulo State University (UNESP) to CERN in Geneva, through the Bella Link, Brazil (and São Paulo State) now has three 100 Gbps academic links to the USA and to Europe. This new circuit is expected to provide low latency, which should open new opportunities for collaboration among Brazilian and European academic institutions. All links pass through the SP4 Equinix datacenter in the greater Sao Paulo area. At the SP4 datacenter, rednesp (Research and Education Network at Sao Paulo) has a node called SouthernLight which is part of the GNA-G AutoGOLE/SENSE testbed.  At the same time, the Rednesp Sao Paulo regional network is finishing the "Backbone SP", connecting 9 important universities in Sao Paulo state. In this demo, we intend to compare properties such as effective bandwidth, latency to different continents and jitter using different paths of the 3 intercontinental links, as well as the integration of the "Backbone SP" with international connections. We also expect to show results and metrics of parallel computations using some of the national and international computing facilities connected to this infrastructure.
  49. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (VII)  Coral: Scalable Data Plane Checking via Distributed, On-Device Verification  Data plane verification (DPV) is important for finding network errors, and therefore a fundamental pillar for achieving consistently operating, autonomous-driving, high-performance science networks. Current DPV tools employ a centralized architecture, where a server collects the data planes of all devices and verifies them. Despite substantial efforts on accelerating DPV, this centralized architecture is inherently non-scalable. To tackle the scalability challenge of DPV, the team of Xiamen University, China, circumvents the scalability bottleneck of centralized designs and designs Coral, a distributed, on-device DPV framework.  The key insight of Coral is that DPV can be transformed into a counting problem on a directed acyclic graph, which can be naturally decomposed into lightweight tasks executed at network devices, enabling scalability. Coral consists of (1) a declarative requirement specification language, (2) a planner that employs a novel data structure DVNet to systematically decompose global verification into on-device counting tasks, and (3) a distributed verification (DV) protocol that specifies how on-device verifiers communicate task results efficiently to collaboratively verify the requirements. During SC'22, we will demonstrate, through both a testbed reconstruction of the Internet2 WAN environment and large-scale simulations of WAN/LAN/DC with real-world datasets, that Coral consistently achieves scalable DPV under various scenarios. A preliminary manuscript of Coral can be found at [1] and a set of small-scale demos can be found at [2].  [1] Qiao Xiang et al., Switch as a Verifier: Toward Scalable Data Plane Checking via Distributed, On-Device Verification, https://arxiv.org/abs/2205.07808  [2] Qiao Xiang et al., Coral System Functionality Demonstration, http://distributeddpvdemo.tech/
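A simplified, centralized illustration of the counting idea behind Coral: a reachability requirement can be checked by counting, over the forwarding DAG, how many paths from each node end at the destination. The toy topology below is hypothetical, and the real system distributes such counts to devices via its DV protocol rather than computing them centrally.

```python
# Toy illustration: count forwarding paths to a destination on a DAG.

from functools import lru_cache

# Hypothetical forwarding DAG: node -> list of next hops (from the data plane).
dag = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],          # destination
}

def count_paths_to(dst: str):
    @lru_cache(maxsize=None)
    def count(node: str) -> int:
        if node == dst:
            return 1
        return sum(count(nxt) for nxt in dag[node])
    return count

count = count_paths_to("D")
print({n: count(n) for n in dag})   # e.g. A has 2 distinct paths to D
```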
  50. Network Traffic Evolution 2021-22 to and from CERN [Plots: LHCOPN traffic, CERN to Tier1s; Overall CERN traffic (LHCOPN, LHCONE, Internet), outbound and inbound] E. Martelli, CERN/IT
  51. HL-LHC Network Needs and Data Challenges Current Understanding: Q2 2021  Export of Raw Data from CERN to the Tier1s (350 Pbytes/Year):  400 Gbps Flat each for ATLAS and CMS Tier1s; +100G each for other data formats; +100 G each for ALICE, LHCb  “Minimal” Scenario [*]: Network Infrastructure from CERN to Tier1s Required  4.8 Tbps Aggregate: Includes 1.2 Tbps Flat (24 X 7 X 365) from the above, x2 to Accommodate Bursts, and x2 for overprovisioning, for operational headroom: including both non-LHC use, and other LHC use.  This includes 1.4 Tbps Across the Atlantic for ATLAS and CMS alone  Note that the above Minimal scenario is where the network is treated as a scarce resource, unlike LHC Run1 and Run2 experience in 2009-18.  In a “Flexible Scenario” [**]: 9.6 Tbps, including 2.7 Tbps Across the Atlantic Leveraging the Network to obtain more flexibility in workload scheduling, increase efficiency, improve turnaround time for production & analysis  In this scenario: Links to Larger Tier1s in the US and Europe: ~ 1 Tbps (some more); Links to Other Tier1s: ~500 Gbps  Tier2 provisioning: 400Gbps bursts, 100G Yearly Avg: ~Petabyte Import in a shift  Need to work with campuses to accommodate this: it may take years [*] NOTE: Matches numbers presented at ESnet Requirements Review (Summer 2020) [**] NOTE: Matches numbers presented at the January 2020 LHCONE/LHCOPN Meeting
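The "Minimal" scenario aggregate can be reproduced from the per-experiment figures quoted above:

```python
# Reproducing the "minimal scenario" arithmetic from the slide.

flat_gbps = 400 * 2      # raw data export, ATLAS + CMS Tier1s
flat_gbps += 100 * 2     # other data formats, ATLAS + CMS
flat_gbps += 100 * 2     # ALICE + LHCb
print(flat_gbps)                          # 1200 Gbps = 1.2 Tbps flat, 24x7x365

minimal_tbps = flat_gbps / 1000 * 2 * 2   # x2 for bursts, x2 for overprovisioning
print(minimal_tbps)                       # 4.8 Tbps aggregate CERN -> Tier1s
```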
  52. (Southern) California ((So)Cal) Cache: A 2 Pbyte Working Example in Production. Roughly 30,000 cores across Caltech & UCSD … half typically used for analysis. Plan to include Riverside and other SoCal Tier3s; ESnet plans to install additional in-network caches near US Tier2s; Move to Ceph underway. Scaling to HL LHC: ~30 Pbytes Per Tier2, ~10 Pbyte Caches; a 1 Pbyte Refresh in a Shift Requires a 400G Link. Still relies on use of compact event forms and efficiently managed data transport
  53. SENSE: SDN for End-to-end Networked Science at the Exascale (ESnet, Caltech, Fermilab, LBNL/NERSC, Argonne, Maryland)  Use Cases: Inter-DTN Priority Flows, LHC File Transfer Service (FTS), SENSE/AutoGole Integration (GNA-G), ExaFEL, Big Data Express
  54. AutoGOLE / SENSE Deployment [Map of sites/end systems, exchange points and networks: Internet2 AL2S, ESnet and ESnet Testbed, StarLight, NetherLight, SURFnet, MOXY, CENIC, PacWave, RNP, AMPATH, Southern Light, CERN, UCSD, KREONET/KRLight, KISTI, MANLAN, WIX; LOSA, SEAT, SUNY, STAR]
  55. SENSE: SDN for End-to-end Networked Science at the Exascale ESnet Caltech Fermilab LBNL/NERSC Argonne Maryland 56
  56. SENSE: SDN for End-to-end Networked Science at the Exascale ESnet Caltech Fermilab LBNL/NERSC Argonne Maryland 57
  57. Self-Driving Networks (Berkeley Lab): “Networks should learn to drive themselves” [*] [Control-loop diagram: network telemetry provides inputs to an AI agent (supervised/unsupervised classification for anomaly detection, regression for forecasting) together with a reward function; the agent tries different optimization strategies and/or objective functions, produces an action plan, and executes control via controllers such as Ryu, OpenDaylight, OSCARS and listener APIs]. Such a system can take simple actions to achieve key goals, such as improving availability, attack resilience and dealing with scale; the argument is that AI is needed for mission critical actions. [*] Why (and How) Networks Should Run Themselves, Feamster and Rexford, https://doi.org/10.48550/arXiv.1710.11583
  58. Intelligent Controller (HECATE) Case Studies (M. Kiran, C. Guok, talk at APAN 2021): 1. Model Free: Path selection for large data transfers: better load balancing 2. Model Free: Forwarding decisions for complex network topologies: Deep RL to learn optimal packet delivery policies vs. network load level 3. Model Based: Predicting network patterns with NetPredict. Self-Driving Network goals: Adaptive Routing (e.g. real time data for routing decisions), Learns to Avoid Congestion, Congestion Free and Loss Free operation, Towards 100% Utilization, Proactive Fault Repair. Mariam Kiran (ESnet) et al., Intelligent Networks DOE Project: Use Deep Reinforcement Learning to Optimize network traffic engineering
  59. Self-Driving Network for Science: Using Deep Reinforcement Learning for optimizing traffic engineering over networks (Kiran et al., Intelligent Networks DOE Project) [RL loop diagram: model-based learning (predicting future congestion, studying traffic patterns) and model-free learning; observations, e.g. network telemetry or monitoring; reward, e.g. flow completion time; actions, e.g. reconfigure network or update flow rules]  D-DCRNN: Spatiotemporal forecasting with applications in neuroscience, climate, traffic flow, smart grid, logistics, supply chain: https://arxiv.org/pdf/1707.01926.pdf  Future: Hooks for a Richer, More Stateful Objective Function (Policy, Priority, Deadlines, Path Quality, Co-Flows...); Event Response; Applications to 5G and Quantum Networks. Mariam Kiran (ESnet) mkiran@lbl.gov et al.
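A toy, model-free sketch of the reinforcement-learning loop described above: an agent picks a path, observes a reward (here, negative completion time) and shifts preference toward better paths. Hecate/DeepRoute use deep RL with far richer state; the path names and completion times below are made up.

```python
# Toy epsilon-greedy "bandit" over candidate paths; reward = -completion time.

import random

paths = ["path-A", "path-B", "path-C"]
q = {p: 0.0 for p in paths}          # estimated value per path
counts = {p: 0 for p in paths}
epsilon, rng = 0.1, random.Random(42)

def observe_completion_time(path: str) -> float:
    # Stand-in for telemetry: pretend path-B is usually least congested.
    base = {"path-A": 120.0, "path-B": 80.0, "path-C": 150.0}[path]
    return base * rng.uniform(0.9, 1.1)

for step in range(500):
    path = rng.choice(paths) if rng.random() < epsilon else max(q, key=q.get)
    reward = -observe_completion_time(path)          # faster transfer = higher reward
    counts[path] += 1
    q[path] += (reward - q[path]) / counts[path]     # incremental mean update

print(max(q, key=q.get))   # converges to the path with the lowest completion time
```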
  60. HECATE Simulation (Berkeley Lab) [Plot of link utilization rate (UR) per path: HECATE moves incoming traffic to less used paths]
  61. Science Networks are Expanding to Wireless 5G, 6G Sensor Networks and Beyond 62  Statically deployed, unattended network sensors need full-time monitoring  While sensors in mobile networks (including unmanned aerial systems) can self-manage and optimize data collection  For continual monitoring of the sites, data is processed in real-time and transmitted to a central location with 5G speed for further processing  Novel mesh network topologies for flexible inclusion of new sources - much like new cells in the body - are being developed  Can use 5G latency to handle the tradeoff between edge and cloud processing
  62. SENSE: SDN Enabled Networks for Science at the Exascale ESnet, Caltech, Fermilab, LBNL https://arxiv.org/abs/2004.05953 Creates Virtual Circuit Overlays. Site and Network RMs, Orchestrator
  63. SC20 to 22+: AutoGOLE/SENSE Persistent Testbed: ESnet, SURFnet, Internet2, StarLight, CENIC, Pacific Wave, AmLight, RNP, USP, UFES, GridUNESP, Hawaii, Guam, KISTI, Tokyo, Caltech, UCSD, PRP/NRP, FIU, CERN, Fermilab, UMd, DE-KIT. Federation with the StarLight, GEANT/RARE and AmLight P4 Testbeds Underway. 2022-23 Outlook: ESnet6/High Touch, KAUST, FABRIC, BRIDGES, US CMS Tier2s, GridUNESP, UERJ, SKAO, AARNet, TIFR et al., APONet. [Map annotations: Caltech/UCSD/LA 100G to 4 X 100 to 400G with CENIC, enhanced for SC22 and beyond; 400G link(s) NetherLight-CERN; automation following SENSE, NRP, ALTO, Atlantic Wave SDX; persistent operations beginning this year] Courtesy T. Lehman
  64. Important Link Management ● There are multiple transatlantic and transpacific links, operated by multiple organizations ● Goal is to more flexibly control how these links are utilized on a per flow, group, or use basis ● Do not want to manage "every" flow in the network; however, should be able to manage "any" flow in the network ● Two timescales: the SENSE overlay network of virtual circuits with BW guarantees is relatively stable; IPv6 subnets and Directors provide more dynamic flow handling ● An equally important goal is to understand the load and leave room for other traffic ● Compatible with other network operations ● Result is to make better use of the available resources overall
  65. SENSE Services for LHC and Other Open Science Grid (OSG) Workflows: Rucio, FTS, XRootD Integration  Goals center around SENSE providing key network-related capabilities to LHC + 30 other science programs  Primary Motivations:  Developing mechanisms for an application workflow to obtain information regarding the network services, capabilities, and options; to a degree similar to what is possible for compute resources  Giving applications the ability to interact with the network: to exchange information, negotiate performance parameters, discover expected performance metrics, and receive status/ troubleshooting information in real time, during long transactions.  Key pathways: (1) Interfacing and interaction with RUCIO, FTS and XRootD data federation and storage systems, through Engagement: with CMS, ATLAS and those development teams (2) Enabled by the ESnet6 plan: with up to ~75% of link capacity on key routes available for policy-driven dynamic provisioning and management by ~2027-28  4 Phase 1 Year Schedule, Completing in Q1 2023: from system evaluation to design, to prototype deployments and tests, to a plan for the transition to operations and the additional R&D needed 66
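What "giving applications the ability to interact with the network" could look like from a data-management service such as Rucio/FTS is sketched below; the orchestrator URL, endpoint path, parameters and response fields are hypothetical placeholders, not the actual SENSE API.

```python
# Hedged sketch of a workflow system asking a (hypothetical) network
# orchestrator for a guaranteed-rate path before a large transfer.

import json
from urllib import request

ORCHESTRATOR = "https://sense-orchestrator.example.org"   # placeholder URL

def negotiate_transfer(src: str, dst: str, gbps: float, hours: float) -> dict:
    """Ask the hypothetical orchestrator whether a guaranteed-rate path is
    available for a planned bulk transfer, and reserve it if so."""
    payload = json.dumps({"source": src, "destination": dst,
                          "rate_gbps": gbps, "duration_hours": hours}).encode()
    req = request.Request(f"{ORCHESTRATOR}/v1/reservations", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:                     # would raise on failure
        return json.load(resp)

# Example call a workflow system might make before scheduling a petabyte transfer:
# negotiate_transfer("T2_US_Caltech", "T1_US_FNAL", gbps=100, hours=8)
```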
  66. Interfacing to Multiple VOs With FTS/Rucio/XRootD: LHC, Dark Matter, neutrinos, Heavy Ions, VRO, SKAO, LIGO/Virgo/KAGRA; Bioinformatics [Diagram: European Science Data Center, OSG Data Federation, Vera Rubin Observatory]
  67. Beyond Programmability Alone: A Systems Approach Qualcomm Technologies GradientGraph Analytics 68  System Wide information to identify, deal with root causes Key Components • Bottleneck precedence + flow gradient graphs • Impactful flow and flow group ID • Alternate path recommendations www.reservoir.com/gradientgraph/  Objective: Flow performance optimization in high speed networks, with fairness  Approach: Built on a new mathematical Theory of Bottleneck Structures and an analytical framework  Enables operators to understand and precisely control flow(-group) and bottleneck performance  Value: Improved capacity planning, traffic engineering  Greater, more effective network throughput and stability as a function of capacity and cost  "On the Bottleneck Structure of Congestion-Controlled Networks," ACM SIGMETRICS, Boston, June 2020 [https://bit.ly/3eGOPrb]  "Designing Data Center Networks Using Bottleneck Structures," accepted for publication at ACM SIGCOMM 2021 [https://bit.ly/2UZCb1M]  "Computing Bottleneck Structures at Scale for High-Precision Network Performance Analysis," SC 2020 INDIS, November 2020 [https://bit.ly/3BriwaB]  "A Quantitative Theory of Bottleneck Structures for Data Networks", Technical Report, available upon request [https://bit.ly/38u8ARs]
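The bottleneck-structure idea can be illustrated with the classic max-min fair "progressive filling" computation: links saturate in order, and the link at which a flow's rate is fixed is that flow's bottleneck. The sketch below, with hypothetical links and flows, illustrates the concept only; it is not the GradientGraph product.

```python
# Max-min fair progressive filling: returns each flow's rate and bottleneck link.

def max_min_fair(links: dict, flows: dict) -> dict:
    """links: link -> capacity (Gbps); flows: flow -> list of links it traverses.
    Returns flow -> (rate, bottleneck link)."""
    rate, remaining, unfixed = {}, dict(links), dict(flows)
    while unfixed:
        # Fair share each still-used link could give to its unfixed flows.
        share = {l: remaining[l] / sum(1 for f in unfixed if l in unfixed[f])
                 for l in remaining if any(l in p for p in unfixed.values())}
        bottleneck = min(share, key=share.get)
        for f in [f for f, p in unfixed.items() if bottleneck in p]:
            rate[f] = (share[bottleneck], bottleneck)
            for l in unfixed[f]:
                remaining[l] -= share[bottleneck]
            del unfixed[f]
    return rate

# Two hypothetical links and three flows.
print(max_min_fair({"L1": 100, "L2": 400},
                   {"f1": ["L1"], "f2": ["L1", "L2"], "f3": ["L2"]}))
# f1 and f2 are bottlenecked at L1 (50 Gbps each); f3 gets the remaining 350 on L2.
```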
  68. Global P4 Lab and Production Grade Programmable Switches Marcos Schwarz SC22 November 14-17 2022, Dallas - TX
  69. Motivation Can we increase the rate of evolution without interfering with production networks? To develop and operate end-to-end / multi-domain orchestration services: • Resource reservation (guaranteed bandwidth) • Resource provisioning (Circuits, VRFs) • Underlay observability • Dynamic traffic steering/engineering • Dynamic creation of L3 VPNs • Closed loop multi-domain visibility/intelligence/controllability How can we create/sustain an integration initiative/platform to propose and validate next generation protocol and services? • Proposition: Use programmable P4 devices to experiment on pre-production networks leveraging industry/R&E open ecosystems
  70. Global P4 Lab Goals Current (SC22) • Persistent global L3 overlay network based on P4 switches • Intercontinental high capacity transfers (100G and over) exploring multiple source routing solutions (Segment Routing and PolKA) • Management infrastructure and tools used to operate this global network • Core network based on RARE/FreeRtr and edge networks based also on SONiC Future (2023) • Capability to support multiple virtual networks, that implement on the same devices different choices of routing stacks, traditional and SDN based • Integration with initiatives for visibility, controllability and intelligence • Work as a reference state of the art / next generation R&E network
  71. Integration Path with other Initiatives