Holistic Aggregate
Resource Environment
   Eric Van Hensbergen (IBM Research)
   Ron Minnich (Sandia National Labs)
           Jim McKie (Bell Labs)
       Charles Forsyth (Vita Nuova)
          David Eckhardt (CMU)
Overview


(Figure labels: Sequoia, BG/L, Red Storm)
Research Topics
•   Pre-requisite: reliability and application-driven design are pervasive in all
    explored areas
•   Offload/Acceleration Deployment Model
    •   Supercomputer needs to become an extension of scientist's desktop
        as opposed to batch driven, non-standard run-time environment.
•   Leverage aggregation as a first-class systems construct to help manage
    complexity and provide a foundation for scalability, reliability, and
    efficiency.
•   Distribute system services throughout the machine (not just on io-node)
•   Interconnect Abstractions & Utilization
    •   Leverage HPC interconnects in system services (file system, etc.)
    •   sockets & TCP/IP don't map well to HPC interconnects (torus and
        collective) and are inefficient when hardware provides reliability
Right Weight Kernel
•   General purpose multi-thread, multi-user environment
•   Pleasantly Portable
•   Relatively Lightweight (relative to Linux)
•   Core Principles
    •   All resources are synthetic file hierarchies
    •   Local & remote resources accessed via simple API
    •   Each thread can dynamically organize local and
        remote resources via dynamic private namespace
Aggregation
•   extend BG/P aggregation model
    beyond I/O and CPU node barrier

•   allow grouping of nodes into
    collaborating aggregates with
    distributed system services and
    dedicated service nodes

•   allow specialized kernel for file
    service, monitoring, checkpointing,
    and network routing

•   parameterized redundancy, reliability,
    and scaling

•   allows dynamic (re-)organization of
    programming model to match the
    (changing) workload

(Figure labels: local service, proxy service, aggregate service, remote services)
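
The three roles named on this slide (local, proxy, and aggregate services) can be illustrated with a small Python sketch. It is only an illustration of the composition idea, not HARE code: the class names, the `run` method, and the string results are all assumptions made for the example.

```python
class LocalService:
    """A service that does the work on the node it lives on."""
    def __init__(self, name):
        self.name = name

    def run(self, task):
        return [f"{self.name}:{task}"]


class ProxyService:
    """Stands in for one remote service and forwards requests to it."""
    def __init__(self, remote):
        self.remote = remote

    def run(self, task):
        return self.remote.run(task)


class AggregateService:
    """Presents many services as one and combines their results."""
    def __init__(self, members):
        self.members = list(members)

    def run(self, task):
        results = []
        for member in self.members:
            results.extend(member.run(task))
        return results


# Hypothetical layout: four nodes grouped into two aggregates,
# which are themselves aggregated behind proxies.
nodes = [LocalService(f"node{i}") for i in range(4)]
left = AggregateService(nodes[:2])
right = AggregateService(nodes[2:])
root = AggregateService([ProxyService(left), ProxyService(right)])
results = root.run("uptime")
```

Because every role exposes the same `run` interface, aggregates can be nested and regrouped without the caller changing, which is the property the slide relies on for dynamic reorganization.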
Topology
Desktop Extension
•   Users want supercomputers to be an extension of their desktop
•   Current parallel model is traditional batch model
•   Workloads must use specialized compilers and be scheduled from special
    front-end node. Results are collected into a separate file system
•   Monitoring and job control through web interface or MMCS command line
•   Very difficult development environment and lack of interactivity limit
    productivity of execution environment
•   Proposed Research
    •   leverage library OS commercial scale-out work to allow tighter
        coupling between desktop environment and supercomputer resources
    •   Construct runtime environment which includes some reasonable subset
        of support for typical Linux run-time requirements (glibc, python, etc.)
Extension Example
(Figure: app + brasil on osx (Mac) -> ssh over internet -> brasil on Linux (pSeries) -> 10GB Ether -> app on Plan 9 (I/O) -> collective -> app on Plan 9 (CPU) -> torus -> ...)
Native Interconnects
•   Blue Gene specialized networks are used primarily by user
    space run-time
•   Hardware is accessed directly by the user space run-time
    environment and is not shared, leading to poor utilization
•   Exclusive use of tree network for I/O limits bandwidth and
    reliability
•   Proposed Solution
    •   Light weight system software interfaces to interconnects
        so that they can be leveraged for system management,
        monitoring, and resource sharing as well as user
        applications
Protocol Exploration
•   The Blue Gene networks are unusual (e.g., 3D torus carrying 240-byte payloads)
•   IP works, but isn’t well matched to the underlying capabilities
•   We want an efficient transport protocol to carry 9P messages & other data
    streams
•   Related Work: IBM’s ‘one-sided’ messaging operations [Blocksome et al.]
    •   It supports both MPI and non-MPI applications such as Global Arrays
•   Inspired by the IBM messaging protocol, we think we might do better than just IP
    •   Years ago there was much work on lightweight protocols for high-speed
        networks
•   We are using ideas from that earlier research to implement an efficient protocol
    to carry 9P conversations
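
To make the 240-byte payload constraint concrete, here is a minimal Python sketch of splitting one message into torus-sized packets and putting it back together. It is not the HARE protocol: the 6-byte fragment header (message id, fragment index, fragment count) and the function names are hypothetical, and the sketch ignores flow control and loss because the slide notes that the hardware provides reliability.

```python
import struct

PAYLOAD_SIZE = 240                 # torus payload size quoted on the slide
HEADER = struct.Struct("<HHH")     # hypothetical: message id, index, count


def fragment(msg_id, message):
    """Split a message into packets of at most PAYLOAD_SIZE bytes."""
    body = PAYLOAD_SIZE - HEADER.size
    chunks = [message[i:i + body] for i in range(0, len(message), body)] or [b""]
    return [HEADER.pack(msg_id, idx, len(chunks)) + chunk
            for idx, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild one message from its packets, in any arrival order."""
    parts = {}
    count = 0
    for packet in packets:
        _msg_id, idx, count = HEADER.unpack_from(packet)
        parts[idx] = packet[HEADER.size:]
    return b"".join(parts[i] for i in range(count))


# A stand-in for a 9P message: 1024 arbitrary bytes.
message = bytes(range(256)) * 4
packets = fragment(7, message)
restored = reassemble(list(reversed(packets)))
```

Carrying the index and count in every packet lets the receiver reassemble without relying on delivery order, at the cost of 6 bytes of each 240-byte payload.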
Project Roadmap
(Figure: timeline over years 0 to 3 with three tracks: Hardware Support; Systems Infrastructure; Evaluation, Scaling, & Tuning)
Milestones (Year 1)




(Figure: schedule chart for 2009; x-axis in weeks, 0 to 50)

BASIC
•   Initial Boot
•   10 GB Ethernet
•   Collective Network
•   Initial Infrastructure
•   SMP Support
•   Large Pages
•   Torus Network
•   Native Protocol

BASELINE
•   Baseline
BG/L (circa 2007)
PUSH
(Figure 1: The structure of the PUSH shell)

push -c '{
   ORS=./blm.dis
   du -an files |< xargs os chasen | awk '{print $1}' | sort | uniq -c >| sort -rn
}'
    We have added two additional pipeline operators, a multiplexing fan-out (|<[n]) and a coalescing
 fan-in (>|). This combination allows PUSH to distribute I/O to and from multiple simultaneous
 threads of control. The fan-out argument n specifies the desired degree of parallel threading. If no
 argument is specified, the default of spawning a new thread per record (up to the limit of available
 cores) is used. This can also be overridden by command line options or environment variables. The
 pipeline operators provide implicit grouping semantics allowing natural nesting and composability.
 While their complementary nature usually leads to symmetric mappings (where the number of
 fan-outs equals the number of fan-ins), there is nothing within our implementation which enforces it.
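
The fan-out and fan-in behavior described above can be approximated in Python as a rough sketch. This is not PUSH and does not use its syntax: `fan_out_fan_in`, `tokenize`, and the sample records are assumptions, and a thread pool stands in for the shell's parallel pipeline stages.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fan_out_fan_in(records, stage, n=4):
    """Run `stage` on each record using up to n threads (fan-out),
    then coalesce the per-record outputs into one list (fan-in)."""
    merged = []
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map keeps input order; a real fan-in need not preserve it.
        for outputs in pool.map(stage, records):
            merged.extend(outputs)
    return merged


def tokenize(line):
    """The per-record stage: split one record into words."""
    return line.split()


lines = ["a b a", "b c", "a"]
words = fan_out_fan_in(lines, tokenize, n=2)

# Equivalent of the serial tail of the pipeline: sort | uniq -c | sort -rn
counts = Counter(words)
ranked = counts.most_common()
```

Only the stage between fan-out and fan-in runs in parallel; counting and ranking happen once on the merged stream, mirroring the position of `>|` in the example pipeline.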
Early FTQ Results
Strid3
(Figure: time in seconds for 1024 iterations of Y = AX + Y, plotted against “stride”, i.e. the distance between scalars)
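
For readers unfamiliar with the kernel being timed, the following Python sketch shows a strided Y = AX + Y update and a simple timing loop. It is not the Strid3 source: the function names, the choice to keep the number of touched elements fixed while the stride grows, and the array sizes are assumptions. In an interpreter, per-element overhead will likely hide the memory effects that a compiled benchmark exposes.

```python
import time


def strided_axpy(a, x, y, stride, count):
    """Compute y[i] = a * x[i] + y[i] for `count` elements spaced `stride` apart."""
    for k in range(count):
        i = k * stride
        y[i] = a * x[i] + y[i]


def time_strided_axpy(count, stride, iterations=1024):
    """Return the seconds taken by `iterations` passes at a given stride."""
    x = [1.0] * (count * stride)
    y = [0.0] * (count * stride)
    start = time.perf_counter()
    for _ in range(iterations):
        strided_axpy(2.0, x, y, stride, count)
    return time.perf_counter() - start


# One timing per stride, as on the plot's x-axis.
timings = {stride: time_strided_axpy(count=64, stride=stride)
           for stride in (1, 2, 4, 8)}

# Small worked case: stride 2 touches every other element.
demo_x = [1.0] * 8
demo_y = [0.0] * 8
strided_axpy(2.0, demo_x, demo_y, stride=2, count=4)
```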
Application Support
•   Native
•   Inferno Virtual Machine
•   CNK Binary Support
    •   ELF Converter
    •   Extended proc interface to mark processes as “cnk procs”
    •   Transition once the process execs, and not before
    •   Shim in syscall trap code to adapt arg passing conventions
•   Linux Binary Support
    •   Basic Linux binary support
    •   Functional enough to run basic programs (Python, etc.)
Publications
•   Unified Execution Model for Cloud Computing; Eric Van Hensbergen, Noah Evans, Phillip Stanley-
    Marbell. Submitted to LADIS 2009; October 2009.
•   PUSH, a DISC Shell; Eric Van Hensbergen, Noah Evans. To Appear in the Proceedings of the Principles of
    Distributed Computing Conference; August 2009.
•   Measuring Kernel Throughput on BG/P with the Plan 9 Research Operating System; Ron Minnich, John
    Floren, Aki Nyrhinen. Submitted to SC 09; November 2009.
•   XCPU2: Distributed Seamless Desktop Extension; Eric Van Hensbergen, Latchesar Ionkov. Submitted to
    IEEE Clusters 2009; October 2009.
•   Service Oriented File Systems; Eric Van Hensbergen, Noah Evans, Phillip Stanley-Marbell. IBM Research
    Report (RC24788), June 2009
•   Experiences Porting the Plan 9 Research Operating System to the IBM Blue Gene Supercomputers; Ron
    Minnich, Jim McKie. To appear in the Proceedings of the International Conference on Supercomputing
    (ISC); June 2009.
•   System Support for Many Task Computing; Eric Van Hensbergen and Ron Minnich. In the Proceedings of
    the Workshop on Many Task Computing on Grids and Supercomputers; November 2008.
•   Holistic Aggregate Resource Environment; Charles Forsyth, Jim McKie, Ron Minnich and Eric Van
    Hensbergen. In the ACM Operating Systems Review; January 2008.

•   Night of the Lepus: A Plan 9 Perspective on Blue Gene's Interconnects; Charles Forsyth, Jim McKie, Ron
    Minnich and Eric Van Hensbergen. In the proceedings of the second annual international workshop on
    Plan 9; December 2007
•   Petascale Plan 9. USENIX 2007
Next Steps
•   Infrastructure Scale Out
    •   File Services
    •   Command Execution
    •   Alternate Internode Communication Models
    •   Fail in place software RAS models
•   Applications (Linux binaries and native support)
    •   Large Scale LINPACK Run
    •   Explore Mantevo Application Suite
        •   (http://software.sandia.gov/mantevo)
    •   CMU Working on Native Quake port
Acknowledgments
• Computational Resources Provided by
  DOE INCITE Program. Thanks to the
  patient folks at ANL who have supported
  us bringing up Plan 9 on their development
  BG/P
• Thanks to IBM Research Blue Gene team
  and the Kittyhawk Team for guidance and
  support.
Questions? Discussion?
    http://www.research.ibm.com/hare
Backup
IBM Research, Sandia National Labs, Bell Labs, and CMU




24        Systems Support for Many Task Computing             11/17/2008   (c) 2008 IBM Corporation



Plan 9 Characteristics

   Kernel Breakdown - Lines of Code
       Architecture Specific Code
         BG/P:                          ~14,000 lines of code
       Portable Code
        Port:                           ~25,000 lines of code
       TCP/IP Stack:                    ~14,000 lines of code

   Binary Sizes
       415k Text + 140k Data + 107k BSS







Why not Linux?
   Not a distributed system

   Core systems inflexible
       VM based on x86 MMU
       Networking tightly tied to sockets & TCP/IP w/long call-path
       Typical installations extremely overweight and noisy
       Benefits of modularity and open-source advantages overcome by complexity, dependencies, and rapid rate
       of change

   Community has become conservative
       Support for alternative interfaces waning
       Support for large systems which hurts small systems not acceptable

   Ultimately a customer constraint
       FastOS was developed to prevent OS monoculture in HPC
       Few Linux projects were even invited to submit final proposals







Everything Represented as File Systems
   Hardware Devices
       Disk
           /dev/hda1
           /dev/hda2
       Network
           /dev/eth0
       Console, Audio, Etc.

   System Services
       TCP/IP Stack
           /net
               /arp
               /udp
               /tcp
                   /clone
                   /stats
                   /0
                   /1
                       /ctl
                       /data
                       /listen
                       /local
                       /remote
                       /status
       Process Control, Debug, Etc.

   Application Services
       DNS
           /net
               /cs
               /dns
       GUI
           /win
               /clone
               /0
               /1
                   /ctl
                   /data
                   /refresh
               /2
       Wiki, Authentication, and Service Control
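
As an illustration of what "everything is a file system" means for networking, the sketch below opens a TCP connection using only file operations on the /net/tcp tree shown on this slide: open `clone`, read the connection number, write a `connect` request, then use that connection's `data` file. The helper name `dial_tcp` is an assumption, and the demo runs against a mock directory built in a temporary folder, so it shows the sequence of file operations rather than real network traffic.

```python
import os
import tempfile


def dial_tcp(net_root, address):
    """Set up a connection through files under net_root/tcp.

    Returns the open ctl file (keep it open for the lifetime of the
    connection) and the path of the connection's data file.
    """
    # Opening clone allocates a connection; the handle is its ctl file.
    ctl = open(os.path.join(net_root, "tcp", "clone"), "r+b", buffering=0)
    conn = ctl.read(32).decode().strip()        # e.g. "4"
    ctl.write(f"connect {address}".encode())    # address is host!port
    data_path = os.path.join(net_root, "tcp", conn, "data")
    return ctl, data_path


# Mock tree standing in for /net (an assumption so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "tcp", "4"))
    with open(os.path.join(root, "tcp", "clone"), "w") as f:
        f.write("4\n")
    ctl, data_path = dial_tcp(root, "192.0.2.1!564")
    ctl.close()
    demo_conn = os.path.basename(os.path.dirname(data_path))
    with open(os.path.join(root, "tcp", "clone")) as f:
        demo_ctl_contents = f.read()
```

Because the interface is just files, the same code works unchanged if `net_root` is a directory imported from another machine, which is how remote network stacks are shared.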




Plan 9 Networks
(Figure: network diagram. Labels: Term (terminals), Set Top Box, Screen Phone, PDA, Smartphone; Cable/DSL, Wifi/Edge, Internet; LAN (1 GB/s) Network; File Server, CPU Servers; Content Addressable Storage; High Bandwidth (10 GB/s) Network)







Aggregation as a First Class Concept




(Figure labels: Local Service; Proxy Service; Aggregate Service; Remote Service, Remote Service, Remote Service)




Issues of Topology







File Cache Example
   Proxy Service
       Monitors access to remote file server & local resources
       Local cache mode
       Collaborative cache mode
       Designated cache server(s)
       Integrate replication and redundancy
       Explore write coherence via “territories” à la Envoy
   Based on experiences with Xget deployment model
   Leverage natural topology of machine where
    possible.
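
The local and designated-cache modes listed above can be sketched as a read-through proxy that sits in front of a file server. This is a simplified illustration, not the project's cache: the class names are hypothetical, only reads are handled, and write coherence, replication, and topology awareness are left out.

```python
class RemoteFileServer:
    """Stand-in for the remote file server; counts how often it is hit."""
    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]


class CachingProxy:
    """Serves reads from a local cache and forwards misses upstream."""
    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.upstream.read(path)
        return self.cache[path]


server = RemoteFileServer({"/lib/data": b"payload"})

# Designated cache server shared by two nodes, each with its own local cache.
shared = CachingProxy(server)
node_a = CachingProxy(shared)
node_b = CachingProxy(shared)

first = node_a.read("/lib/data")    # miss everywhere, one remote read
second = node_b.read("/lib/data")   # served by the shared cache
third = node_a.read("/lib/data")    # served by node_a's local cache
```

Since a proxy exposes the same `read` interface as the server it fronts, proxies can be stacked to follow the machine's topology, for example one per node feeding one per I/O group.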





Monitoring Example
   Distribute monitoring throughout the system
       Use for system health monitoring and load balancing
       Allow for application-specific monitoring agents
   Distribute filtering & control agents at key points in
    topology
   Allow for localized monitoring and control as well as
    high-level global reporting and control
   Explore both push and pull methods of modeling
   Based on experiences with supermon system.






Workload Management Example
   Provide file system interface to job execution and
    scheduling.
   Allows scheduling of new work from within the
    cluster, using localized as well as global scheduling
    controls.
   Can allow for more organic growth of workloads as
    well as top-down and bottom-up models.
   Can be extended to allow direct access from end-
    user workstations.
   Based on experiences with Xcpu mechanism.
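
A file system interface to job execution can be pictured as follows: starting work means writing a few files in a session directory. The sketch is loosely inspired by that idea only. The file names `argv`, `ctl`, and `stdout`, the `exec` command, and the helper `submit_job` are hypothetical and should not be read as the actual Xcpu layout; the demo uses a temporary directory in place of a mounted service.

```python
import os
import tempfile


def submit_job(session_dir, argv):
    """Describe a job and request its execution by writing files.

    Returns the path where the job's output would be read from.
    """
    with open(os.path.join(session_dir, "argv"), "w") as f:
        f.write(" ".join(argv))
    with open(os.path.join(session_dir, "ctl"), "w") as f:
        f.write("exec\n")
    return os.path.join(session_dir, "stdout")


# Temporary directory standing in for a mounted session directory.
with tempfile.TemporaryDirectory() as session:
    stdout_path = submit_job(session, ["/bin/date"])
    with open(os.path.join(session, "argv")) as f:
        argv_written = f.read()
    with open(os.path.join(session, "ctl")) as f:
        ctl_written = f.read()
    stdout_name = os.path.basename(stdout_path)
```

With this style of interface, any program that can write files can schedule work, including one already running inside the cluster or on an end-user workstation that has the directory mounted.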





Right Weight Kernels Project (Phase I)

   Motivation
       OS Effect on Applications
       Metric is based on OS Interference on FWQ & FTQ benchmarks.
       AIX/Linux has more capability than many apps need
       LWK and CNK have less capability than apps want
   Approach
       Customize the kernel to the application
   Ongoing Challenges
       Need to balance capability with overhead
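
The idea behind a fixed time quantum (FTQ) measurement can be shown with a short Python sketch: count how many units of work complete in each equal-length time slice, so that slices with lower counts indicate time taken by something else, such as the OS. This is a simplified stand-in, not the FTQ benchmark itself: the function name and parameters are assumptions, the "work" is just a counter increment, and quanta here are not aligned to fixed clock boundaries.

```python
import time


def ftq(num_quanta, quantum_seconds):
    """Return the number of work units completed in each time quantum."""
    counts = []
    for _ in range(num_quanta):
        deadline = time.perf_counter() + quantum_seconds
        work = 0
        while time.perf_counter() < deadline:
            work += 1               # one unit of work
        counts.append(work)
    return counts


samples = ftq(num_quanta=20, quantum_seconds=0.001)
spread = max(samples) - min(samples)   # larger spread suggests more interference
```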







Why Blue Gene?
   Readily available large-scale cluster
   Minimum allocation is 37 nodes
   Easy to get 512 and 1024 node configurations
   Up to 8192 nodes available upon request internally
   FastOS will make 64k configuration available
   DOE interest – Blue Gene was a specified target
   Variety of interconnects allows exploration of alternatives
   Embedded core design provides simple architecture that is quick to port to
   and doesn't require heavy weight systems software management, device
   drivers, or firmware




 
Multi-pipes
Multi-pipesMulti-pipes
Multi-pipes
 
VirtFS
VirtFSVirtFS
VirtFS
 
PUSH-- a Dataflow Shell
PUSH-- a Dataflow ShellPUSH-- a Dataflow Shell
PUSH-- a Dataflow Shell
 
XCPU3: Workload Distribution and Aggregation
XCPU3: Workload Distribution and AggregationXCPU3: Workload Distribution and Aggregation
XCPU3: Workload Distribution and Aggregation
 
9P Code Walkthrough
9P Code Walkthrough9P Code Walkthrough
9P Code Walkthrough
 
9P Overview
9P Overview9P Overview
9P Overview
 
Push Podc09
Push Podc09Push Podc09
Push Podc09
 
Libra: a Library OS for a JVM
Libra: a Library OS for a JVMLibra: a Library OS for a JVM
Libra: a Library OS for a JVM
 
Effect of Virtualization on OS Interference
Effect of Virtualization on OS InterferenceEffect of Virtualization on OS Interference
Effect of Virtualization on OS Interference
 
PROSE
PROSEPROSE
PROSE
 
Libra Library OS
Libra Library OSLibra Library OS
Libra Library OS
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
 
Paravirtualized File Systems
Paravirtualized File SystemsParavirtualized File Systems
Paravirtualized File Systems
 

Último

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Holistic Aggregate Resource Environment

  • 1. Holistic Aggregate Resource Environment Eric Van Hensbergen (IBM Research) Ron Minnich (Sandia National Labs) Jim McKie (Bell Labs) Charles Forsyth (Vita Nuova) David Eckhardt (CMU)
  • 2. Overview Sequoia BG/L Red Storm
  • 3. Research Topics
      • Pre-requisite: reliability and application driven design is pervasive in all explored areas
      • Offload/Acceleration Deployment Model
          • Supercomputer needs to become an extension of scientist's desktop as opposed to batch driven, non-standard run-time environment.
      • Leverage aggregation as a first-class systems construct to help manage complexity and provide a foundation for scalability, reliability, and efficiency.
      • Distribute system services throughout the machine (not just on io-node)
      • Interconnect Abstractions & Utilization
          • Leverage HPC interconnects in system services (file system, etc.)
          • Sockets & TCP/IP don't map well to HPC interconnects (torus and collective) and are inefficient when hardware provides reliability
  • 4. Right Weight Kernel
      • General purpose multi-thread, multi-user environment
      • Pleasantly Portable
      • Relatively Lightweight (relative to Linux)
      • Core Principles
          • All resources are synthetic file hierarchies
          • Local & remote resources accessed via simple API
          • Each thread can dynamically organize local and remote resources via dynamic private namespace
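The private-namespace principle on this slide can be illustrated with a toy model. This is a minimal sketch, not Plan 9 code: the `Namespace` class, its `fork`, `bind`, and `read` methods, and the dictionary-backed "file trees" are all hypothetical names chosen for illustration. It shows only the idea that a child can rebind a path (for example `/net`) to a different resource without affecting its parent.

```python
class Namespace:
    """Toy private namespace: maps path prefixes to backing trees (nested dicts)."""

    def __init__(self, mounts=None):
        self.mounts = dict(mounts or {})

    def fork(self):
        # The child gets its own copy of the mount table, so later
        # binds in the child are invisible to the parent.
        return Namespace(self.mounts)

    def bind(self, prefix, tree):
        # Attach a (local or remote) resource tree at a path prefix.
        self.mounts[prefix] = tree

    def read(self, path):
        # Resolve using the longest matching prefix, then walk the tree.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                node = self.mounts[prefix]
                rel = path[len(prefix):].strip("/")
                for part in filter(None, rel.split("/")):
                    node = node[part]
                return node
        raise FileNotFoundError(path)


# Hypothetical resources: a local network stack and one imported from another node.
local_net = {"tcp": {"stats": "local stack"}}
remote_net = {"tcp": {"stats": "remote stack"}}

parent = Namespace({"/net": local_net})
child = parent.fork()
child.bind("/net", remote_net)  # only the child's view of /net changes
```

Because both views use the same path, code written against `/net/tcp/stats` runs unchanged whether the resource is local or remote, which is the point of the "simple API" bullet.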
  • 5. Aggregation
      • extend BG/P aggregation model beyond I/O and CPU node barrier
      • allow grouping of nodes into collaborating aggregates with distributed system services and dedicated service nodes
      • allow specialized kernel for file service, monitoring, checkpointing, and network routing
      • parameterized redundancy, reliability, and scaling
      • allows dynamic (re-)organization of programming model to match the (changing) workload
      • [Figure labels: local service, proxy service, aggregate service, remote services]
  • 7. Desktop Extension
      • Users want supercomputers to be an extension of their desktop
      • Current parallel model is traditional batch model
          • Workloads must use specialized compilers and be scheduled from special front-end node. Results are collected into a separate file system
          • Monitoring and job control through web interface or MMCS command line
          • Very difficult development environment and lack of interactivity limits productivity of execution environment
      • Proposed Research
          • Leverage library OS commercial scale-out work to allow tighter coupling between desktop environment and supercomputer resources
          • Construct runtime environment which includes some reasonable subset of support for typical Linux run-time requirements (glibc, python, etc.)
  • 8. Extension Example [Figure labels: app, brasil, osx, Linux, Plan 9, Mac, pSeries, I/O, CPU, internet, ssh, 10GB Ether, collective, torus]
  • 9. Native Interconnects
      • Blue Gene specialized networks are used primarily by user space run-time
          • Hardware is directly accessed by the user space runtime environment and is not shared, leading to poor utilization
          • Exclusive use of tree network for I/O limits bandwidth and reliability
      • Proposed Solution
          • Lightweight system software interfaces to interconnects so that they can be leveraged for system management, monitoring, and resource sharing as well as user applications
  • 10. Protocol Exploration
      • The Blue Gene networks are unusual (e.g., 3D torus carrying 240-byte payloads)
      • IP works, but isn't well matched to the underlying capabilities
      • We want an efficient transport protocol to carry 9P messages & other data streams
      • Related Work: IBM's 'one-sided' messaging operations [Blocksome et al]
          • It supports both MPI and non-MPI applications such as Global Arrays
      • Inspired by the IBM messaging protocol, we think we might do better than just IP
          • Years ago there was much work on lightweight protocols for high-speed networks
          • We are using ideas from that earlier research to implement an efficient protocol to carry 9P conversations
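To make the 240-byte payload constraint concrete, here is a minimal sketch of splitting one message into torus-sized packets and putting it back together. It is not the HARE protocol: the 6-byte header (message id, fragment index, fragment count) is a hypothetical layout, and the sketch assumes the header must fit inside the 240 bytes and that the hardware delivers every packet.

```python
import struct

PAYLOAD = 240                    # torus payload size quoted on the slide
HEADER = struct.Struct("<HHH")   # hypothetical: message id, fragment index, fragment count
ROOM = PAYLOAD - HEADER.size     # bytes of message data per packet


def fragment(msg_id, data):
    """Split one message into packets of at most PAYLOAD bytes each."""
    chunks = [data[i:i + ROOM] for i in range(0, len(data), ROOM)] or [b""]
    return [HEADER.pack(msg_id, i, len(chunks)) + c for i, c in enumerate(chunks)]


def reassemble(packets):
    """Rebuild a message from its packets, which may arrive in any order."""
    parts, total = {}, None
    for p in packets:
        _msg_id, index, total = HEADER.unpack_from(p)
        parts[index] = p[HEADER.size:]
    if total is None or len(parts) != total:
        raise ValueError("missing fragments")
    return b"".join(parts[i] for i in range(total))


message = bytes(range(256)) * 4          # a 1024-byte stand-in for a 9P message
packets = fragment(7, message)
restored = reassemble(reversed(packets))  # order of arrival does not matter
```

No checksums, acknowledgements, or retransmission appear here because the slides argue that such work is wasted when the hardware already provides reliability; a real transport would still need flow control and message demultiplexing.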
  • 11. Project Roadmap [Timeline figure, years 0 to 3: Hardware Support; Systems Infrastructure; Evaluation, Scaling, & Tuning]
  • 12. Milestones (Year 1) [Chart, horizontal axis in weeks (0 to 50): Initial Boot; 10 GB Ethernet; Collective Network; Initial Infrastructure; SMP Support; Large Pages; Torus Network; Native Protocol; Baseline]
  • 15. PUSH
      [Figure 1: The structure of the PUSH shell]

      push -c '{
          ORS=./blm.dis
          du -an files |< xargs os chasen | awk '{print $1}' | sort | uniq -c >| sort -rn
      }'

      We have added two additional pipeline operators, a multiplexing fan-out (|<[n]) and a coalescing fan-in (>|). This combination allows PUSH to distribute I/O to and from multiple simultaneous threads of control. The fan-out argument n specifies the desired degree of parallel threading. If no argument is specified, the default of spawning a new thread per record (up to the limit of available cores) is used. This can also be overridden by command line options or environment variables. The pipeline operators provide implicit grouping semantics allowing natural nesting and composability. While their complementary nature usually leads to symmetric mappings (where the number of fan-outs equals the number of fan-ins), there is nothing within our implementation which enforces it.
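The fan-out and fan-in operators can be approximated in a few lines of Python. This is a sketch of the idea only, not the PUSH implementation: `fan_out_in`, `tokenize`, and `word_counts` are hypothetical names, `tokenize` merely stands in for the per-record command in the pipeline above, and `ThreadPoolExecutor.map` returns results in input order, whereas a real coalescing fan-in need not preserve order.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fan_out_in(records, worker, n):
    """Hand each record to one of n workers (fan-out), then merge
    everything the workers emit into a single stream (fan-in)."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        for outputs in pool.map(worker, records):
            yield from outputs


def tokenize(record):
    # Stand-in for the per-record command run between |< and >|.
    return record.split()


def word_counts(records, n=4):
    # Rough analogue of: ... |< worker | sort | uniq -c >| sort -rn
    counts = Counter(fan_out_in(records, tokenize, n))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


result = word_counts(["a b a", "b a c"])
```

The degree of parallelism `n` plays the role of the optional fan-out argument; the final sort runs once on the coalesced stream, as in the example pipeline.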
  • 17. Strid3: Y = AX + Y [Plot: time for 1024 iterations; axes: time in seconds versus "stride", i.e. distance between scalars]
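The kernel named on this slide, Y = AX + Y evaluated at a given stride, can be sketched as follows. This is an illustrative stand-in, not the benchmark source: the function names and the pure-Python timing loop are assumptions, and interpreter overhead will mask most of the memory-system effects the original plot measures.

```python
import time


def strided_axpy(a, x, y, stride):
    """Update y[i] += a * x[i] for every stride-th element, in place."""
    for i in range(0, len(x), stride):
        y[i] += a * x[i]
    return y


def time_stride(n, stride, iterations):
    """Seconds taken to apply the strided update `iterations` times."""
    x = [1.0] * n
    y = [0.0] * n
    start = time.perf_counter()
    for _ in range(iterations):
        strided_axpy(2.0, x, y, stride)
    return time.perf_counter() - start


# Small demonstration: only elements 0 and 4 are touched at stride 4.
demo = strided_axpy(2, [1] * 8, [0] * 8, 4)
```

To compare strides fairly, a measurement would keep the number of touched elements constant (for example by growing `n` with the stride), so that only the distance between accesses changes.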
  • 18. Application Support
      • Native
      • Inferno Virtual Machine
      • CNK Binary Support
          • ELF Converter
          • Extended proc interface to mark processes as "cnk procs"
          • Transition once the process execs, and not before
          • Shim in syscall trap code to adapt arg passing conventions
      • Linux Binary Support
          • Basic Linux binary support
          • Functional enough to run basic programs (Python, etc.)
  • 19. Publications
      • Unified Execution Model for Cloud Computing; Eric Van Hensbergen, Noah Evans, Phillip Stanley-Marbell. Submitted to LADIS 2009; October 2009.
      • PUSH, a DISC Shell; Eric Van Hensbergen, Noah Evans. To Appear in the Proceedings of the Principles of Distributed Computing Conference; August 2009.
      • Measuring Kernel Throughput on BG/P with the Plan 9 Research Operating System; Ron Minnich, John Floren, Aki Nyrhinen. Submitted to SC 09; November 2009.
      • XCPU2: Distributed Seamless Desktop Extension; Eric Van Hensbergen, Latchesar Ionkov. Submitted to IEEE Clusters 2009; October 2009.
      • Service Oriented File Systems; Eric Van Hensbergen, Noah Evans, Phillip Stanley-Marbell. IBM Research Report (RC24788), June 2009.
      • Experiences Porting the Plan 9 Research Operating System to the IBM Blue Gene Supercomputers; Ron Minnich, Jim McKie. To appear in the Proceedings of the International Conference on Supercomputing (ISC); June 2009.
      • System Support for Many Task Computing; Eric Van Hensbergen and Ron Minnich. In the Proceedings of the Workshop on Many Task Computing on Grids and Supercomputers; November 2008.
      • Holistic Aggregate Resource Environment; Charles Forsyth, Jim McKie, Ron Minnich and Eric Van Hensbergen. In the ACM Operating Systems Review; January 2008.
      • Night of the Lepus: A Plan 9 Perspective on Blue Gene's Interconnects; Charles Forsyth, Jim McKie, Ron Minnich and Eric Van Hensbergen. In the Proceedings of the Second Annual International Workshop on Plan 9; December 2007.
      • Petascale Plan 9. USENIX 2007
  • 20. Next Steps
      • Infrastructure Scale Out
          • File Services
          • Command Execution
          • Alternate Internode Communication Models
          • Fail in place software RAS models
      • Applications (Linux binaries and native support)
          • Large Scale LINPACK Run
          • Explore Mantevo Application Suite (http://software.sandia.gov/mantevo)
          • CMU Working on Native Quake port
  • 21. Acknowledgments
      • Computational Resources Provided by DOE INCITE Program. Thanks to the patient folks at ANL who have supported us bringing up Plan 9 on their development BG/P
      • Thanks to IBM Research Blue Gene team and the Kittyhawk Team for guidance and support.
  • 22. Questions? Discussion? http://www.research.ibm.com/hare
  • 24. IBM Research, Sandia National Labs, Bell Labs, and CMU 24 Systems Support for Many Task Computing 11/17/2008 (c) 2008 IBM Corporation
  • 25. Plan 9 Characteristics
      • Kernel Breakdown - Lines of Code
          • Architecture Specific Code
              • BG/P: ~14,000 lines of code
          • Portable Code
              • Port: ~25,000 lines of code
              • TCP/IP Stack: ~14,000 lines of code
      • Binary Sizes
          • 415k Text + 140k Data + 107k BSS
  • 26. Why not Linux?
      • Not a distributed system
      • Core systems inflexible
          • VM based on x86 MMU
          • Networking tightly tied to sockets & TCP/IP w/long call-path
      • Typical installations extremely overweight and noisy
      • Benefits of modularity and open-source advantages overcome by complexity, dependencies, and rapid rate of change
      • Community has become conservative
          • Support for alternative interfaces waning
          • Support for large systems which hurts small systems not acceptable
      • Ultimately a customer constraint
          • FastOS was developed to prevent OS monoculture in HPC
          • Few Linux projects were even invited to submit final proposals
  • 27. Everything Represented as File Systems
      • Hardware Devices
          • Disk: /dev/hda1, /dev/hda2
          • Network: /dev/eth0
          • Console, Audio, Etc.
      • System Services
          • TCP/IP Stack: /net
              • /arp
              • /udp
              • /tcp
                  • /clone
                  • /stats
                  • /0
                  • /1
                      • /ctl, /data, /listen, /local, /remote, /status
          • Process Control, Debug, Etc.
      • Application Services
          • DNS: /net
              • /cs
              • /dns
          • GUI: /win
              • /clone
              • /0
              • /1
                  • /ctl, /data, /refresh
              • /2
          • Wiki, Authentication, and Service Control
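The `/net/tcp` tree on this slide is driven by ordinary file operations: a program opens `clone` to reserve a numbered connection directory, writes a textual command such as `connect address!port` to that connection's `ctl` file, and then moves bytes through `data`. The sketch below mimics that sequence with an in-memory model so that it runs anywhere. `ToyNet` and its methods are hypothetical, the address is made up, and only the file names and the `connect` and `hangup` control messages are taken from Plan 9.

```python
class ToyNet:
    """In-memory stand-in for a /net/tcp directory."""

    FILES = ("ctl", "data", "listen", "local", "remote", "status")

    def __init__(self):
        self.conns = {}

    def open_clone(self):
        # Opening /net/tcp/clone reserves the next connection directory.
        n = len(self.conns)
        self.conns[n] = {"remote": "", "status": "Closed"}
        return n

    def write_ctl(self, n, message):
        # Control messages are plain text written to /net/tcp/n/ctl.
        verb, _, arg = message.partition(" ")
        conn = self.conns[n]
        if verb == "connect":
            conn["remote"] = arg
            conn["status"] = "Established"
        elif verb == "hangup":
            conn["status"] = "Closed"
        else:
            raise ValueError("unknown control message: " + message)

    def paths(self, n):
        # The per-connection files listed on the slide.
        return ["/net/tcp/%d/%s" % (n, name) for name in self.FILES]


net = ToyNet()
conn = net.open_clone()
net.write_ctl(conn, "connect 10.0.0.1!564")  # hypothetical server address
```

Because the interface is just files, importing another machine's `/net` into the local namespace is enough to use that machine's network stack, with no new API.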
  • 28. Plan 9 Networks [Figure labels: Screen, Phone, PDA, Smartphone, Set Top Box; Term (x4); Wifi/Edge, Cable/DSL, Internet; LAN (1 GB/s) Network; File Server, CPU Servers; Content Addressable Storage; High Bandwidth (10 GB/s) Network]
  • 29. Aggregation as a First Class Concept [Figure labels: Local Service, Proxy Service, Aggregate Service, Remote Service (x3)]
  • 30. Issues of Topology
  • 31. File Cache Example
      • Proxy Service
      • Monitors access to remote file server & local resources
      • Local cache mode
      • Collaborative cache mode
      • Designated cache server(s)
      • Integrate replication and redundancy
      • Explore write coherence via "territories" à la Envoy
      • Based on experiences with Xget deployment model
      • Leverage natural topology of machine where possible.
  • 32. Monitoring Example
      • Distribute monitoring throughout the system
      • Use for system health monitoring and load balancing
      • Allow for application-specific monitoring agents
      • Distribute filtering & control agents at key points in topology
      • Allow for localized monitoring and control as well as high-level global reporting and control
      • Explore both push and pull methods of modeling
      • Based on experiences with supermon system.
  • 33. Workload Management Example
      • Provide file system interface to job execution and scheduling.
      • Allows scheduling of new work from within the cluster, using localized as well as global scheduling controls.
      • Can allow for more organic growth of workloads as well as top-down and bottom-up models.
      • Can be extended to allow direct access from end-user workstations.
      • Based on experiences with Xcpu mechanism.
  • 34. Right Weight Kernels Project (Phase I)
      • Motivation
          • OS Effect on Applications
          • Metric is based on OS Interference on FWQ & FTQ benchmarks.
          • AIX/Linux has more capability than many apps need
          • LWK and CNK have less capability than apps want
      • Approach
          • Customize the kernel to the application
      • Ongoing Challenges
          • Need to balance capability with overhead
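The FWQ (fixed work quantum) idea behind the interference metric is simple to sketch: run the same fixed amount of work many times, time each run, and treat anything above the fastest run as time taken away by the system. The code below is an illustration of that idea, not the benchmark itself. The function names and the trivial summation loop are assumptions, and in Python the interpreter adds noise of its own.

```python
import time


def fwq(samples, work):
    """Time `samples` repetitions of an identical unit of work (nanoseconds)."""
    times = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        total = 0
        for i in range(work):   # the fixed work quantum
            total += i
        times.append(time.perf_counter_ns() - start)
    return times


def interference(times):
    """Excess over the fastest sample, used as a rough estimate of OS noise."""
    baseline = min(times)
    return [t - baseline for t in times]


samples = fwq(samples=20, work=10_000)
noise = interference(samples)
```

FTQ (fixed time quantum) inverts the measurement: it fixes the time interval and counts how much work completes in each one, which makes samples line up on a common time axis.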
  • 35. Why Blue Gene?
      • Readily available large-scale cluster
          • Minimum allocation is 37 nodes
          • Easy to get 512 and 1024 node configurations
          • Up to 8192 nodes available upon request internally
          • FastOS will make 64k configuration available
      • DOE interest – Blue Gene was a specified target
      • Variety of interconnects allows exploration of alternatives
      • Embedded core design provides simple architecture that is quick to port to and doesn't require heavy weight systems software management, device drivers, or firmware