Pilot Factory using Schedd Glidein

Barnett Chiu
BNL
10.04.07
Problem to Solve (1)
- Pilot
  - Probes the resource (HTTP, environment, interpreter, other executables, etc.)
  - Pulls jobs from a remote server (e.g. the Panda server)
  - Matchmaking
    - Groups jobs into categories, e.g. production jobs, analysis jobs (CHARMM …), test jobs …
    - Other criteria: number of CPUs, RAM, etc.
Problem to Solve (2)

- Current approach to pilot submission
  - Local pool: Vanilla
  - Remote pool: Condor-G
- Large numbers of user jobs (production + analysis) lead to large numbers of Condor-G pilot jobs, which in turn put computational overhead on the gatekeepers (e.g. high memory consumption)
Solution

- Is there a way to bypass GRAM when submitting jobs to remote machines?
- Local submission, but how?
  - We need something that continuously submits local pilot jobs on the gatekeeper
  - Solution: Pilot Factory
Pilot Factory Overview
- Pilot Factory is an application that combines the following ideas:
  - Schedd glidein
  - Pilot submission program (or pilot generator)
- What is a glidein?
  - A mini Condor pool on a remote machine
    - A complete Condor pool has at least 5 components: master, startd, schedd, collector, negotiator
    - Glidein: {master, startd}, {master, schedd}, … etc.
  - Properly configured Condor daemons submitted as a batch job
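One way to picture the glidein types above is as subsets of the full daemon set. The sketch below assumes the daemon selection is expressed through Condor's `DAEMON_LIST` configuration macro; the `GLIDEIN_TYPES` table and the `daemon_list_line` helper are hypothetical names used only for illustration.

```python
# The five components of a complete Condor pool, as listed on the slide.
FULL_POOL = ("MASTER", "STARTD", "SCHEDD", "COLLECTOR", "NEGOTIATOR")

# Hypothetical table of glidein types: each runs only a subset of the daemons.
GLIDEIN_TYPES = {
    "startd": ("MASTER", "STARTD"),
    "schedd": ("MASTER", "SCHEDD"),
}


def daemon_list_line(glidein_type):
    """Return the DAEMON_LIST configuration line for one glidein type."""
    daemons = GLIDEIN_TYPES[glidein_type]
    return "DAEMON_LIST = " + ", ".join(daemons)


schedd_line = daemon_list_line("schedd")
```

Neither glidein type runs a collector or negotiator of its own; the diagrams in this deck show those daemons on the central manager and submit node.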
Glidein (1)
- Two major steps
  - Condor-G #1: installation
    - Glidein setup script
      - Condor configuration file
      - Glidein startup script
    - Download Condor binaries (HTTP, gsiftp, etc.)
  - Condor-G #2: execution
    - exec glidein startup script -> condor_master
Glidein (2)
[Diagram. Labels: Tarball server; ~/Condor_glidein (startup script, glidein config {master, schedd …}); Central Manager (Collector); Submit Host (master, schedd); Execute hosts (master, startd); Glidein types: {master, schedd}, {master, startd}]
Schedd Glidein
- Logic based on the startd glidein (two Condor-G steps to set up)
- Usage: by running a glidein schedd on the gatekeeper, the schedd serves as a gateway between the submit host and grid sites
- Mini Condor pool with schedd functionality:
  - Submit host
  - Maintains a persistent queue of jobs
  - Communicates with the native batch system and forwards user jobs
    - Condor, PBS, LSF, etc.
  - Manipulates job queues through the following commands:
    - condor_submit, condor_rm, condor_q, condor_hold, condor_release, condor_prio
  - Security features (GSI)
    - Who is authorized to set up a Pilot Factory?
Schedd Glidein Example (1)

- Command: // schedd glidein #1

      condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork \
        gridgk01.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup

  Note: use the fork jobmanager, since we want the schedd to run on the gatekeeper.

- Command: // schedd glidein #2

      condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork \
        gridgk02.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup

- Command: // schedd glidein #3, #4, #5

      condor_glidein -count 3 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork \
        nostos.cs.wisc.edu/jobmanager-fork -type schedd -forcesetup
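Since the three commands above differ only in host and count, they could be generated from a small helper. The sketch below is an assumption-laden illustration: `glidein_command` is a hypothetical function, and it only reassembles the flags and values that appear on the slide without checking them against any particular Condor release.

```python
def glidein_command(host, count=1,
                    arch="6.8.1-i686-pc-Linux-2.4",
                    jobmanager="jobmanager-fork",
                    glidein_type="schedd"):
    """Build the condor_glidein argument list in the same order as the slide."""
    return [
        "condor_glidein",
        "-count", str(count),
        "-arch", arch,
        f"-setup_jobmanager={jobmanager}",
        f"{host}/{jobmanager}",   # contact string: host plus jobmanager
        "-type", glidein_type,
        "-forcesetup",
    ]


commands = [
    glidein_command("gridgk01.racf.bnl.gov"),
    glidein_command("gridgk02.racf.bnl.gov"),
    glidein_command("nostos.cs.wisc.edu", count=3),
]
```

Each list can be handed to `subprocess.run(cmd, check=True)` on a submit host where `condor_glidein` is installed; the helper itself only builds the arguments and runs nothing.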
Schedd Glidein Example (2)

Command: condor_status -schedd

Name                  Machine     TotalRunningJobs  TotalIdleJobs  TotalHeldJobs

agrd0926@gridgk01.ra  gridgk01.r                 0              0              0
agrd0926@gridgk02.ra  gridgk02.r                 0              0              0
pleiades@gridui01.us  gridui01.u                 0              0              0
pleiades@ribera.cs.w  ribera.cs.                 0              0              0
pleiades@ron.cs.wisc  ron.cs.wis                 0              0              0
pleiades@vail.cs.wis  vail.cs.wi                 0              0              0

                                  TotalRunningJobs  TotalIdleJobs  TotalHeldJobs

                           Total                 0              0              0
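To check from a script that the glidein schedds have registered, output of this shape could be parsed as sketched below. The parser is a hypothetical helper that assumes the five-column layout shown on the slide and schedd names of the form `user@host`; it is not a general parser for `condor_status` output.

```python
def parse_schedd_status(text):
    """Extract per-schedd rows from `condor_status -schedd` style output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Assumption: data rows have exactly five fields and a user@host name.
        if len(parts) == 5 and "@" in parts[0]:
            name, machine, running, idle, held = parts
            rows.append({
                "name": name,
                "machine": machine,
                "running": int(running),
                "idle": int(idle),
                "held": int(held),
            })
    return rows


# Two rows copied from the slide, plus the header and the totals line.
sample = """Name Machine TotalRunningJobs TotalIdleJobs TotalHeldJobs
agrd0926@gridgk01.ra gridgk01.r 0 0 0
agrd0926@gridgk02.ra gridgk02.r 0 0 0
Total 0 0 0
"""
schedds = parse_schedd_status(sample)
```

The header and totals lines are skipped because they do not match the row shape, so `schedds` holds one entry per glidein schedd.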
Pilot Submission Program (Generator)

- Communicates with a DB server that maintains information about pilot jobs
  - E.g. pilot_type, pilot_queue
- Pulls the desired pilot script from an external server
- Periodically submits pilot jobs (with the pilot script as executable)
  - condor_submit
  - qsub? No, not necessary, since …
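The generator loop described above might look roughly like this. It is a minimal sketch, not the actual generator: `fetch_pilot_info` is a hypothetical callable standing in for the DB query and the pilot-script download, the dictionary keys `script` and `count` are assumptions, and the submit description uses only basic `condor_submit` commands.

```python
import os
import subprocess
import tempfile
import time


def build_submit_description(pilot_script, n_pilots):
    """Return a submit description that queues n_pilots copies of the pilot script."""
    lines = [
        "universe = vanilla",
        f"executable = {pilot_script}",
        "output = pilot.$(Cluster).$(Process).out",
        "error = pilot.$(Cluster).$(Process).err",
        "log = pilot.log",
        f"queue {n_pilots}",
    ]
    return "\n".join(lines) + "\n"


def submit_pilots(description, runner=subprocess.run):
    """Write the description to a temporary file and pass it to condor_submit."""
    fd, path = tempfile.mkstemp(suffix=".sub")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(description)
        return runner(["condor_submit", path], check=True)
    finally:
        os.remove(path)


def run_generator(fetch_pilot_info, cycles, interval_s,
                  runner=subprocess.run, sleep=time.sleep):
    """Periodically fetch pilot settings and submit a batch of pilots."""
    for _ in range(cycles):
        info = fetch_pilot_info()  # hypothetical: DB lookup plus script download
        description = build_submit_description(info["script"], info["count"])
        submit_pilots(description, runner)
        sleep(interval_s)
```

The `runner` and `sleep` parameters exist so that the loop can be exercised without a Condor installation, for example by passing a function that records the command instead of executing it.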
Build Pilot Factory with Glidein
- Schedd glidein installed and executed on the gatekeeper
- User submits a Condor-C job with the pilot generator as the executable
  - Generator runs on the gatekeeper as a local universe job supervised by the glidein schedd
- Generator submits pilots
  - Types and frequency adjustable by users
  - Depending on the native batch system, pilots can be submitted as grid universe jobs
  - Along with GAHP and related binaries, the schedd can communicate with different batch systems
[Diagram. Labels: Grid Resource; JobManager; LSF; PBS; master; schedd; Pilot generator]
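How the generator might pick a universe for the native batch system can be sketched as below. This is an assumption, not the factory's actual logic: `pilot_universe_lines` is a hypothetical helper, a Condor site is assumed to take plain vanilla jobs, and the `grid_resource` values follow the short form (`pbs`, `lsf`), whose exact spelling varies between Condor versions.

```python
def pilot_universe_lines(batch_system):
    """Return the submit-description lines that select how a pilot is routed."""
    system = batch_system.lower()
    if system == "condor":
        # Assumption: pilots for a local Condor pool go in as vanilla jobs.
        return ["universe = vanilla"]
    if system in ("pbs", "lsf"):
        # Assumption: short grid_resource form; check the local Condor manual.
        return ["universe = grid", f"grid_resource = {system}"]
    raise ValueError(f"unsupported batch system: {batch_system}")


pbs_lines = pilot_universe_lines("PBS")
```

Keeping this choice in one function means the rest of the generator does not need to know which batch system sits behind the gatekeeper.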
Pilot Factory
[Diagram. Labels: Submit Node (Collector, Master, Negotiator, Schedd); Glidein request; Gatekeeper with {Globus, Condor | PBS | …}; Pilot Factory (master, schedd); Connected to Collector; Submit Pilots; Cluster Worker Nodes]
Future Work
- Integrating the pilot with the Condor startd to implement a startd-based pilot
  - The startd-based pilot retrieves the payload of a user job in the same way as the generic pilot, and in addition inherits the functionality of the Condor startd
  - The original intention was to run PFs with the startd-pilots on worker nodes (too greedy, unacceptable?)
  - Using the Condor startd makes it easier to integrate with gLexec
- Transform the Generic PF (GPF) into a Startd PF (SPF)
Reference

[1] Schedd Glidein
[2] Pilot Factory
[3] glideinWMS: An advanced application on glideins
