This presentation gives a detailed look at the internal workings of the glideinWMS Frontend. It was part of the glideinWMS training session held at UCSD in January 2012.
glideinWMS Frontend Internals - glideinWMS Training Jan 2012
1. glideinWMS Training @ UCSD
glideinWMS frontend
Internals
by Igor Sfiligoi (UCSD)
UCSD Jan 17th 2012 Frontend Internals 1
2. Refresher - Glideins
● A glidein is just a properly configured Condor
execution node submitted as a Grid job
● glideinWMS provides automation
[Figure: a Condor pool with a central manager (Collector, Negotiator) and submit nodes (Schedd); glideins submitted by glideinWMS through CREAM and Globus run a Startd on the execution nodes.]
3. Refresher – Glidein Frontend
● The frontend monitors the user Condor pool,
does the matchmaking and requests glideins
● The Factory is a slave
[Figure: the Frontend (Frontend node) monitors Condor (submit nodes, central manager), matches, and requests glideins from the Factory (Factory node); the Factory submits glideins through Globus and CREAM; each glidein configures Condor and starts a Startd on a worker/execution node.]
4. Refresher - Cardinality
● N-to-M relationship
● Each Frontend can talk to many Factories
● Each Factory may serve many Frontends
[Figure: VO Frontends, each with its own Collector, Negotiator, Schedd and user jobs, connected to Glidein Factories whose glideins provide the Startds.]
5. Frontend architecture
● The frontend is composed of:
● The Condor daemons
● The glideinWMS frontend proper
● Condor client – to talk to the factories
● Web server – deliver code and data to glideins
+ monitoring
● The glideinWMS frontend is itself composed of:
● Group processes – do the real work
● Master frontend – controls the others and
aggregates monitoring
6. Frontend arch - Picture
[Figure: the Frontend domain (central manager, submit nodes, and the Frontend node with the Frontend, the Group processes it spawns, and the Web server), the Factory with its Entries, and a glidein.]
7. Condor processes
● Explained in enough detail in the previous talk
● Will not repeat myself
[Figure: central manager (Collector, Negotiator) and submit nodes (Schedd).]
8. Frontend processes
● Real work performed by Group process
● glideinFrontendElement.py
● One process per Group
● "Frontend" == "Frontend Group" in the rest of the talk
● They are controlled by master Frontend
● glideinFrontend.py
● Starts the other processes
● Aggregates monitoring
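A minimal sketch of the master/group split described on this slide, assuming a master that starts one child process per Group and then collects their results. The child here is a stand-in one-liner, not glideinFrontendElement.py, and the function names and group names are hypothetical.

```python
import subprocess
import sys


def spawn_groups(group_names):
    """Start one child process per group and return {name: Popen}."""
    children = {}
    for name in group_names:
        # Stand-in child: a real master would launch the group script instead.
        cmd = [sys.executable, "-c",
               "import sys; print('group ' + sys.argv[1])", name]
        children[name] = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    return children


def wait_for_groups(children):
    """Collect each child's exit code and output (the aggregation step)."""
    report = {}
    for name, proc in children.items():
        out, _ = proc.communicate()
        report[name] = (proc.returncode, out.strip())
    return report
```

Calling `wait_for_groups(spawn_groups(["main", "himem"]))` returns one entry per group, which is the shape a master would need to aggregate monitoring.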
9. Frontend role
● The VO frontend is the brain
of a glideinWMS-based pool
● Like a site-level “negotiator”
[Figure: inside the VO domain the Frontend monitors Condor (submit nodes, central manager) to find idle jobs, finds entries at the Factory nodes, matches the two, and requests glideins from the Factory nodes.]
10. Reminder - Two level matchmaking
● The frontend triggers glidein submission
● The “regular” negotiator matches jobs to glideins
[Figure: the Frontend asks the Factory for glideins, which are submitted through CREAM and Globus and start a Startd on the execution nodes; the central manager (Collector, Negotiator) then matches jobs from the Schedd on the submit node to those glideins.]
11. Matchmaking logic
● The Frontend matchmaking policy is
implemented centrally
● By the VO admin – not by the users
● It can use the attributes from both
the job and Factory ClassAds
● Should be kept in sync with Negotiator policy
● Which is not centralized
● One way is to define it in the glidein START expression
● Unfortunately, one is a python expression, the other a ClassAd expression
12. Example matchmaking logic
● Frontend
job.has_key("DESIRED_Sites") and
glidein["attrs"].get("GLIDEIN_Site")
in job["DESIRED_Sites"].split(",")
● Negotiator (via glidein START)
GLIDECLIENT_Start =
stringListMember(GLIDEIN_Site,
DESIRED_Sites,",")=?=True
More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_vars.html
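To see how the Frontend expression on this slide behaves, here is a small sketch that evaluates it against made-up job and entry dictionaries. It is written for Python 3, so `has_key` becomes `in`; the site names, entry names and the `frontend_match` helper are illustrative assumptions.

```python
def frontend_match(job, glidein):
    """Python 3 rendering of the Frontend match expression from the slide."""
    return ("DESIRED_Sites" in job and
            glidein["attrs"].get("GLIDEIN_Site")
            in job["DESIRED_Sites"].split(","))


# Made-up job and Factory entry ClassAds, reduced to plain dictionaries.
job = {"DESIRED_Sites": "UCSD,FNAL"}
entries = [
    {"name": "entry_a", "attrs": {"GLIDEIN_Site": "UCSD"}},
    {"name": "entry_b", "attrs": {"GLIDEIN_Site": "CERN"}},
]

matched = [e["name"] for e in entries if frontend_match(job, e)]
```

A job without `DESIRED_Sites` matches no entry under this policy, which is why the Negotiator side compares with `=?= True`.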
13. Communication Protocol
● No listen sockets
● All communication one way (Frontend->Factory)
● Each Factory provides a Collector
● Communication based on ClassAds
● All security implemented in the Collector
● Use standard cmdline tools for communication
● condor_status and condor_advertise
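As a rough illustration of the two command-line tools, the sketch below only builds the argument lists; it does not run anything. The host name, file name and constraint are placeholders, and the `GlideinMyType` constraint in particular is an assumption about how Factory ClassAds are tagged.

```python
def status_cmd(factory_collector, constraint):
    """Arguments for reading ClassAds from a Factory Collector."""
    return ["condor_status", "-pool", factory_collector, "-any",
            "-constraint", constraint, "-long"]


def advertise_cmd(factory_collector, classad_file,
                  update_command="UPDATE_AD_GENERIC"):
    """Arguments for publishing a ClassAd file to a Factory Collector."""
    return ["condor_advertise", "-pool", factory_collector,
            update_command, classad_file]


# Placeholder values, for illustration only.
read = status_cmd("factory.example.org", 'GlideinMyType == "glidefactory"')
write = advertise_cmd("factory.example.org", "request.classad")
```

Both directions are initiated by the Frontend, which is why no listen socket is needed on its side.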
14. Protocol sequence
● Polling loop
● Read Factory ClassAds from all factory Collectors
● Match against jobs
● Advertise own existence and requests
● Frontend sends 4 types of info
● Own identity
● Glidein submission regulation instructions
● Glidein parameters
● Pilot Proxy
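The polling loop can be sketched as follows. This is an outline under assumptions, not the Frontend code: the four callables are hypothetical hooks standing in for the real ClassAd reads and writes, and each Factory ad is assumed to carry a `name` key.

```python
import time


def run_polling_loop(read_factory_ads, read_idle_jobs, match, advertise,
                     iterations, sleep_seconds=0.0):
    """Read Factory ads, match them against jobs, advertise the requests."""
    for _ in range(iterations):
        factory_ads = read_factory_ads()   # from all Factory Collectors
        jobs = read_idle_jobs()            # from the user Condor pool
        requests = {}
        for ad in factory_ads:
            # Count the idle jobs that could run on this entry.
            requests[ad["name"]] = sum(1 for job in jobs if match(job, ad))
        advertise(requests)                # own existence and requests
        time.sleep(sleep_seconds)
```

Passing the I/O steps in as callables keeps the loop itself testable without a Condor pool.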
15. Glidein submission regulation
● The glideinWMS glidein request logic
is based on the principle of “constant pressure”
● Frontend Group requests a certain number of
“idle glideins” in the factory queue at all times
● It does not request a specific number of glideins
● This is done due to the asynchronous nature
of the system
● Both the factory entries and the frontend groups are
in a polling loop and talk to each other indirectly
16. Glidein requests
● Frontend matches job attrs against entry attrs
● It then counts the matched idle jobs
● A fraction of this number becomes the
“pressure requests” (up to 1/3)
● This number is then capped (~20)
● The attribute in the ClassAd is
ReqIdleGlideins
● The Frontend also advertises
ReqMaxRunningGlideins
● Emergency brake
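A worked example of the pressure calculation, assuming the fraction is exactly 1/3 and the cap exactly 20 (the slide gives both only approximately). Rounding down, and asking for at least one glidein when any job matches, are assumptions of this sketch.

```python
def req_idle_glideins(matched_idle_jobs, divisor=3, cap=20):
    """Return an illustrative ReqIdleGlideins value for one entry."""
    if matched_idle_jobs <= 0:
        return 0
    # A fraction of the matched idle jobs becomes the pressure...
    pressure = max(matched_idle_jobs // divisor, 1)
    # ...and the result is capped.
    return min(pressure, cap)
```

With these numbers, 9 matched idle jobs give a request of 3 idle glideins, while 300 matched idle jobs still give only 20.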
17. Scaling back
● The Frontend can also request that existing
glideins in the Factory queues are removed
ReqRemoveExcess
● NO – Default, never remove
● WAIT – Remove any glidein not yet at a site
● IDLE – Remove any glidein that has not started yet
● ALL – Remove all glideins
● The Frontend is pretty conservative
● Only requests removal if no user jobs in the queues
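The four ReqRemoveExcess values can be read as nested sets of removable glideins. The sketch below encodes that reading; the state names (`waiting`, `idle`, `running`) are hypothetical, and the level the Frontend picks once the queues are empty is not stated on the slide, so it is left as a parameter.

```python
# Hypothetical glidein states: "waiting" = not yet at a site,
# "idle" = at a site but not started, "running" = started.
REMOVABLE = {
    "NO": set(),
    "WAIT": {"waiting"},
    "IDLE": {"waiting", "idle"},
    "ALL": {"waiting", "idle", "running"},
}


def glideins_to_remove(policy, glideins):
    """Return the names of glideins a given ReqRemoveExcess value covers."""
    removable = REMOVABLE[policy]
    return [name for name, state in glideins if state in removable]


def choose_policy(user_jobs_in_queue, policy_when_empty="WAIT"):
    """Only request removal when no user jobs are queued."""
    return "NO" if user_jobs_in_queue > 0 else policy_when_empty
```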
18. Parameters
● Frontend can send attributes to glideins:
● Dynamically – as parameter in the ClassAd
● Statically – as entry in a config file
● Attributes typically static
● Current Frontend implementation does not really
have much support for dynamicity
19. Pilot proxy delegation
● Pilot proxy is encrypted with factory pub key
● Then published in the ClassAd
● Only owner of priv. key can decrypt it
[Figure: the Frontend (Frontend node) gets the key from the Collector on the Factory node and delivers the encrypted proxy through it; the Factory Entry uses the proxy to submit glideins through Globus.]
● However
● Must make sure we are talking to a trusted Factory!
– not just anyone providing a pub key
● More details in a few slides
20. Pilot proxy selection
● A Frontend must have at least one pilot proxy
● But can have more than one
● Many proxies can be used for priority reasons
● When competing with non-pilot submission
● Want to have as many proxies as users served
● Proxy selection is plugin based
21. Pilot proxy plugins
● Several standard plugins
● ProxyFirst – Only the first listed (most used)
● ProxyAll – All listed
● ProxyUserCardinality – First N, with N=#users
● ProxyUserMapWRecycling – N, with pilot-to-user mapping
● A VO admin could implement their own, if desired
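To make the plugin idea concrete, here is a sketch of three of the selection policies behind one common method. The `select(proxies, n_users)` interface is a hypothetical simplification for illustration, not the actual glideinWMS plugin API, and the proxy file names are made up.

```python
class ProxyFirst:
    """Only the first listed proxy."""
    def select(self, proxies, n_users):
        return proxies[:1]


class ProxyAll:
    """All listed proxies."""
    def select(self, proxies, n_users):
        return list(proxies)


class ProxyUserCardinality:
    """The first N proxies, with N = number of users."""
    def select(self, proxies, n_users):
        return proxies[:max(n_users, 0)]


# A VO-specific policy would be one more class with the same method.
PLUGINS = {
    "ProxyFirst": ProxyFirst,
    "ProxyAll": ProxyAll,
    "ProxyUserCardinality": ProxyUserCardinality,
}
```

For example, `PLUGINS["ProxyUserCardinality"]().select(["p1.pem", "p2.pem", "p3.pem"], 2)` keeps the first two proxies.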
24. Security - Authorization
● Authentication based on GSI/x509
● Mutual authorization
● The frontend admin decides
which Factories to talk to
● The factory admin decides
which Frontends to serve
● Based on x509 DNs
● Both sides have whitelists
● Frontend needs a service proxy
[Figure: Frontends on Frontend nodes and Factories with their Collectors on Factory nodes.]
25. Trusting the factory key
● It is all just ClassAds!
● Anyone can publish a ClassAd and claim to be a factory
● However, Factory Collector knows who published it
● And advertises it as the attribute AuthenticatedIdentity
● Cannot be faked by the client
● Frontend has a whitelist
of trusted factories
[Figure: ClassAds (a1 b1 c1, a2 b2 c2, a3 b3 c3) held in the Collector, each tagged with its publisher's identity (ID1, ID2, ID3); the Factory and the Frontends are shown as clients.]
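The whitelist check on this slide amounts to a filter on AuthenticatedIdentity. A minimal sketch, assuming the ClassAds have already been parsed into dictionaries; the identity strings and the `PubKeyValue` key are invented for the example.

```python
def trusted_factory_ads(ads, trusted_identities):
    """Keep only the ads whose Collector-assigned identity is whitelisted."""
    return [ad for ad in ads
            if ad.get("AuthenticatedIdentity") in trusted_identities]


# Invented identities, for illustration only.
whitelist = {"factory@factory.example.org"}
ads = [
    {"Name": "entry_a", "AuthenticatedIdentity": "factory@factory.example.org",
     "PubKeyValue": "..."},
    {"Name": "entry_x", "AuthenticatedIdentity": "someone@elsewhere.example.org",
     "PubKeyValue": "..."},
    {"Name": "entry_y", "PubKeyValue": "..."},  # no identity at all
]

usable = trusted_factory_ads(ads, whitelist)
```

Only the public keys from `usable` would then be candidates for encrypting the pilot proxy.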
26. Security handles
● As we said, mutual authentication with Factory
● Frontend provides (and Factory whitelists)
● Service Proxy to talk to Factory Collector
● Frontend Security name
(one set of the above for the whole Frontend, all Groups)
● Proxy Security Class (one per pilot proxy)
● Frontend whitelists (obtained from Factory admins)
● Factory Collector DN
● Own mapping @Factory
● Factory mapping @Factory
(one set of the above per factory collector)
27. Security within the VO domain
● Frontend process, Collector and schedds often
not on the same node
● Need network security
● Could be even over WAN (CMS setup has nodes in CA, IL and Europe)
● All processes must whitelist each other
● Again, GSI based
[Figure: the Frontend monitors Condor, that is the Schedds and the Collector/Negotiator.]
29. Pointers
● The official project Web page is
http://tinyurl.com/glideinWMS
● glideinWMS development team is reachable at
glideinwms-support@fnal.gov
● OSG glidein factory at UCSD
http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html
30. Acknowledgments
● The glideinWMS is a CMS-led project
developed mostly at FNAL, with contributions
from UCSD and ISI
● The glideinWMS factory operations at UCSD are
sponsored by OSG
● The funding comes from NSF, DOE and the
UC system