Seminar “Using Modern Information Technologies to Solve Modern Problems in Particle Physics” at the Yandex Moscow office, 3 July 2012
Ulrik Egede, Imperial College
3. Yandex meeting July 2012
The issues involved
● Analysis is in some sense a long data reduction
Trigger
Creation of “group” datasets
Data selection
Fitting
This is the tricky one
Paper!
4. Yandex meeting July 2012
Distributed analysis
● All the LHC experiments rely on a distributed analysis model
● Data available for analysis will be located at either Tier 1 or Tier 2 sites across the globe
● Grid tools are used for performing analysis
● Only a single “sign-on” with Grid certificate
● No remote logins
5. Yandex meeting July 2012
The Ganga User Interface
A fully programmable interface for the processing of data
● Debug code locally, progress to a small analysis on batch farms, run the full analysis on the Grid
● All done by a one-line change of the job specification
Configure once – run anywhere
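The "one-line change" idea can be sketched in plain Python. This is a minimal illustration, not Ganga's API: the classes `LocalBackend`, `BatchBackend`, `GridBackend` and `JobSpec`, and the `analysis` function, are hypothetical names invented here. The point is only that what to run is kept separate from where it runs.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-ins for execution backends; these are NOT Ganga classes.
class LocalBackend:
    name = "local"

    def run(self, task: Callable[[], str]) -> str:
        # A real backend would ship the task elsewhere; here it runs in-process.
        return f"[{self.name}] {task()}"


class BatchBackend(LocalBackend):
    name = "batch"


class GridBackend(LocalBackend):
    name = "grid"


@dataclass
class JobSpec:
    """What to run (task) is kept separate from where to run it (backend)."""
    task: Callable[[], str]
    backend: LocalBackend = field(default_factory=LocalBackend)

    def submit(self) -> str:
        return self.backend.run(self.task)


def analysis() -> str:
    # Placeholder for the user's analysis code.
    return "selected 42 candidates"


job = JobSpec(task=analysis)
results = [job.submit()]        # debug locally

job.backend = BatchBackend()    # the one-line change
results.append(job.submit())

job.backend = GridBackend()     # and again for the full analysis
results.append(job.submit())

for line in results:
    print(line)
```

The task definition is never touched between the three submissions; only the `backend` attribute changes.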
6. Yandex meeting July 2012
Centralised or individualised?
● Centralised data reduction or full-scale analysis in “analysis trains”
● Easy to deal with from an execution point of view
● Takes time to organise
● Potential waiting time for physicists
● Individual physicists submit jobs in a “chaotic” manner
● Many, often inexperienced, users attempt large-scale data processing
● Physicists in charge of when and how to perform analysis
● Risk of lower efficiency
● LHCb rely on a combination of the above
● Ganga deals (mainly) with the individual aspect
7. Yandex meeting July 2012
A typical work flow for an analysis
Develop user code
Check architecture
Build code
Find data
Divide up data
Submit to sites X,Y,Z
Keep track of it
Extract physics
In parallel: Copy test data locally, Debug test code
Days of effort lost just in keeping track of things
8. Yandex meeting July 2012
An optimised work flow for analysis
Develop user code
Check architecture
Build code
Find data
Divide up data
Submit to sites X,Y,Z
Keep track of it
Extract physics
In parallel: Copy test data locally, Debug test code
Ganga: automate as much as possible
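Two of the steps above, dividing up the data and keeping track of the resulting jobs, are the kind of bookkeeping that is easy to automate. The following is a minimal sketch in plain Python under stated assumptions: the file names, the status strings and the helper functions `split_dataset` and `summarise` are all hypothetical and do not reproduce how Ganga does this internally.

```python
from collections import Counter
from typing import Dict, List


def split_dataset(files: List[str], files_per_subjob: int) -> List[List[str]]:
    """Divide the input files into chunks, one chunk per subjob."""
    if files_per_subjob < 1:
        raise ValueError("files_per_subjob must be at least 1")
    return [files[i:i + files_per_subjob]
            for i in range(0, len(files), files_per_subjob)]


def summarise(statuses: Dict[int, str]) -> Counter:
    """Count subjobs per status, so nothing has to be tracked by hand."""
    return Counter(statuses.values())


# Hypothetical input: ten file names standing in for a real dataset.
files = [f"data_{n:03d}.dst" for n in range(10)]
subjobs = split_dataset(files, files_per_subjob=4)

# Hypothetical bookkeeping: in reality the statuses would come from the sites.
statuses = {0: "completed", 1: "failed", 2: "completed"}
to_resubmit = [sid for sid, state in statuses.items() if state == "failed"]

print(len(subjobs), "subjobs;", dict(summarise(statuses)))
print("resubmit:", to_resubmit)
```

With the splitting and the status summary in code, the physicist only has to decide what to do with the failed subjobs rather than remember which ones they were.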
9. Yandex meeting July 2012
The Ganga User Interface
● An analysis process defined through a set of building blocks forming a “job”
● All building blocks provided as plugins
● Easy to write your own
● Programmable through integration with the Python language
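# Run inside a Ganga session, where the job registry `jobs` is predefined.
lumi = 0
for j in jobs.select(name='Higgs'):
    if j.status == 'completed':
        lumi += j.luminosity()   # add up the luminosity of finished jobs
    elif j.status == 'failed':
        j.resubmit()             # retry jobs that failed
print('Processed %d fb^-1 so far' % lumi)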
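The statement that all building blocks are plugins, and that writing your own is easy, can be illustrated with a small registry pattern. This is only a sketch of the general technique: the `PLUGINS` registry, the `register` decorator, the `Echo` and `InProcess` classes and `build_job` are hypothetical names, and Ganga's real plugin mechanism is not reproduced here.

```python
from typing import Callable, Dict, Type

# Hypothetical registry; Ganga's real plugin mechanism is not reproduced here.
PLUGINS: Dict[str, Dict[str, Type]] = {}


def register(category: str) -> Callable[[Type], Type]:
    """Class decorator that files a building block under its category."""
    def decorator(cls: Type) -> Type:
        PLUGINS.setdefault(category, {})[cls.__name__] = cls
        return cls
    return decorator


@register("application")
class Echo:
    """A user-written application block: it just returns a message."""
    def __init__(self, message: str) -> None:
        self.message = message

    def execute(self) -> str:
        return self.message


@register("backend")
class InProcess:
    """A user-written backend block: runs the application in this process."""
    def run(self, application) -> str:
        return application.execute()


def build_job(application: str, backend: str, **app_args):
    """Assemble a job from building blocks looked up by name."""
    app = PLUGINS["application"][application](**app_args)
    runner = PLUGINS["backend"][backend]()
    return lambda: runner.run(app)


job = build_job("Echo", "InProcess", message="hello")
print(job())
```

Adding a new building block in this sketch means writing one decorated class; the code that assembles jobs does not change.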
10. Yandex meeting July 2012
Usage outside LHCb
Less information is available about external usage
● Many other smaller HEP projects
● Super B-factory, BES-III collaboration, Lattice QCD, SNO, T2K
● Other science
● Cryptography, Flu virus searches, Water table modelling, ...
● Some commercial projects funded initially from research council schemes
● Image classification, protein folding, Amazon EC2 usage
11. Yandex meeting July 2012
Ganga developments
● Since its inception, Ganga has been a GridPP-controlled project
● Future requirements
● Follow up on developments in usage within core areas
● User support
● Inclusion of new paradigms
● Efficient usage for analysis on multi-core machines
● Abstraction of data input and output in the same way as already done for CPU resources
● Continued outreach and documented scientific output
● Current standard documentation is “Ganga: J. Moscicki et al., Comp. Phys. Comm., 180:11 (2009)”
● Ganga released under GPL to maximise impact