2. Outline
1. Event Data Model and Analysis Model
• Where we are.
• Current issues.
2. What is EventView?
• The EventView EDM class.
• EventView framework.
• PhysicsView packages.
3. Where is EventView?
• What is the scope of EventView in PAT.
• The scope of EventView in DA.
• Current use cases.
4. Future developments
• Future developments in EDM
• Future developments in PANDA
2
3. Our Goal
• Is to do physics analysis to the level that we
can publish our result.
• Great deal of effort went in. Lots of trial and
error and it is still evolving very fast.
(Computing Service Commissioning, CSC is
one such place.)
• There are thousands of people trying to
analyse huge amount of data and we need
the right environment to do this.
• Physics analysis tools and distributed analysis
are trying to solve the problems.
3
4. Acronyms
• Athena: The Atlas Software (algorithms in C++, steering with Python). Used for Monte Carlo,
Simulation, Recosntruction, and Analysis. Also used by the High Level Trigger.
• RAW: Events as output by the Event Filter for reconstruction. The computing model assumes
it is only at Tier0 site (1.6MB/event at rate of 200Hz).
• CBNT: “ComBined NTuple” summarizes output of reconstruction. Read into Root, can’t be
read in Athena. Not part of the computing model.
• ESD: Event Summary Data, is the output of Reconstruction can be read into Athena. Only to
be stored at Tier1 sites. (500kB/event).
• AOD: Analysis Object Data, distributed to every one and meant for analysis, can be readin
Athena(100kB/event).
• DPD: Derived Physics Data, included in computing model, something analogous to an ntuple.
• Tags: Event-level metadata used to quickly select “interesting events” (1kB/event).
This talk focuses mainly on DPD production from AOD.
4
5. tier 0 RAW
Production
ESD
tier 1 EV Code
Production
AOD
Slim
AOD
Tag
tier 2 Athena Code
Athena Dump
Event View
Private Dump Private
CBNT EV Ntuple
Ntuple Analysis Ntuple
home (DPD)
impossible no standard standardized personalized
waste of resource common prod.
Less Analysis in ROOT
5
7. ! object usedused by reconstruction algorithm to carry state
! object by reconstruction algorithm to carry state
between various tools
Event Data Model
between various tools History of our data format and
Evolution: MigrationCBNT dumper to be dumped into
Evolution of information toCBNTto POOL dumped into
! carry all Workflow to the the dumper to history of our analysis.
! carry all information be
Evolution “flat” of Workflow
Evolution ntuple
a of Workflow
a “flat”Athena.exe
ntuple Root or paw/hbook
Ancient History
DC1
Athena.exe
Athena.exe Root
Athena Rootpaw/hbook
or or paw/hbook ROOT
Signal significance
H 5 !!
$ L dt = 30 fb!1 ttH (H 5 bb)
(no K!factors) H 5 ZZ
(*)
5 4l
(*)
ATLAS H 5 WW 5 l"l"
10 2 (*)
Event
DC1
qqH 5 qq WW
¯ Hadronic Top Mass qqH 5 qq ##
ttbar (MC NLO)
b
Before we used POOL to save Reconstructionthe input Event reconstruction
files, to
Total significance
Signal significance
H 5 !!
$ L dt = 30 fb!1
Signal significance
Entry
H 5 !! ttH (H 5 bb)
!1
$ L dt = 30 fb ttH (H 5 bb) (no K!factors)
W + Jets (Alpgen) H 5 ZZ
(*)
5 4l
Selection
(*)
(no K!factors) H 5 ZZ
(*)
5 4 l ATLAS H 5 WW 5 l"l"
300
¯
H 5 WW 2 5 l"l"
(*) (*)
Geant3
ATLAS 10
DC1
qqH 5 qq WW
q Event
10 2 (*) qqH 5 qq ##
DC1
qqH 5 qq WW
qqH 5 qq ## Total significance
250 10
Total significance
q
was a ZEBRA file, and the EDM (egamma, etc.) had two purposes:
Analysis
Selection Selection
Reconstruction
Reconstruction 200
Geant3 Geant3
W+ Reconstruction
l+ 150 10
t
10
ν 100
1
100 120 140 160 180
mH (GeV/c )
200
2
50
object used by reconstruction algorithm to carry state
b
!
1
100 120 140 160 180 200
1 0 120 2
100 0 50 140
100 160
150 180
200 250200 300 350 400 mH (GeV/c )
2
mH (GeV/c ) GeV
between various tools
root Zebra CBNT PostSc
! carry all root
information to the CBNT dumper to be dumped into
Zebra CBNTRoot
root Zebra
Zebra CBNT
CBNT PostScriptPostScript
PS
Evolution of Workflow
a “flat” ntuple
or paw/hbook or PyRoot from athena −i −i
Athena or PyRoot ROOTathena
Root from
DC2/Rome/CSC
Athena.py
Athena.exe Root
Much Athena.py current situation is related to
of our bour
of Root this historical
Much Athena.py ¯current situation is related to this historical
Root or PyRoot from athena −i
situation.
Signal significance
Hadronic TopHMass
Signal significance
ttbar (MC NLO) $ L dt = 30 fb!1 H 5 !!
situation.
5 !! ttH (H 5 bb)
$ L dt = 30 fb!1 ttH (H 5 bb) (no K!factors) (*)
H 5 ZZ 5 4l
(no K!factors) H 5 ZZ
(*)
5 4l
Entry
(*)
Signal significance
ATLAS H 5 WW 5 ll
¯
(*)
ATLAS H 5 WW 5 ll W + Jets (Alpgen) H 5 ! !
$ L dt =10 2fb!1
Signal significance
(*)
30
q Event Tag
10 2 (*) ttH (H 5 bb) qqH 5 qq WW
DC1
H 5 5 ! qq WW
qqH !
300dt = 30 fb!1
$L (no K!factors) (*) qqH 5 qq ##
AOD Event
DC2
ttH (H5 5 qq ##
qqH bb) H 5 ZZ 5 4l
(*)
(no K!factors) H 5 ZZ
(*)
5 4 l ATLAS H 5 WW 5 ll Total significance
Total significance
H 5 WW 2 5 ll
(*) (*)
ATLAS 10 qqH 5 qq WW
q AOD Tag Analysis Event
10 2 250 (*) qqH 5 qq ##
DC2
qqH 5 qq WW
Reconstruction AOD Tag Selection Event
qqH 5 qq ## Total significance
DC2
l+ Reconstruction
Reconstruction Builders Selection
Total significance
Geant3 Geant4
200
+
W Reconstruction
t
10 10
ν Reconstruction Builders Builders Selection Selection
150
Geant4 Geant4
10
Kyle Cranmer Cranmer (BNL)
Kyle (BNL) Analysis Model Workshop, October October 26, 2006
Analysis Model Workshop, 26, 2006
10
4
100
b 1
100
50
120 140 160 180 200 1
100 120 140 160 180 200
2
mH (GeV/c ) 2
0 1200
mH (GeV/c )
0 50 100 150 250
100 300
120 350
140 400 160 180 200
1 2
100 120 140 160 180 200 GeV mH (GeV/c )
2
mH (GeV/c )
EVNT Hits ESD/
Zebra SimulDigi CBNT Ntuple PS
PostScript
root Pool
pool SimulDigi
Pool
SimulDigi ESD ESD
AOD AOD RootTuple RootTuple
AOD RootTuple PostSc
pool pool ESD AOD PostScriptPostScript
pool pool
pool Tags Tags
Tags
Much Athena.py current situation is relatedRoot or PyRoot from athena −i
of our to this historical 7
8. • CBNT
• Flat ntuple with all information from reconstruction.
• No common look and feel.
• No official mechanism to share activity after reconstruction.
• AOD/ESD is nice!
• It has common interface to all objects.
• With it, we can do analysis using Athena.
• We need to publish and share analysis code for cross check and
reproduction of results.
• But why Athena?
• All of our reconstruction alg is in Athena.
• We’ll need to be able to re-run clustering, vertexing, track
extrapolation, calibration etc on AOD.
• How about Root
• We always need to use Root for our final analysis.
• Athena is good for event-level processing but not sample level.
• Two stage analysis
• We cannot do everything in one place. Some in Athena, rest in Root.
• Support flexibility in analysis. But what to do where?
8
9. Where to do Analysis?
Root lovers / Athena purists
physicists / computing specialists
1. Pre-physics analysis: reconstruction.
2. Baseline analysis: preselection overlap removal, further processing such
as jet re-calibration etc.
3. Event level analysis: object reconstruction, event variable calculation etc.
4. Sample level analysis: plot histogram, fitting templates, toy MC etc.
Athena Central Athena Analysis on Root Analysis
Production Distributed Ana Local or DA
9
10. Two-Stage Analysis A global view
Feedback / port analysis to Athena
Athena Analysis
Development on Lxplus Root Analysis
at Home
AOD Athena Ntuple Ntuple Histogram
DPD Production
on DA Publish
Monitor and retrieve
DA Submission
Feed back
Backnavigation
Re-Submission
AOD AOD Ath Ntupl
DA Submission
analysis
Scope of this workshop:
Athena analysis and distributed analysis
10
11. • Athena analysis should offer:
• The official place for common analysis.
• Should be easy enough for physicists to use!
• Simple mechanism for sharing code - modular analysis.
• Should be more than an ntuple dumper, need to do
“analysis”. (e.g. tune PID, re-calibrate)
• AOD object definition is loose (needs to be):
• It has overlapping objects (e.g. electrons and jets).
• It has loose object selection (e.g. no isolation cut).
• It has multiple algorithms (e.g. Staco/Muid muons).
• First step in analysis: Define your “view” of events:
• Do this by constructing EventView.
• Further baseline and event-level analysis using EventView
tools and modules.
• Produce Derived Physics Data, DPD (ntuple) for Root analysis:
• Private ntuple.
• Group standard DPD.
11
13. What is EventView?
EventView is a container which holds objects (or links to them) needed in physics analysis.
13
14. EventView Modular Analysis Framework
Calibrate Reconst.
AOD AOD Insert
Electron W and nu
Preselect BJet Calculate Truth Truth
AOD AOD Mt
Remove Calibrate
Elec Associate MET Match
Overlap MET
B-Jet Truth
AOD
Muon
Once we have defined our container, now analysis can be divided into pieces.
Each step of analysis divided into separate tools.
Each tool is made independent and general as much as possible.
Analysis information is propagated through EventView.
Each tool receives EventView and execute something on EventView.
Gradually build up EventView with more information.
14
15. EVToolLooper
Components of EV
EVTool (Inserter) Framework
EV
•
EVTool EVObjTool
(ObjLooper)
Obj
(ObjCalculator) EVToolLooper (or just looper)
EV
• Schedule all EV components.
Tool EVModule
• EVTools (or just Tools)
•
Obj
EV Tool
Smallest component in EV.
Tool
• Written in C++.
EV
• Mainly EV tool and EV obj tool.
•
Module
Tool Obj Tool Other types of tools too.
•
Tool
EV EVModules (or just Modules)
Tool • Set of tools and configurations.
• Written in Python.
DPD • Can be EV level or obj level.
15
16. Extending the Framework
• It is crucial that the framework is easily extendable.
• EventView framework is a collection of tools, extendibility is its
natural feature.
• Minimize work needed to add components.
• Subdivide usefulness of the tools into different types.
• Interface and functionality inherited through class hierarchy.
• Skeleton code generated for typical type of tools.
Generate skeleton
Just implement this function
16
18. Standard EventView Tool/Module Library
• EventViewBuilder (Container package)
• EventViewBuilderUtils: Base classes of all tools
• EventViewBuilderAlgs: Tool loopers
• EventViewConfiguration: High level configuration interface
• EventViewInserters: Tools for selection and overlap removal
• EventViewDumpers: Output to screen, ntuple, and atlantis
• EventViewUserData: Calculators and associators
• EventViewTrigger: Tools for reading trigger info
• EventViewPerformance: Performance study modules
• EventViewCombiners: Combinatorics tools
• EventViewSelectors: Event selection tools
List growing thanks to contributions from people
but we need more people to work on them.
18
19. EVToolLooper
Multiple View (Branch) An example EV analysis
Reco
•
Branch Tool
Electron Branch Tau Branch Tau Branch 2 One can define multiple
Electron Inserter Tau Inserter Tau Inserter
views of an event:
Muon Inserter Electron Inserter Electron Inserter
• e.g. use different jet in
each view.
Tau Inserter Muon Inserter Muon Inserter
EVToolLooper
•
Jet Inserter Jet Inserter Jet Inserter Truth
e.g. use different object
(cone04) (cone04) (cone07)
Vtx
selection in each view.
Filter Truth Inserter
•
Truth
Same analysis executed on
Analysis
Combinatorics Analysis
Selection
each EventView.
•
NtupleDumper - Truth
Each view can talk to each
KinematicCalc
TruthAssociator
other:
• compare views.
NtupleDumper - Electron Branch
Ntuple Output
Trees
•
NtupleDumper - Tau Branch
order views.
Truth
NtupleDumper - Tau Branch 2
•
Electron Tau Tau2
match views.
End
• Each view saved into separate
Root trees in ntuple.
19
20. Multiple View (Combinatorics) Another example analysis
EVSort
Unsorted Sorted
Comparator
EventView
Container EV0 EV1 EV0
EV0 EV0
(ex EV2)
Combination EV1
EventView EV1 EV1 Sort EV1 EV2
Tool (ex EV1)
EV2
EV2 EV2
(ex EV0)
• Multiple EV produced from combinatorics:
• e.g. take all combinations of dijets.
• e.g. combine one W and a b-jet to obtain top quark.
• Manipulation of multiple EV through Multi Input (MI) tools:
• e.g. sort EV according to the pt of reconstructed top.
• e.g. remove EVs not satisfying condition.
• Each view saved to separate tree or all views in one tree.
20
21. DPD Production
• EV analysis can easily produce custom ntuple with desired amount
of information.
• It is not just copying information but a lot more needs to be done:
• Selection of objects
• Slimming contents
• Adding derived variables
• Calibration / correction
• you name it.
• Various physics groups started making their own ntuple:
• They are based on the same tools but the format is slightly
different.
• Need common package to provide interface for producing DPD.
HighPtView.
• Provide same look and feel but still support difference in
contents.
• Central production of default ntuples. Good starting point and
reference point.
21
22. • HighPtView implements a set
HighPtView of configurations to
manipulate tools/modules in
EventViewBuilder.
EventView
EVBuilder • Provides common interface
EventView EventView EventView
Performance
for DPD building for all
Performance Configuration
PhysicsView packages.
Combiners
Study
•
EventView EventView Jet
BuilderUtils
Package
EventView
UserData
Electron/Photon Default physics contents
Muon
(selection cuts) come from
EventView EventView
BuilderAlgs Dumpers
Tau
EventView
Transformation
EventView
Selectors
EventView
Inserters
Flavour Tagging
performance study.
Tools/
Feedback • Physics groups may have their
own “view” packages (eg
Interface
Tools/ HighPtView
TopView) for optimization
Feedback
(override configuration) and
Interface
Optimization
extra functionality (eg tools
GroupView for analysis).
SUSYView TopView
HiggsTo
TauTau
HiggsTo
GamGam • Group view packages will
look similar inheriting
interface from HighPtView.
22
24. People Involved
• Core Developers: This is a VERY incomplete list of people.
• Kyle Cranmer (BNL) There are lots of other people contributing!
• Amir Farbin (CERN)
• Akira Shibata (QMUL)
• Package Developers:
• Attila Krasznahorkay (EventViewTrigger)
• Jamie Boyd (EventViewPerformance)
• Lorenzo Feligioni (Btagging)
• Liza Mijovic (Development for release 13)
• GroupArea management:
• Akira (Lxplus), Fabien Tarrade (BNL), Sandrine Laplace (Lyon)
• We have a lot of interaction with
• physics groups and performance groups
• distributed analysis and Athena core developers.
24
25. Where is EventView?
EventViewTools
EventView
Trigger HiggsTo HiggsTo
Gamma TauTau
EventViewCore Gamma
EventView TopView
Inserters
HighPt Physics
EventView View Groups
EventView BuilderAlgs
Performance
Groups EventView BuilderAlgs
Performance SUSY
EventView View Physics
Configuration Validation
EventView
Transformation Event Distributed
Analysis
View
EventView
UserData Central
Production
EventView
Combiners
EventView
Dumpers EventView
Selectors
Athena
Core
Developers
Analysis
EDM
Physics
Analysis
Tools
25
26. • Interaction with various groups is VERY important to maintain a
functional framework:
• Performance groups: Maintenance of relevant tools and input of
physics contents (object selection, common performance study.)
• EDM: Compatibility of the tools to developing EDM. Optimal
DPD format for the future
• Physics analysis tools: tool development and maintenance.
• Core developers: support for new feature request and help on
core development.
• Central production: production of HighPtView ntuple.
• Distributed analysis: support for EV analysis.
• Physics validation: validation on standard DPD and RTT on
Physics View packages.
• Physics groups: development of group tools and modules in
PhysicsVies packages.
• These things are just beginning to happen. More input and
collaboration is required for:
• Development and maintenance of PhysicsView packages.
• Standardised performance study and selection tools.
26
27. DA and EventView
EventView has a long experience with PANDA distributed framework because
It is highly integrated with data management tool (dq2).
It supports GroupArea and configurables.
It runs athena jobs seamlessly and no book keeping is necessary.
1. Local test with sample fetched with DQ2.
athena TopView_Top_jobOptions.py
2. Set up grid/DQ2.
source GridSetupLxplus.sh
3. Send job (replace “athena” by “pathena”):
pathena TopView_Top_jobOptions.py
--inDS csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v11004103
--outDS user.top_wg.TopView21.T1_McAtNlo_Jimmy.001
27
28. TopView Use Case
1. Develop analysis based on EventView tools under TopView
runtime package: Write C++ tools, write python modules
Athena and write job options.
Analysis
Developmnt
2. Run tests on lxplus with test samples fetched with DQ2
end user tool.
3. Commit changes to CVS and tag.
(iterate 1 and 2 until I’m happy; use HN if I need help)
Athena Ana. 4. Send PANDA job specifying input/output dataset and a few
Submission other parameters.
on DA 5. Check status on the web.
(if job failed, report Savannah bug and iterate 4 and 5)
6. Fetch results (AANTuple) using DQ2 end user tool and
Post-Athena analyze with ROOT locally.
Analysis
7. Go back to the AOD/ESD using the AANT and do further
analysis. (in future).
28
29. Two-stage Iterative Analysis Process Again!
1st stage: Define common optimal
preparation for a physics group. Includes
2nd stage: Interactive analysis in ROOT
with quick turnaround. Use TMVA, Roofit
selection, calibration, b-tagging. Use
and other tools outside framework.
modular analysis framework and
Produce final plots to be published. Back-
feedback/tools from performance study.
navigate through DA if necessary.
Parameterise using common tools and
python configuration.
Feedback / port analysis to Athena
Athena Analysis
Development on Lxplus Root Analysis
at Home
AOD Athena Ntuple Ntuple Histogram
DPD Production
on DA Publish
Monitor and retrieve
DA Submission
Feed back
Backnavigation
Re-Submission
AOD AOD Ath Ntupl
DA Submission
analysis
Group-wise central production on DA system.
29
31. • Faster AOD access: EDM in Release 13
• Factor of ~10 in I/O speed expected thanks to transient
persistent separation.
• Broadens the use case of Athena analysis.
• AOD access from root or structured DPD:
• Different technology now under test: pAOD, SAN, and new
Scott’s recipe.
• Broadens the use case of Root analysis and lowers the wall
between Athena and Root: higher portability of analysis code.
• We will still need common framework just like before!
EventView will write out structured format.
• ESD/AOD merger
• ESD / AOD classes are now merged: identical interface for ESD
and AOD analysis.
• Can use ESD based tools on AOD objects: e.g. particle
identification.
31
32. Future developments in PANDA
• Root analysis support on PANDA
• Using PROOF or otherwise
• Support for more sites with improvements in operation
• BNL has been fully operational for a long time.
• Added lcg sites: ANALY_CERN, ANALY_LYON,
ANALY_RAL, ANALY_FZK, ANALY_CNAF,
ANALY_PIC, ANALY_SARA
32