Boost Fertility New Invention Ups Success Rates.pdf
ACAT 2019: A hybrid deep learning approach to vertexing
1. A hybrid deep learning approach to vertexing
Rui Fang1
Henry Schreiner1, 2
Mike Sokoloff1
Constantin Weisser3
Mike Williams3
March 11, 2019
1
The University of Cincinnati
2
Princeton University
3
Massachusetts Institute of Technology
ACAT 2019
Supported by:
2. 0 5 10 15 20 25 30 35 40 45 50 55 60
# LHCb long tracks
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Efficiency
Found 103002 of 109733 (eff 93.87%)
False positive rate = 0.251 per event
Asymmetric cost function
Found 96616 of 109733 (eff 88.05%)
False positive rate = 0.0485 per event
Symmetric cost function
Events in sample = 20K
Training sample = 240K
0 5 10 15 20 25 30 35 40 45 50 55 60
# LHCb long tracks
102
103
104
PVs
1/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
3. Tracking in the LHCb upgrade Introduction
The changes
• 30 MHz software trigger
• 7.6 PVs per event (Poisson distribution)
• Roughly 5.5 visible PVs per event
The problem
• Much higher pileup
• Very little time to do the tracking
• Current algorithms too slow
We need to rethink our algorithms from the ground up...
2/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
4. Vertices and tracks Introduction
Vertices
• Events contain ≈ 7 Primary Vertices
(PVs)
A PV should contain 5+ long tracks
• Multiple Secondary Vertices (SVs) per
event as well
A SV should contain 2+ tracks
Beams
PV
Track
SV
Adapt to machine learning?
• Sparse 3D data (41M pixels) → rich 1D data
• 1D convolutional neural nets
• Highly parallelizable, GPU friendly
• Opportunities to visualize learning process
3/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
5. A hybrid ML approach Introduction
Tracking Kernel generation Make predictions
CNNs
Interpret results
Truth Training
Validation
Machine learning features (so far)
• Prototracking converts sparse 3D dataset to feature-rich 1D dataset
• Easy and effective visualization due to 1D nature
• Even simple networks can provide interesting results
What follows is a proof of principle implementation for finding PVs.
4/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
6. Kernel generation Design
Tracking procedure
• Hits lie on the 26 planes
• For simplicity, only 3 tracks shown
z axis (along the beam)
x PV
5/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
7. Kernel generation Design
Tracking procedure
• Hits lie on the 26 planes
• For simplicity, only 3 tracks shown
• Make a 3D grid of voxels (2D shown)
• Note: only z will be fully calculated and
stored
z axis (along the beam)
x PV
5/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
8. Kernel generation Design
Tracking procedure
• Hits lie on the 26 planes
• For simplicity, only 3 tracks shown
• Make a 3D grid of voxels (2D shown)
• Note: only z will be fully calculated and
stored
• Tracking (full or partial)
z axis (along the beam)
x PV
5/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
9. Kernel generation Design
Tracking procedure
• Hits lie on the 26 planes
• For simplicity, only 3 tracks shown
• Make a 3D grid of voxels (2D shown)
• Note: only z will be fully calculated and
stored
• Tracking (full or partial)
• Fill in each voxel center with Gaussian PDF
z axis (along the beam)
x PV
5/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
10. Kernel generation Design
Tracking procedure
• Hits lie on the 26 planes
• For simplicity, only 3 tracks shown
• Make a 3D grid of voxels (2D shown)
• Note: only z will be fully calculated and
stored
• Tracking (full or partial)
• Fill in each voxel center with Gaussian PDF
• PDF for each (proto)track is combined
z axis (along the beam)
x PV
5/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
11. Kernel generation Design
Tracking procedure
• Hits lie on the 26 planes
• For simplicity, only 3 tracks shown
• Make a 3D grid of voxels (2D shown)
• Note: only z will be fully calculated and
stored
• Tracking (full or partial)
• Fill in each voxel center with Gaussian PDF
• PDF for each (proto)track is combined
• Fill z "histogram" with maximum KDE value
in xy
z axis (along the beam)
x PV
5/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
12. Example of z KDE histogram Design
100 50 0 50 100 150 200 250 300
z values [mm]
0
500
1000
1500
2000
DensityofKernel
Kernel
LHCb PVs
Other PVs
LHCb SVs
Other SVs
Note: All events from toy detector simulation
Human learning
• Peaks generally correspond to PVs and SVs
Challenges
• Vertex may be offset from peak
• Vertices interact
6/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
13. Target distribution Design
Build target distribution
• True PV position as the mean of Gaussian
• σ (standard deviation) is 100 µm (simplification)
• Fill bins with integrated PDF within ±3 bins (±300 µm )
7/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
15. Cost function Design
10 6 10 5 10 4 10 3 10 2 10 1 100
yhat
0
10
20
30
40
50
60
cost
0.0 0.2 0.4 0.6 0.8
yhat
0
5
10
15
20
25
30
cost
Asym. Cost for y = 0.10
Symm. Cost for y = 0.10
Asym. Cost for y = 0.30
Symm. Cost for y = 0.30
Asym. Cost for y = 1e-5
Symm. Cost for y = 1e-5
0.2 0.4 0.6 0.8 1.0
yhat
0
2
4
6
8
10
cost
Approach
• Symmetric cost function: low FP but low efficiency
• Adding asymmetry term controls trade-off for FP vs. efficiency
9/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
16. Compare predictions with targets: When it works Results
0
200
400
600
800
1000
1200
KernelDensity
True: 48.904 mm
Pred: 48.954 mm
: 50 µm
Event 0 @ 48.9 mm: PV found
Kernel Density
47.00 48.00 49.00 50.00 51.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
PV found example
0
50
100
150
200
KernelDensity
Pred: 0.976 mm
Event 0 @ 1.0 mm: Masked
Kernel Density
-1.00 0.00 1.00 2.00 3.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
Masked (<5 tracks) example
10/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
17. Compare predictions with targets: When it fails Results
0
50
100
150
200
250
KernelDensity
Pred: 65.696 mm
Event 2 @ 65.7 mm: False positive
Kernel Density
64.00 65.00 66.00 67.00 68.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
False Positive example
0
100
200
300
400
500
KernelDensity
True: 51.898 mm
Event 3 @ 51.9 mm: PV not found
Kernel Density
50.00 51.00 52.00 53.00 54.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
PV not found example
11/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
18. False Positive and efficiency rates Results
88 89 90 91 92 93 94
Efficiency [%]
0.05
0.10
0.15
0.20
0.25FPperevent
Symm cost
Most asymm cost
88 89 90 91 92 93 94
Efficiency [%]
10 1
6×10 2
2×10 1
FPperevent
Symm cost
Most asymm cost
Search for PVs
• Search ±5 bins (±500µm) around a true PV
• At least 3 bins with predicted probability > 1% and
integrated probability > 20%.
Tunable efficiency vs. FP
• The asymmetry parameter
controls FP vs. efficiency
12/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
19. Conclusions and plans Future plans
0 5 10 15 20 25 30 35 40 45 50 55 60
# LHCb long tracks
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Efficiency
• Proof-of-Principle established: a hybrid ML algorithm
using a 1-dimensional KDE processed by a 5-layer CNN
finds primary vertices with efficiencies and false positive
rates similar to traditional algorithms.
• Efficiency is tunable; increasing the efficiency also
increases the false positive rate.
• Adding information should improve performance.
• can add KDE (x,y) information to algorithm
• can associate tracks to PV candidates, then iterate.
• Next steps: train with full LHCb MC and deploy
inference engine in LHCb Hlt1 framework.
• Beyond LHCb
• approach might work for ATLAS and CMS;
• algorithm is an interesting ML laboratory.
13/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
20. Final words Future plans
Questions?
Source code:
• https://gitlab.cern.ch/LHCb-Reco-Dev/pv-finder
• Runnable with Conda on macOS and Linux
Run: conda env create -f environment-gpu.yml
Python 3.6+ and PyTorch used for machine learning code
Generation now available too using the new Conda-Forge
ROOT and Pythia8 packages
Supported by:
• NSF OAC-1836650:
IRIS-HEP
• NSF OAC-1740102:
SI2:SSE
• NSF OAC-1739772:
SI2:SSE
14/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
21. More predictions with targets (1) Backup
0
500
1000
1500
2000
KernelDensity
True: 114.622 mm
Pred: 114.597 mm
: -26 µm
Event 2 @ 114.6 mm: PV found
Kernel Density
113.00 114.00 115.00 116.00 117.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
0
100
200
300
400
500
KernelDensity
True: 197.461 mm
Pred: 197.396 mm
: -65 µm
Event 5 @ 197.4 mm: PV found
Kernel Density
195.00 196.00 197.00 198.00 199.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
15/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
22. More predictions with targets (2) Backup
0
50
100
150
200
KernelDensity
True: 221.595 mm
Pred: 221.546 mm
: -49 µm
Event 5 @ 221.5 mm: PV found
Kernel Density
219.00 220.00 221.00 222.00 223.00 224.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
0
200
400
600
800
1000
1200
1400
1600
KernelDensity
True: 36.068 mm
Pred: 36.400 mm
: 332 µm
Event 6 @ 36.1 mm: PV found
Kernel Density
34.00 35.00 36.00 37.00 38.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
16/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
23. More predictions with targets (3) Backup
0
200
400
600
800
1000
1200
1400
1600
KernelDensity
True: 129.336 mm
Pred: 129.337 mm
: 1 µm
Event 6 @ 129.3 mm: PV found
Kernel Density
127.00 128.00 129.00 130.00 131.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
0
500
1000
1500
2000
KernelDensity
True: 143.224 mm
Pred: 143.199 mm
: -25 µm
Event 6 @ 143.2 mm: PV found
Kernel Density
141.00 142.00 143.00 144.00 145.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
17/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
24. More predictions with targets (4) Backup
0
50
100
150
200
250
300
350
400
KernelDensity
True: 150.650 mm
Pred: 150.416 mm
: -234 µm
Event 6 @ 150.4 mm: PV found
Kernel Density
148.00 149.00 150.00 151.00 152.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
0
500
1000
1500
2000
2500
KernelDensity
True: 179.560 mm
Pred: 179.591 mm
: 31 µm
Event 6 @ 179.6 mm: PV found
Kernel Density
178.00 179.00 180.00 181.00 182.00
z values [mm]
150
100
50
0
50
100
150
xymaximum[m]
x
y
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Probability
Target
Predicted
Masked
18/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
25. The VELO Backup
Tracks
• Originate from vertices (not shown)
• Hits originate from tracks
• We only know the true track in simulation
• Nearly straight, but tracks may scatter in material
The VELO
• A set of 26 planes that detect tracks
• Tracks should hit one or more pixels per plane
• Sparse 3D dataset (41M pixels)
19/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019
26. Questions for other experiments Backup
• Beam width (x, y): 40 µm for LHCb, what is yours?
• Transverse resolution: 5–15 µm for LHCb depending on number of tracks, what is yours?
• Longitudinal resolution: 40–100 µm for LHCb depending on number of tracks, what is
yours?
• Cleaning up prototracks based on IP could simplify kernel
• Can prototracking be done in the triggers?
20/14Fang, Schreiner, Sokoloff, Weisser, Williams
A hybrid DL approach to vertexing
March 11, 2019