1.
Deep Learning for Fast Simulation
HNSciCloud M-PIL-3.2 meeting
June 2018
S. Vallecorsa, F. Carminati, G. Khattak
2.
Our objective
• Activities are ongoing to speed up Monte Carlo techniques
• These are not enough to cope with the expected HL-LHC needs
• Current fast-simulation solutions are detector-dependent
• Goal: a general fast-simulation tool based on Machine Learning / Deep Learning
• Optimising training time becomes crucial
→ Improved, efficient and accurate fast simulation
3.
Requirements
• Precise simulation results
• A detailed validation process
• A fast inference step
• A generic, customizable tool
• An easy-to-use and easily extensible framework
• Large hyper-parameter scans and meta-optimisation:
  • Training time under control
  • Scalability
  • Possibility to work across platforms
4.
Generative adversarial networks (arXiv:1406.2661)
Simultaneously train two networks that compete and cooperate with each other:
• The generator G generates data from random noise
• The discriminator D learns how to distinguish real data from generated data
The (blind) counterfeiter/detective analogy:
• The counterfeiter shows a Mona Lisa
• The detective says it is fake and gives feedback
• The counterfeiter makes a new Mona Lisa based on the feedback
• Iterate until the detective is fooled
(Image source: https://arxiv.org/pdf/1701.00160v1.pdf)
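The adversarial game described above corresponds to the minimax objective of the original GAN paper (arXiv:1406.2661), where D is trained to maximise and G to minimise the value function:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_\mathrm{data}(x)}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```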
5.
Generated images
Interpret the detector output as a 3D image:
• A 3D convolutional GAN generates realistic detector output
• Customized architecture (includes auxiliary regression tasks)
• Agreement with standard Monte Carlo in terms of physics is remarkable!
[Figures: GAN-generated electron shower; average shower section; Y moment (width); energy fraction measured by the calorimeter]
Run on the Caltech iBanks GPU cluster, thanks to Prof. M. Spiropulu
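To illustrate how an auxiliary regression task can be folded into the discriminator objective, here is a minimal sketch of a combined loss. The loss weight `lambda_e` and the energy-regression target are illustrative assumptions, not the published 3DGAN configuration:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Adversarial term: real/fake classification loss
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(-y_true * np.log(y_pred)
                         - (1 - y_true) * np.log(1 - y_pred)))

def mean_squared_error(y_true, y_pred):
    # Auxiliary regression term (e.g. the primary-particle energy)
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def combined_loss(rf_true, rf_pred, e_true, e_pred, lambda_e=0.1):
    # Total discriminator loss: adversarial + weighted auxiliary regression
    return (binary_cross_entropy(rf_true, rf_pred)
            + lambda_e * mean_squared_error(e_true, e_pred))

# Perfect predictions drive both terms to (nearly) zero
loss = combined_loss([1.0, 0.0], [1.0, 0.0], [100.0], [100.0])
```

In a multi-task setup like this, the discriminator is pushed to learn features that are useful both for telling real from generated showers and for reconstructing physics quantities, which constrains the generator toward physically meaningful output.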
6.
Computing performance
Inference: Monte Carlo 17 s/particle vs. 3DGAN 7 ms/particle
→ a speedup factor of ~2400 on CPU!
Training: 45 min/epoch on an NVIDIA P100
→ distributed training is needed
Data-parallel training is introduced using mpi-learn
(Elastic Averaging Stochastic Gradient Descent)
Strong scaling measured at the CSCS Swiss National Supercomputing Centre (J-R. Vlimant)
Calorimeter energy response: the GAN prediction stays stable through 20 nodes!
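The elastic-averaging scheme can be sketched as follows: each worker takes local SGD steps while being pulled elastically toward a shared center variable, which in turn drifts toward the workers. This is a single-process toy simulation with assumed hyper-parameters (`eta`, `rho`, a quadratic loss), not the actual mpi-learn implementation, which runs the workers under MPI:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(x, data):
    # Gradient of the per-worker quadratic loss ||x - mean(shard)||^2
    return 2.0 * (x - data.mean(axis=0))

# Four workers, each holding its own data shard; the optimum of the
# combined loss is the global mean over all shards
shards = [rng.normal(size=(256, 4)) for _ in range(4)]

eta, rho = 0.05, 1.0   # learning rate and elastic coefficient (assumed values)
alpha = eta * rho      # elastic "moving rate" coupling workers and center
workers = [rng.normal(size=4) for _ in shards]
center = np.zeros(4)   # shared center variable (the averaged model)

for _ in range(300):
    new_center = center.copy()
    for i, data in enumerate(shards):
        elastic = workers[i] - center
        # Local SGD step plus an elastic pull toward the center variable
        workers[i] = workers[i] - eta * grad(workers[i], data) - alpha * elastic
        # The center moves toward each worker's (previous) parameters
        new_center = new_center + alpha * elastic
    center = new_center
```

The elastic coupling lets workers explore away from the center between synchronisations, which reduces communication pressure compared with synchronous averaging after every step.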
Time to create an electron shower:

Method                       Machine                    Time/shower (ms)
Full simulation (Geant4)     Intel Xeon Platinum 8180   17000
3D GAN (batch size 128)      Intel Xeon Platinum 8180   7
3D GAN (batch size 128)      NVIDIA P100                0.04
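The speedup factors follow directly from the timings in the table (all numbers taken from the slide itself):

```python
# Times from the table above, in ms per electron shower
full_sim_cpu = 17000.0   # Geant4 full simulation on Xeon Platinum 8180
gan_cpu = 7.0            # 3D GAN (batch size 128) on the same CPU
gan_gpu = 0.04           # 3D GAN (batch size 128) on a P100 GPU

cpu_speedup = full_sim_cpu / gan_cpu   # roughly 2400x on CPU
gpu_speedup = full_sim_cpu / gan_gpu   # roughly 425000x on GPU
```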
7.
DL with the HNSciCloud
First tests during the prototype phase (2017):
• Single-GPU training benchmark (RHEA, T-Systems, IBM)
• P100 (RHEA - Exoscale) vs K80 (IBM)
Current tests:
• MPI-based distributed training (ssh/TCP)
• Local input storage
• Single GPU per node
• 2× P100 on T-Systems
• Comparison to an HPC environment (CSCS)
• Trials with HTCondor on the Exoscale cloud (5 VMs), still under investigation
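For reference, a single-GPU training job under HTCondor can be described with a minimal submit file along these lines (the file name and training script are hypothetical; the actual HNSciCloud configuration is not shown in the slides):

```
# train_3dgan.sub - hypothetical submit description for one GPU training job
executable   = train_3dgan.sh
output       = train_$(Cluster).out
error        = train_$(Cluster).err
log          = train_$(Cluster).log
request_gpus = 1
request_cpus = 4
queue
```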
8.
Next steps
Continue with tests/optimisation:
• Schedulers (SLURM)
• Input storage options
• GPU/node configuration
• Possibility to combine GPUs from different resources
Additional GPUs are needed
First results are very promising