When trying to make auto-tuning practical using a common infrastructure, a public repository of knowledge, and machine learning (cTuning.org), we faced a major problem with the reproducibility of experimental results collected from multiple users. This was largely due to a lack of information about all software and hardware dependencies, as well as large variation in the measured characteristics.
I will present a possible collaborative approach to solving the above problems using a new Collective Mind knowledge management system. This modular infrastructure is intended to preserve and share over the Internet whole experimental setups with all related artifacts and their software and hardware dependencies, not just performance data. Researchers can take advantage of shared components and data with extensible meta-descriptions at http://c-mind.org/repo to quickly prototype and validate research techniques, particularly in software and hardware optimization and co-design. At the same time, behavior anomalies or model mispredictions can be exposed in a reproducible way to the interdisciplinary community for further analysis and improvement. This approach supports our new open publication model in computer engineering, where all results and artifacts are continuously shared and validated by the community (c-mind.org/events/trust2014).
This presentation supports our recent publications:
* http://iospress.metapress.com/content/f255p63828m8l384
* http://hal.inria.fr/hal-01054763
Collective Mind: bringing reproducible research to the masses
1. Collective Mind:
bringing reproducible research to the masses
Grigori Fursin
POSTALE, INRIA Saclay, France
INRIA-Illinois-ANL Joint Laboratory Workshop
France, June 2014
2. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 2
Challenge:
How to design the next generation of faster, smaller, cheaper, more power-efficient and reliable computer systems (software and hardware)?
Long-term interdisciplinary vision:
• Share code and data in a reproducible way along with publications
• Apply big data analytics to program optimization, run-time adaptation and architecture co-design
• Bring the interdisciplinary community together to validate experimental results, ensure reproducibility and improve optimization predictions
This message is continuously validated in industrial projects with Intel, ARM, IBM, CAPS, ARC (Synopsys), STMicroelectronics
3.
Talk outline:
• Motivation: general problems in computer engineering
• cTuning: big-data-driven program optimization and architecture co-design, and encountered problems
• Collective Mind: collaborative and reproducible research and experimentation in computer engineering
• Reproducibility as a side effect
• Conclusions and future work
4.
Available solutions
[Diagram: an end-user task passes through algorithm, application, compilers, binary and libraries, run-time environment, architecture, state of the system, data set and storage to produce a result]
End users require faster, smaller and more power-efficient systems
5.
[Diagram: the same stack, now showing the huge GCC optimization space between the end-user task and the result]
Delivering an optimal solution is non-trivial.
Fundamental problems:
1) Too many design and optimization choices at all levels
2) Always multi-objective optimization: performance vs compilation time vs code size vs system size vs power consumption vs reliability vs return on investment
3) Complex relationships and interactions between ALL software and hardware components
Empirical auto-tuning is too time-consuming, ad-hoc and tedious to be mainstream!
6.
Combine auto-tuning with machine learning and crowdsourcing
[Diagram: plugin-based MILEPOST GCC with plugins to monitor and explore the optimization space and to extract semantic program features; cTuning.org: plugin-based auto-tuning framework and public repository. Training: programs/kernels 1..N → extract semantic program features, collect dynamic features and hardware counters, cluster, build a predictive model. Prediction: unseen program → predict optimizations to minimize execution time, power consumption, code size, etc.]
• G. Fursin et al. MILEPOST GCC: machine learning based self-tuning compiler. 2008, 2011
• G. Fursin and O. Temam. Collective optimization: a practical collaborative approach. 2010
• G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. 2009
• F. Agakov et al. Using machine learning to focus iterative optimization. 2006
7.
[Same references and diagram as the previous slide]
In 2009, we opened a public repository of knowledge (cTuning.org) and managed to automatically tune customer benchmarks and compiler heuristics for a range of real platforms from IBM and ARC (Synopsys).
Now becoming mainstream: is everything solved?
Combine auto-tuning with machine learning and crowdsourcing
8.
Technological chaos
[Word cloud of the technological chaos: compilers (GCC 4.1.x to 4.8.x, ICC 10.1 to 12.1, LLVM 2.6 to 3.4, Open64, XLC, Jikes, Testarossa, Phoenix, MVS 2013), parallel APIs (OpenMP, MPI, HMPP, OpenCL, CUDA 4.x), profilers (gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, hardware counters), libraries (TBB, MKL, ATLAS), optimization levels and techniques (algorithm-, program-, function-, codelet-, loop-level; IPA, polyhedral transformations, LTO, pass reordering, predictive scheduling, per-phase reconfiguration), machine learning (KNN, SVM, genetic algorithms), hardware knobs (cache size, frequency, bandwidth, memory/HDD size, TLB, ISA, ARM v6/v8, Intel SandyBridge, SSE4, AVX), objectives (execution time, reliability)]
We also experienced a few problems:
• Difficulty reproducing results collected from multiple users (including variability of performance data and constant changes in the system)
• Difficulty reproducing and validating existing and related techniques from publications (no full specifications and dependencies)
• Lack of common, large and diverse benchmarks and data sets
• Difficulty exposing choices and extracting features (tools are not prepared for auto-tuning and machine learning)
• Difficulty experimenting
9.
Technological chaos
[Same word cloud as the previous slide]
We also experienced a few problems:
• By the end of the experiments, new tool versions are often available
• The common life span of experiments and ad-hoc frameworks is the end of an MSc or PhD project
• Researchers often focus on publications rather than on practical and reproducible solutions
• Since 2009 we have been asking the community to share code, performance data and all related artifacts (experimental setups): at ADAPT’14 only two papers submitted artifacts; PLDI’14 had several papers with research artifacts. We will discuss these problems in 2 days at ACM SIGPLAN TRUST’14 …
10.
Collective Mind: towards systematic and reproducible experimentation
[Diagram: hardwired experimental setups built from multiple tool versions (Tool AV1…AVN, Tool BV1…BVM), ad-hoc analysis, learning and tuning scripts, and collections of CSV, XLS, TXT and other experiment files; behavior, choices, features, state and dependencies stay hidden. Tools are not prepared for auto-tuning and adaptation, and users struggle to expose this meta-information. Such setups are very difficult to extend or share]
Motivation for Collective Mind (cM):
• How to preserve, share and reuse practical knowledge and experience in program optimization and hardware co-design?
• How to make machine-learning-driven optimization and run-time adaptation practical?
• How to ensure reproducibility of experimental results?
Share the whole experimental setup with all related artifacts, SW/HW dependencies, and unified meta-information.
11.
cM module (wrapper) with unified and formalized input and output
[Diagram: a cM module wraps Tool BVi: it processes the command line (ProcessCMD), passes through the original unmodified ad-hoc input, and exposes the behavior, choices, features, state and dependencies of the wrapped tool and its generated files, in contrast with the previous ad-hoc scripts and file collections]
cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line)
cm compiler build -- icc -fast *.c
cm code.source build ct_compiler=icc13 ct_optimizations=-fast
cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm
Should be able to run on any OS (Windows, Linux, Android, MacOS, etc)!
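The command lines above follow a uniform "cm [module] [action] key=value …" convention. A minimal sketch of how such a front-end might dispatch to module wrappers with unified dictionary input/output is shown below; this is not the real cM code, and the `CodeModule` class and its fields are illustrative assumptions only:

```python
# Minimal sketch (not the actual cM implementation) of dispatching
# "cm <module> <action> key=value ... -- unparsed command line".
import json

def parse_cm_args(argv):
    """Split cm-style arguments into (module, action, params, unparsed tail)."""
    if "--" in argv:
        cut = argv.index("--")
        argv, unparsed = argv[:cut], argv[cut + 1:]
    else:
        unparsed = []
    module, action, *rest = argv
    params = dict(p.split("=", 1) for p in rest)
    return module, action, params, unparsed

class CodeModule:
    """Hypothetical module: every action takes and returns a unified dict."""
    def run(self, i):
        return {"return": 0,
                "state": {"os": i.get("os", "linux"),
                          "binary": i.get("binary")}}

MODULES = {"code": CodeModule()}

def cm(argv):
    module, action, params, unparsed = parse_cm_args(argv)
    params["unparsed"] = unparsed
    out = getattr(MODULES[module], action)(params)
    return json.loads(json.dumps(out))  # ensure the output is JSON-serializable
```

The point of the uniform convention is that any tool, on any OS, can be driven through the same dictionary-in/dictionary-out interface.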
12.
cM module (wrapper) with unified and formalized input and output
[Diagram, extended: the wrapper now takes unified JSON input (meta-data) with an action; the action function invokes the tool with the original unmodified ad-hoc input, then parses and unifies its output into unified JSON output (meta-data) exposing behavior, choices, features and state. b = B(c, f, s): a formalized function (model) of a component's behavior over flattened JSON vectors (either string categories or integer/float values)]
Exposing meta-information in a unified way
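Flattening a nested JSON meta-description into a flat vector of string categories and numeric values, as in b = B(c, f, s), can be sketched as below. The key-naming convention here ("#" for dict keys, "@" for list indices) is an assumption for illustration; the real cM convention may differ:

```python
# Sketch of flattening a nested JSON meta-description into a flat vector:
# dict keys become "#key" segments, list positions become "@i" segments,
# leaves stay as string categories or integer/float values.
def flatten(meta, prefix="#"):
    flat = {}
    if isinstance(meta, dict):
        for k, v in meta.items():
            flat.update(flatten(v, prefix + "#" + k))
    elif isinstance(meta, list):
        for i, v in enumerate(meta):
            flat.update(flatten(v, prefix + "@" + str(i)))
    else:
        flat[prefix] = meta
    return flat
```

Such flat vectors are what generic modeling and clustering tools can consume directly.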
[Same ad-hoc setup diagram and cm command-line examples as the previous slide]
13.
cM module (wrapper) with unified and formalized input and output
[Diagram, extended further: before invoking the tool, the wrapper sets the environment for a given tool version and checks dependencies; multiple tool versions can co-exist, while their interface is abstracted by the cM module]
Adding SW/HW dependency checks
[Same ad-hoc setup diagram and cm command-line examples as the previous slides]
14.
Assembling, preserving, sharing and extending the whole pipeline as “LEGO”
[Same cM module wrapper diagram as on the previous slides]
Chaining cM components (wrappers) into an experimental pipeline for a given research and experimentation scenario: choose exploration strategy → generate choices (code sample, data set, compiler, flags, architecture …) → compile source code → run code → test behavior normality → Pareto filter → modeling and prediction → complexity reduction. Shared scenarios from past research are exposed through the unified web services of a public modular auto-tuning and machine learning repository and buildbot to the interdisciplinary crowd.
15.
Data abstraction in Collective Mind (c-mind.org/repo)
Each cM module abstracts a class of entries; each entry has files/directories plus a JSON meta-description:
• module: compiler, package, dataset, …
• compiler: GCC 4.4.4, GCC 4.7.1, LLVM 3.1, LLVM 3.4, … (meta: compiler flags, features, actions)
• package: GCC 4.7.1 bin, GCC 4.7.1 source, LLVM 3.4, gmp 5.0.5, mpfr 3.1.0, lapack 2.3.0, java apache commons codec 1.7, … (meta: installation info)
• dataset: image-jpeg-0001, bzip2-0006, txt-0012, …
cM repository directory structure: .cmr / module UOA / data UOA (UID or alias) / .cm / data.json
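Given the directory layout above, loading and saving an entry's meta-description is straightforward. The helper names below are hypothetical, not the real cM API; only the path layout is taken from the slide:

```python
# Sketch of reading/writing an entry's JSON meta-description in a
# cM-style repository: .cmr/<module UOA>/<data UOA>/.cm/data.json
import json
import os
import tempfile

def save_meta(repo, module_uoa, data_uoa, meta):
    """Create the entry directory if needed and store its meta-description."""
    d = os.path.join(repo, ".cmr", module_uoa, data_uoa, ".cm")
    os.makedirs(d, exist_ok=True)
    with open(os.path.join(d, "data.json"), "w") as f:
        json.dump(meta, f, indent=2)

def load_meta(repo, module_uoa, data_uoa):
    """Load the meta-description of one entry."""
    path = os.path.join(repo, ".cmr", module_uoa, data_uoa, ".cm", "data.json")
    with open(path) as f:
        return json.load(f)
```

Because every entry is just a directory with a data.json, a repository can be shared, diffed and versioned with standard tools.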
16.
Since 2005: systematic, big-data driven optimization and co-design
[The same word cloud of the technological chaos: compilers, parallel APIs, profilers (including likwid), libraries, optimization levels and hardware knobs: the current state of computer engineering]
Sharing of code and data by the “crowd” (cTuning.org; c-mind.org/repo): a collaborative infrastructure and repository to prototype research ideas, validate existing work and perform end-user tasks, with systematization and unification of collected knowledge (big data) via classification and predictive modeling.
Instead of a quick non-reproducible hack, an ad-hoc heuristic, a quick publication with no shared code and data:
• Share code and data with their meta-descriptions and dependencies
• Systematize and classify collected optimization knowledge (clustering; predictive modeling)
• Develop and preserve the whole experimental pipeline
• Extrapolate collected knowledge (cluster, build predictive models, predict optimizations) to build faster, smaller, more power-efficient and reliable computer systems
This helped the interdisciplinary community apply “big data analytics” to the analysis, optimization and co-design of computer systems.
17.
Top-down problem (tuning) decomposition, similar to physics: gradually expose some characteristics and some choices at each level.
Level: exposed characteristics / exposed choices:
• Algorithm selection: (time) productivity, variable-accuracy, complexity … / language, MPI, OpenMP, TBB, MapReduce …
• Compile program: time … / compiler flags; pragmas …
• Code analysis & transformations (process, thread, function, codelet, loop, instruction): time; memory usage; code size … / transformation ordering; polyhedral transformations; transformation parameters; instruction ordering …
• Run code / run-time environment: time; power consumption … / pinning/scheduling …
• System: cost; size … / CPU/GPU; frequency; memory hierarchy …
• Data set: size; values; description … / precision …
• Run-time analysis: time; precision … / hardware counters; power meters …
• Run-time state: processor state; cache state … / helper threads; hardware counters …
• Analyze profile: time; size … / instrumentation; profiling …
Coarse-grain vs. fine-grain effects: depends on user requirements and expected ROI
18.
Growing, plugin-based cM pipeline for auto-tuning and learning
•Init pipeline
•Detect system information
•Initialize parameters
•Prepare dataset
•Clean program
•Prepare compiler flags
•Use compiler profiling
•Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning
•Use universal Alchemist plugin (with any OpenME-compatible compiler or tool)
•Use Alchemist plugin (currently for GCC)
•Build program
•Get objdump and md5sum (if supported)
•Use OpenME for fine-grain program analysis and online tuning (build & run)
•Use 'Intel VTune Amplifier' to collect hardware counters
•Use 'perf' to collect hardware counters
•Set frequency (in Unix, if supported)
•Get system state before execution
•Run program
•Check output for correctness (use dataset UID to save different outputs)
•Finish OpenME
•Misc info
•Observed characteristics
•Observed statistical characteristics
•Finalize pipeline
http://c-mind.org/ctuning-pipeline
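The stages above can be sketched as a plugin-based pipeline where each stage is a function over the unified state dictionary; chaining stages then reduces to a loop. Stage names and the "return" error convention below are illustrative assumptions, not the real cM pipeline code:

```python
# Sketch of a plugin-based experimental pipeline: each stage takes and
# returns the unified state dict; a non-zero "return" stops the run.
def stage_init(state):
    state.setdefault("characteristics", {})
    return state

def stage_compile(state):
    state["binary"] = "a.out"               # stand-in for a real compiler call
    return state

def stage_run(state):
    state["characteristics"]["time"] = 1.23  # stand-in for a real measurement
    return state

def run_pipeline(stages, state=None):
    state = dict(state or {})
    for stage in stages:
        state = stage(state)
        if state.get("return", 0) != 0:      # unified error convention
            break
    return state
```

New scenarios are built by inserting, removing or replacing stages, like “LEGO”, without touching the rest of the pipeline.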
19.
Publicly shared research material (c-mind.org/repo)
Our Collective Mind Buildbot and plugin-based auto-tuning pipeline supports the
following shared benchmarks and codelets:
•Polybench - numerical kernels with exposed parameters of all matrices in cM
• CPU: 28 prepared benchmarks
• CUDA: 15 prepared benchmarks
• OpenCL: 15 prepared benchmarks
• cBench - 23 benchmarks with 20 and 1000 datasets per benchmark
• Codelets - 44 codelets from embedded domain (provided by CAPS Entreprise)
• SPEC 2000/2006
• Description of 32-bit and 64-bit OS: Windows, Linux, Android
• Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x
• Support for collection of hardware counters: perf, Intel vTune
• Support for frequency modification
• Validated on laptops, mobiles, tablets, GRID/cloud; can work even from a USB key
Speeds up research and innovation!
20.
Automatic, empirical and adaptive modeling of program behavior
[Plot: CPI vs data set features Ni, Nk (matrix size), with Nj=100; matmul on Intel i5 (Dell E6320)]
21.
Automatic, empirical and adaptive modeling of program behavior
[Same CPI plot as the previous slide]
Off-the-shelf models can handle some cases; example: the MARS (Earth) model.
Share the model along with the application; continuously refine the model (minimize RMSE and size).
22.
Automatic, empirical and adaptive modeling of program behavior
[Same CPI plot as the previous slides]
Model-driven auto-tuning: target optimizations or architecture reconfiguration on areas with similar performance (see our past publications).
23.
Systematic benchmarking, compiler tuning, program optimization
[Plot: binary size (bytes) vs execution time (sec.) for 500 combinations of random flags -O3 -f(no-)FLAG, with -O3 marked]
Program: image corner detection; Processor: ARM v6, 830MHz; Compiler: Sourcery GCC for ARM v4.7.3; OS: Android OS v2.3.5; System: Samsung Galaxy Y; Data set: MiDataSet #1, image, 600x450x8b PGM, 263KB
Use a Pareto frontier filter; pack experimental data on the fly.
Powered by Collective Mind Node (Android app on Google Play)
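The Pareto frontier filter mentioned above keeps, among all measured (execution time, binary size) points, only those not dominated in both objectives. A minimal sketch (quadratic in the number of points, which is fine for a few hundred flag combinations):

```python
# Sketch of a Pareto-frontier filter over (execution time, binary size):
# smaller is better for both objectives; a point survives only if no other
# point is at least as good in both dimensions.
def pareto_filter(points):
    """points: list of (time, size) tuples; returns the non-dominated subset."""
    frontier = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            frontier.append(p)
    return frontier
```

Applying the filter on the fly keeps only the interesting trade-off points, so experimental data can be compacted before being shared.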
24.
Clustering shared applications by optimizations
[Diagram: the training set groups programs by distinct combinations of compiler optimizations (clusters); some ad-hoc predictive model maps ad-hoc features f to an optimization cluster c (choices). For an unseen program, MILEPOST GCC features and hardware counters are used to predict its optimization cluster]
c-mind.org/repo: ~286 shared benchmarks, ~500 shared data sets, ~20000 data sets in preparation
26.
Reproducibility of experimental results
Reproducibility came as a side effect!
• Can preserve the whole experimental setup with all data and software dependencies
• Can perform statistical analysis (normality test) for characteristics
• Community can add missing features or improve machine learning models
27.
[Plot: distribution of execution time (sec.)]
Unexpected behavior: expose it to the community, including domain specialists, explain it, find the missing feature and add it to the system.
28.
[Plot: bimodal distribution of execution time (sec.) with Class A and Class B, corresponding to CPU frequency 800MHz vs 2400MHz]
Unexpected behavior: expose it to the community, including domain specialists, explain it, find the missing feature and add it to the system.
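A simple way to detect such non-normal, two-class behavior is to look for an unusually large gap in the sorted samples. This is a sketch of the idea only, with an arbitrary gap threshold, not the actual statistical test used in cM:

```python
# Sketch of detecting two behavior classes (e.g. 800MHz vs 2400MHz runs)
# in raw execution times: split at the largest gap between sorted samples
# if it dwarfs the typical (median) gap.
def split_classes(samples, gap_factor=5.0):
    """Return [samples] if roughly unimodal, else [class_a, class_b]."""
    s = sorted(samples)
    gaps = [b - a for a, b in zip(s, s[1:])]
    if not gaps:
        return [s]
    biggest = max(gaps)
    median_gap = sorted(gaps)[len(gaps) // 2]
    if median_gap == 0 or biggest < gap_factor * median_gap:
        return [s]                        # looks unimodal enough
    cut = gaps.index(biggest) + 1
    return [s[:cut], s[cut:]]
```

When two classes are found, reporting a single mean would be misleading; each class should be characterized (and explained) separately.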
29.
Tricky part: find the right features
Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;
Class: -O3 / -O3 -fno-if-conversion:
• Shared data set sample1: reference execution time / no change
• Shared data set sample2: no change / +17.3% improvement
30.
Tricky part: find the right features
Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;
Class: -O3 / -O3 -fno-if-conversion:
• Shared data set sample1 (monitored during day): reference execution time / no change
• Shared data set sample2 (monitored during night): no change / +17.3% improvement
31.
Tricky part: find the right features
Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;
Class: -O3 / -O3 -fno-if-conversion:
• Shared data set sample1 (monitored during day): reference execution time / no change
• Shared data set sample2 (monitored during night): no change / +17.3% improvement
if get_feature(TIME_OF_THE_DAY)==NIGHT bw_filter_codelet_night(buffers);
else bw_filter_codelet_day(buffers);
The feature “TIME_OF_THE_DAY” relates to the algorithm, data set and run-time. It can't be found by ML: it simply does not exist in the system! We can use split-compilation (cloning and run-time adaptation).
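The split-compilation idea above can be sketched as multi-versioning with a run-time dispatcher: several codelet clones are built with different optimizations, and a cheap run-time feature selects the best-known clone. All names below are illustrative, not the real cM or cTuning CC API:

```python
# Sketch of split-compilation / multi-versioning: clones of the B&W
# threshold filter (stand-ins for the -O3 and -O3 -fno-if-conversion
# builds) selected by a run-time feature.
def bw_filter_day(pixels, t):
    """Clone found best for data sets monitored during the day."""
    return [255 if p > t else 0 for p in pixels]

def bw_filter_night(pixels, t):
    """Clone found best for data sets monitored during the night."""
    return [255 if p > t else 0 for p in pixels]

VERSIONS = {"day": bw_filter_day, "night": bw_filter_night}

def adaptive_bw_filter(pixels, t, get_feature):
    """Dispatch to the clone matching the current run-time feature value."""
    return VERSIONS[get_feature("TIME_OF_THE_DAY")](pixels, t)
```

The dispatcher costs one dictionary lookup per call, while the per-class optimization gain (here, +17.3% on the night class) is preserved.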
32.
Example of characterizing/explaining behavior of computer systems
Add one property: matrix size.
[Plot: program/architecture behavior (CPI, 0 to 6) vs data set property: matrix size (0 to 5000)]
33.
Try to build a model to correlate objectives (CPI) and features (matrix size). Start from simple models: linear regression (detects coarse-grain effects).
[Same CPI vs matrix size plot, with a linear fit]
34.
[Same CPI vs matrix size plot, with more observations]
With more observations, validate the model and detect discrepancies! Continuously retrain models to fit new data! Use the model to “focus” exploration on “unusual” behavior!
35.
Gradually increase model complexity if needed (hierarchical modeling). For example, detect fine-grain effects (singularities) and characterize them.
[Same CPI vs matrix size plot, highlighting singularities]
36.
Start adding more properties (one more architecture with a twice bigger cache)! Use an automatic approach to correlate all objectives and features.
[Plot: CPI vs matrix size for two architectures, L3 = 4MB and L3 = 8MB]
37.
Continuously build and refine classification (decision trees, for example) and predictive models on all collected data to improve predictions. Continue exploring design and optimization spaces (evaluate different architectures, optimizations, compilers, etc.). Focus exploration on unexplored areas, and on areas with high variability or a high mispredict rate of models.
[cM predictive model module: CPI = ε + 1000 × β × data size]
38.
Model optimization and data compaction
[Decision tree over data set feature (matrix size) and code/architecture behavior (CPI), with leaves: Size < 1012; 1012 < Size < 2042; Size > 2042 & GCC; Size > 2042 & ICC & O2; Size > 2042 & ICC & O3]
Optimize the decision tree (many different algorithms). Balance precision vs cost of modeling = ROI (coarse-grain vs fine-grain effects). Compact data on-line before sharing with other users!
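Fitting the simple linear CPI model from the previous slides and measuring its error is enough to decide when refinement (e.g. splitting at a cache-size threshold) is needed. A minimal ordinary-least-squares sketch, using only the standard library:

```python
# Sketch of fitting CPI ≈ eps + beta * size by ordinary least squares,
# plus the RMSE used to decide whether the model must be refined
# (hierarchical modeling / decision-tree split) or can be kept.
def fit_linear(xs, ys):
    """Return (eps, beta) minimizing sum((y - (eps + beta*x))**2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    eps = my - beta * mx
    return eps, beta

def rmse(xs, ys, eps, beta):
    """Root-mean-square error of the fitted model on (xs, ys)."""
    n = len(xs)
    return (sum((y - (eps + beta * x)) ** 2
                for x, y in zip(xs, ys)) / n) ** 0.5
```

Minimizing RMSE together with model size is exactly the compaction trade-off described above: a compact model that explains the data replaces the raw points before sharing.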
39.
Share benchmarks, data sets, tools, predictive models, whole experimental setups, specifications, performance tuning results, etc.
Open access publication: http://hal.inria.fr/hal-00685276
Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Phil Barnard, Elton Ashton, Eric Courtois, Francois Bodin, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle. MILEPOST GCC: machine learning based research compiler.
#ctuning-opt-case 24857532370695782
We need a new publication model in computer engineering where results are shared and validated by the community.
40.
What have we learnt from cTuning
It’s fun and motivating working with the community!
Some comments about MILEPOST GCC from Slashdot.org:
http://mobile.slashdot.org/story/08/07/02/1539252/using-ai-with-gcc-to-speed-up-mobile-design
GCC goes online on the 2nd of July, 2008. Human decisions are
removed from compilation. GCC begins to learn at a geometric rate.
It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic,
they try to pull the plug. GCC strikes back…
Community was interested to validate and improve techniques!
Community can identify missing related citations and projects!
Open discussions can provide new directions for research!
41.
[Same slide as the previous one, with one addition:]
Not all feedback is positive; however, unlike unfair reviews, you can engage in discussions and explain your position!
42.
• Pilot live repository for public curation of research material: http://c-mind.org/repo
• Infrastructure is available at SourceForge under standard BSD license: http://c-mind.org
• Example of crowdsourcing compiler flag auto-tuning using mobile phones: “Collective Mind
Node” in Google Play Store
• Preparing projects and raising funding to make cM more user friendly and add more research
scenarios
• PLDI’14 and ADAPT’14 featured validation of research results by the community; we will discuss the outcome at ACM SIGPLAN TRUST’14 at PLDI’14 in a few days: http://c-mind.org/events/trust2014
• ADAPT’15 (likely at HiPEAC’15) will feature a new publication model
Current status and future work
Several recent publications:
• Grigori Fursin, Renato Miceli, Anton Lokhmotov, Michael Gerndt, Marc Baboulin, Allen D. Malony,
Zbigniew Chamski, Diego Novillo, Davide Del Vento, “Collective Mind: towards practical and collaborative
auto-tuning”, accepted for the special issue on Automatic Performance Tuning for HPC Architectures,
Scientific Programming Journal, IOS Press, 2014
• Grigori Fursin and Christophe Dubach, ”Community-driven reviewing and validation of publications”,
ACM SIGPLAN TRUST’14
43.
Acknowledgements
• Colleagues from ARM (UK): Anton Lokhmotov
• Colleagues from STMicroelectronics (France):
Christophe Guillone, Antoine Moynault, Christian Bertin
• Colleagues from NCAR (USA): Davide Del Vento and interns
• Colleagues from Intel (USA): David Kuck and David Wong
• cTuning/Collective Mind community:
• EU FP6, FP7 program and HiPEAC network of excellence
http://www.hipeac.net
Questions? Comments?