Collective Mind:
bringing reproducible research to the masses
Grigori Fursin
POSTALE, INRIA Saclay, France
INRIA-Illinois-ANL Joint Laboratory Workshop
France, June 2014
Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 2
Message
Challenge:
How to design the next generation of faster, smaller, cheaper, more power-efficient and reliable computer systems (software and hardware)?
Long-term interdisciplinary vision:
• Share code and data in a reproducible way along with publications
• Apply big data analytics to program optimization, run-time adaptation and architecture co-design
• Bring the interdisciplinary community together to validate experimental results, ensure reproducibility and improve optimization predictions
Continuously validated in industrial projects with Intel, ARM, IBM, CAPS, ARC (Synopsys) and STMicroelectronics
Talk outline
• Motivation: general problems in computer engineering
• cTuning: big-data-driven program optimization and architecture co-design, and the problems encountered
• Collective Mind: collaborative and reproducible research and experimentation in computer engineering
• Reproducibility as a side effect
• Conclusions and future work
Available solutions
(Figure: an end-user task and its result depend on the whole stack: algorithm, application, compilers, binary and libraries, architecture, run-time environment, state of the system, data set and storage.)
End users require faster, smaller and more power-efficient systems
Delivering an optimal solution is non-trivial
(Figure: the result of an end-user task depends on the chosen GCC optimizations, among many other choices.)
Fundamental problems:
1) Too many design and optimization choices at all levels
2) Always multi-objective optimization: performance vs. compilation time vs. code size vs. system size vs. power consumption vs. reliability vs. return on investment
3) Complex relationships and interactions between ALL software and hardware components
Empirical auto-tuning is too time-consuming, ad hoc and tedious to become mainstream!
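Empirical auto-tuning, in its simplest form, compiles and runs a program with many randomly chosen flag combinations, such as the random -O3 -f(no-)FLAG combinations used in the ARM experiment of this talk. The C sketch below only generates such combination strings; the flag subset is an illustrative assumption (these are real GCC flags, but not the list used in the experiments), and compiling and timing each variant is left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of GCC optimization flags (an assumption, not the
   flag list used in the experiments). Each is emitted as -fFLAG or -fno-FLAG. */
static const char *const FLAGS[] = {
    "if-conversion", "unroll-loops", "tree-vectorize", "gcse", "inline-functions"
};
#define NUM_FLAGS (sizeof FLAGS / sizeof FLAGS[0])

/* Writes one random combination, e.g. "-O3 -fno-if-conversion -funroll-loops ...",
   into buf. Returns 0 on success or -1 if buf is too small. Uses rand(), so
   call srand() first to get a reproducible sequence. */
int random_flag_combination(char *buf, size_t size) {
    int n = snprintf(buf, size, "-O3");
    if (n < 0 || (size_t)n >= size) return -1;
    size_t used = (size_t)n;
    for (size_t i = 0; i < NUM_FLAGS; i++) {
        const char *prefix = (rand() % 2) ? "-f" : "-fno-";   /* toggle flag */
        n = snprintf(buf + used, size - used, " %s%s", prefix, FLAGS[i]);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Calling it repeatedly after `srand(seed)` yields a reproducible set of variants; recording the seed next to the measurements is one cheap way to keep such an exploration repeatable.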
Combine auto-tuning with machine learning and crowdsourcing
(Figure: cTuning.org, a plugin-based auto-tuning framework and public repository. Training: programs or kernels 1 … N are processed with plugin-based MILEPOST GCC, which monitors and explores the optimization space and extracts semantic program features; dynamic features and hardware counters are collected, programs are clustered and a predictive model is built. Prediction: for an unseen program, its features are extracted and the model predicts optimizations to minimize execution time, power consumption, code size, etc.)
• G. Fursin et al. MILEPOST GCC: Machine learning based self-tuning compiler. 2008, 2011
• G. Fursin and O. Temam. Collective optimization: A practical collaborative approach. 2010
• G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. 2009
• F. Agakov et al. Using Machine Learning to Focus Iterative Optimization. 2006
In 2009, we opened a public repository of knowledge (cTuning.org) and managed to automatically tune customer benchmarks and compiler heuristics for a range of real platforms from IBM and ARC (Synopsys).
Now it becomes mainstream: is everything solved?
Technological chaos
(Figure: word cloud of compilers, tools, programming models, libraries, optimization levels, learning techniques and hardware parameters: GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x, GCC 4.8.x, ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1, LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.0, LLVM 3.4, Phoenix, MVS 2013, XLC, Open64, Jikes, Testarossa, OpenMP, MPI, HMPP, OpenCL, CUDA 4.x, CUDA 5.x, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, predictive scheduling, algorithm-level, TBB, MKL, ATLAS, program-level, function-level, codelet, loop-level, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, KNN, SVM, genetic algorithms, per-phase reconfiguration, cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, ARM v6, ARM v8, Intel SandyBridge, SSE4, AVX, SimpleScalar, execution time, reliability, algorithm precision.)
We also experienced a few problems:
• Difficulty reproducing results collected from multiple users (including variability of performance data and constant changes in the system)
• Difficulty reproducing and validating existing related techniques from publications (no full specifications and dependencies)
• Lack of common, large and diverse benchmarks and data sets
• Difficulty exposing choices and extracting features (tools are not prepared for auto-tuning and machine learning)
• Difficulty experimenting
• By the end of the experiments, new tool versions are often available
• The common life span of experiments and ad hoc frameworks is the end of an MS or PhD project
• Researchers often focus on publications rather than on practical and reproducible solutions
• Since 2009 we have been asking the community to share code, performance data and all related artifacts (experimental setups): at ADAPT’14 only two papers submitted artifacts, and PLDI’14 had several papers with research artifacts; these problems will be discussed in 2 days at ACM SIGPLAN TRUST’14 …
Collective Mind: towards systematic and reproducible experimentation
(Figure: experiments assembled from many tools and tool versions (Tool A V1, V2 … VN; Tool B V1, V2 … VM), ad hoc tuning scripts, ad hoc analysis and learning scripts, and collections of CSV, XLS, TXT and other files, with behavior, choices, features, state and dependencies left implicit.)
Hardwired experimental setups are very difficult to extend or share.
Tools are not prepared for auto-tuning and adaptation!
Users struggle to expose this meta-information.
Motivation for Collective Mind (cM):
• How to preserve, share and reuse practical knowledge and experience in program optimization and hardware co-design?
• How to make machine-learning-driven optimization and run-time adaptation practical?
• How to ensure reproducibility of experimental results?
Share the whole experimental setup with all related artifacts, SW/HW dependencies and unified meta-information
Wrappers around tools
cM module (wrapper) with unified and formalized input and output
(Figure: each tool version, e.g. Tool B Vi, is wrapped by a cM module that processes the command line and the original, unmodified ad hoc input, runs the tool and collects the generated files, exposing behavior, choices, features, state and dependencies.)
cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line)
cm compiler build -- icc -fast *.c
cm code.source build ct_compiler=icc13 ct_optimizations=-fast
cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm
Should be able to run on any OS (Windows, Linux, Android, MacOS, etc.)!
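The cm command line has a simple shape: a module name, an action, key=value parameters, and an optional unparsed tail after `--`. A minimal C sketch of splitting argv into those parts follows; the `cm_request` structure and `cm_parse` function are hypothetical illustrations, not cM's actual implementation.

```c
#include <string.h>

#define CM_MAX_PARAMS 32

/* Hypothetical parsed form of:
   cm <module> <action> key1=value1 ... -- <unparsed command line> */
typedef struct {
    const char *module;
    const char *action;
    const char *keys[CM_MAX_PARAMS];
    const char *values[CM_MAX_PARAMS];
    int num_params;
    char **unparsed;      /* arguments after "--", passed through untouched */
    int num_unparsed;
} cm_request;

/* Fills req from argv (argv[0] is the program name). Each "key=value"
   argument is split in place by overwriting '=' with '\0'.
   Returns 0 on success, -1 on a missing module/action, a parameter
   without '=', or too many parameters. */
int cm_parse(int argc, char **argv, cm_request *req) {
    memset(req, 0, sizeof *req);
    if (argc < 3) return -1;
    req->module = argv[1];
    req->action = argv[2];
    int i = 3;
    for (; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) { i++; break; }   /* rest is unparsed */
        char *eq = strchr(argv[i], '=');
        if (eq == NULL || req->num_params == CM_MAX_PARAMS) return -1;
        *eq = '\0';
        req->keys[req->num_params] = argv[i];
        req->values[req->num_params] = eq + 1;
        req->num_params++;
    }
    req->unparsed = argv + i;
    req->num_unparsed = argc - i;
    return 0;
}
```

Passing the tail through untouched is what lets a call such as `cm compiler build -- icc -fast *.c` forward the original tool command line unchanged.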
Exposing meta-information in a unified way
cM module (wrapper) with unified and formalized input and output
(Figure: the wrapper receives unified JSON input (meta-data); an action function prepares the tool's input (unified JSON input if it exists, otherwise the original unmodified ad hoc input), processes the command line, runs the tool (Tool B Vi), then parses and unifies its output and generated files into unified JSON output (meta-data).)
Formalized function (model) of a component behavior: b = B(c, f, s), where b (behavior), c (choices), f (features) and s (state) are flattened JSON vectors (either string categories or integer/float values).
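Flattened JSON vectors turn nested meta-data into flat key/value pairs, so that b, c, f and s can be compared and fed to models. A minimal C sketch of such flattening over a tiny hand-built tree follows; the `node` type, the '/' path separator and the `flatten` function are assumptions for illustration (a real wrapper would use a JSON parser, and cM's own key format may differ).

```c
#include <stdio.h>
#include <string.h>

typedef enum { NODE_OBJECT, NODE_STRING, NODE_NUMBER } node_kind;

/* Minimal stand-in for a parsed JSON value: objects, strings, numbers. */
typedef struct node {
    node_kind kind;
    const char *key;               /* member name in the parent (NULL for root) */
    const char *str;               /* NODE_STRING: category value */
    double num;                    /* NODE_NUMBER: integer/float value */
    const struct node *children;   /* NODE_OBJECT: array of members */
    int num_children;
} node;

#define PATH_CAP 128

/* One flattened entry: full path such as "choices/compiler" plus its leaf. */
typedef struct { char path[PATH_CAP]; const node *leaf; } flat_entry;
typedef struct { flat_entry *entries; int count, max; } flat_vector;

static void flatten_rec(const node *n, char *path, size_t len, flat_vector *v) {
    if (n->kind != NODE_OBJECT) {            /* leaf: record path and value */
        if (v->count < v->max) {
            flat_entry *e = &v->entries[v->count++];
            snprintf(e->path, sizeof e->path, "%s", path);
            e->leaf = n;
        }
        return;
    }
    for (int i = 0; i < n->num_children; i++) {
        const node *c = &n->children[i];
        int w = snprintf(path + len, PATH_CAP - len, "%s%s", len ? "/" : "", c->key);
        if (w >= 0 && (size_t)w < PATH_CAP - len)   /* skip over-long paths */
            flatten_rec(c, path, len + (size_t)w, v);
        path[len] = '\0';                    /* restore prefix for next sibling */
    }
}

/* Flattens root into entries (at most max); returns the number written. */
int flatten(const node *root, flat_entry *entries, int max) {
    char path[PATH_CAP] = "";
    flat_vector v = { entries, 0, max };
    flatten_rec(root, path, 0, &v);
    return v.count;
}
```

Keeping each leaf's type (string category vs. number) next to its path is what allows the same vector to feed both classifiers and regressions later in the pipeline.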
Adding SW/HW dependencies check
(Figure: the cM module (wrapper) with unified JSON input and output, now also setting the environment for a given tool version before running the action function.)
Check dependencies! Multiple tool versions can co-exist, while their interface is abstracted by the cM module.
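Because several versions of a tool can co-exist, a wrapper has to pick one that satisfies the experiment's dependencies before setting the environment. The C sketch below shows one possible policy (the newest installed version at or above a minimum); the registry layout, the install paths and the policy itself are assumptions, not how cM resolves dependencies.

```c
#include <stdlib.h>
#include <string.h>

/* Compares dotted numeric versions component by component, so "4.10" > "4.7";
   missing components count as 0 ("3.4" == "3.4.0"). Non-numeric suffixes
   are not supported. Returns <0, 0 or >0 like strcmp. */
int version_cmp(const char *a, const char *b) {
    while (*a || *b) {
        char *end_a, *end_b;
        long x = strtol(a, &end_a, 10);
        long y = strtol(b, &end_b, 10);
        if (x != y) return x < y ? -1 : 1;
        if (end_a == a && end_b == b) break;   /* no numeric progress: stop */
        a = (*end_a == '.') ? end_a + 1 : end_a;
        b = (*end_b == '.') ? end_b + 1 : end_b;
    }
    return 0;
}

/* Hypothetical registry entry for one installed tool version. */
typedef struct { const char *tool; const char *version; const char *path; } installed_tool;

/* Returns the newest installed version of `tool` that is >= min_version,
   or NULL if the dependency cannot be satisfied. */
const installed_tool *select_tool(const installed_tool *reg, int n,
                                  const char *tool, const char *min_version) {
    const installed_tool *best = NULL;
    for (int i = 0; i < n; i++) {
        if (strcmp(reg[i].tool, tool) != 0) continue;
        if (version_cmp(reg[i].version, min_version) < 0) continue;
        if (best == NULL || version_cmp(reg[i].version, best->version) > 0)
            best = &reg[i];
    }
    return best;
}
```

Setting PATH or similar environment variables for the selected entry would be the next step; it is omitted here.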
Assembling, preserving, sharing and extending the whole pipeline as “LEGO”
Chaining cM components (wrappers) into an experimental pipeline for a given research and experimentation scenario:
choose exploration strategy → generate choices (code sample, data set, compiler, flags, architecture …) → compile source code → run code → test behavior normality → Pareto filter → modeling and prediction → complexity reduction
Public modular auto-tuning and machine learning repository and buildbot; unified web services; interdisciplinary crowd
Shared scenarios from past research …
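A pipeline of this kind chains independent components that share one state. The C sketch below shows such chaining with function pointers; the state fields, stage names and stub behavior are hypothetical, and only three of the listed steps are stubbed.

```c
#include <stddef.h>

/* Hypothetical shared state passed from stage to stage. */
typedef struct {
    const char *flags;   /* a choice, e.g. compiler flags */
    int built;           /* set by the compile stage */
    int ran;             /* set by the run stage */
    double exec_time;    /* a measured characteristic */
} pipeline_state;

typedef int (*stage_fn)(pipeline_state *);   /* returns 0 on success */
typedef struct { const char *name; stage_fn run; } stage;

/* Runs stages in order; returns the index of the first failing stage,
   or -1 if every stage succeeded. */
int run_pipeline(const stage *stages, int n, pipeline_state *st) {
    for (int i = 0; i < n; i++)
        if (stages[i].run(st) != 0) return i;
    return -1;
}

/* Stub stages: real ones would call the compiler, run the binary, etc. */
static int generate_choices(pipeline_state *st) { st->flags = "-O3"; return 0; }

static int compile_code(pipeline_state *st) {
    st->built = (st->flags != NULL);   /* stub: "builds" only if flags exist */
    return st->built ? 0 : -1;
}

static int run_code(pipeline_state *st) {
    if (!st->built) return -1;         /* cannot run before compiling */
    st->ran = 1;
    st->exec_time = 0.0;               /* placeholder for a measured time */
    return 0;
}

static const stage DEMO_PIPELINE[] = {
    { "generate choices", generate_choices },
    { "compile source code", compile_code },
    { "run code", run_code },
};
```

Returning the index of the failing stage makes it easy to record where an experiment broke, and adding a step such as a Pareto filter is one more array entry.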
Data abstraction in Collective Mind (c-mind.org/repo)
• module: compiler, package, dataset, …
• compiler: GCC 4.4.4, GCC 4.7.1, LLVM 3.1, LLVM 3.4, …
• package: GCC 4.7.1 bin, GCC 4.7.1 source, LLVM 3.4, gmp 5.0.5, mpfr 3.1.0, lapack 2.3.0, java apache commons codec 1.7, …
• dataset: image-jpeg-0001, bzip2-0006, txt-0012, …
Entries are described by cM module JSON meta-descriptions (e.g. compiler flags, installation info, features, actions) together with files and directories.
cM repository directory structure:
.cmr / module UOA / data UOA (UID or alias) / .cm / data.json
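Because every entry lives at a predictable path, locating an entry's meta-description needs only the module and data UOA. A small C sketch follows; the function names and the `parent` argument (the directory holding `.cmr`) are hypothetical, the path layout is taken from the slide, and resolving a UID versus an alias is not handled.

```c
#include <stdio.h>
#include <string.h>

/* Builds <parent>/.cmr/<module UOA>/<data UOA>/.cm/data.json.
   Returns 0 on success, -1 if buf is too small. */
int cm_entry_path(char *buf, size_t size, const char *parent,
                  const char *module_uoa, const char *data_uoa) {
    int n = snprintf(buf, size, "%s/.cmr/%s/%s/.cm/data.json",
                     parent, module_uoa, data_uoa);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Returns 1 if the entry's data.json can be opened for reading, else 0. */
int cm_entry_exists(const char *parent, const char *module_uoa, const char *data_uoa) {
    char path[512];
    if (cm_entry_path(path, sizeof path, parent, module_uoa, data_uoa) != 0) return 0;
    FILE *f = fopen(path, "r");
    if (f == NULL) return 0;
    fclose(f);
    return 1;
}
```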
Since 2005: systematic, big-data-driven optimization and co-design
Current state of computer engineering: (figure: the same technological chaos of compilers, programming models, profilers, libraries, optimization granularities, transformations and hardware parameters, plus likwid, with power consumption, execution time and reliability as objectives)
Prototype a research idea, validate existing work or perform an end-user task. Result:
• A quick, non-reproducible hack?
• An ad hoc heuristic?
• A quick publication?
• No shared code and data?
Collaborative infrastructure and repository (cTuning.org; c-mind.org/repo) where the “crowd” can:
• Share code and data with their meta-description and dependencies
• Systematize and classify collected optimization knowledge (clustering; predictive modeling)
• Develop and preserve the whole experimental pipeline
• Extrapolate collected knowledge (cluster, build predictive models, predict optimizations) to build faster, smaller, more power-efficient and reliable computer systems
Sharing of code and data, systematization and unification of collected knowledge (big data), and classification and predictive modeling helped the interdisciplinary community apply “big data analytics” to the analysis, optimization and co-design of computer systems.
Top-down problem (tuning) decomposition similar to physics
Gradually expose some characteristics and some choices at each level:
• Algorithm selection: characteristics (time) productivity, variable accuracy, complexity …; choices language, MPI, OpenMP, TBB, MapReduce …
• Compile program: characteristics time …; choices compiler flags; pragmas …
• Code analysis & transformations: characteristics time; memory usage; code size …; choices transformation ordering; polyhedral transformations; transformation parameters; instruction ordering …
• Granularity: process, thread, function, codelet, loop, instruction
• Run code / run-time environment: characteristics time; power consumption …; choices pinning/scheduling …
• System: characteristics cost; size …; choices CPU/GPU; frequency; memory hierarchy …
• Data set: characteristics size; values; description …; choices precision …
• Run-time analysis: characteristics time; precision …; choices hardware counters; power meters …
• Run-time state: characteristics processor state; cache state …; choices helper threads; hardware counters …
• Analyze profile: characteristics time; size …; choices instrumentation; profiling …
Coarse-grain vs. fine-grain effects: depends on user requirements and expected ROI
Growing, plugin-based cM pipeline for auto-tuning and learning
• Init pipeline
• Detect system information
• Initialize parameters
• Prepare dataset
• Clean program
• Prepare compiler flags
• Use compiler profiling
• Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning
• Use universal Alchemist plugin (with any OpenME-compatible compiler or tool)
• Use Alchemist plugin (currently for GCC)
• Build program
• Get objdump and md5sum (if supported)
• Use OpenME for fine-grain program analysis and online tuning (build & run)
• Use 'Intel VTune Amplifier' to collect hardware counters
• Use 'perf' to collect hardware counters
• Set frequency (in Unix, if supported)
• Get system state before execution
• Run program
• Check output for correctness (use dataset UID to save different outputs)
• Finish OpenME
• Misc info
• Observed characteristics
• Observed statistical characteristics
• Finalize pipeline
http://c-mind.org/ctuning-pipeline
Publicly shared research material (c-mind.org/repo)
Our Collective Mind Buildbot and plugin-based auto-tuning pipeline support the following shared benchmarks and codelets:
• Polybench: numerical kernels with the parameters of all matrices exposed in cM
  • CPU: 28 prepared benchmarks
  • CUDA: 15 prepared benchmarks
  • OpenCL: 15 prepared benchmarks
• cBench: 23 benchmarks with 20 and 1000 datasets per benchmark
• Codelets: 44 codelets from the embedded domain (provided by CAPS Entreprise)
• SPEC 2000/2006
• Description of 32-bit and 64-bit OS: Windows, Linux, Android
• Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x
• Support for collection of hardware counters: perf, Intel VTune
• Support for frequency modification
• Validated on laptops, mobiles, tablets, GRID/cloud; can work even from a USB key
Speeds up research and innovation!
Automatic, empirical and adaptive modeling of program behavior
(Figure: CPI of matmul on an Intel i5 (Dell E6320) as a function of data set features Ni and Nk (matrix sizes), with Nj = 100.)
Off-the-shelf models can handle some cases, for example the MARS (Earth) model.
Share the model along with the application; continuously refine the model (minimize RMSE and size).
Model-driven auto-tuning: target optimizations or architecture reconfiguration on areas with similar performance (see our past publications)
Systematic benchmarking, compiler tuning, program optimization
Program: image corner detection; Processor: ARM v6, 830 MHz
Compiler: Sourcery GCC for ARM v4.7.3; OS: Android OS v2.3.5
System: Samsung Galaxy Y; Data set: MiDataSet #1, image, 600x450x8b PGM, 263 KB
(Figure: execution time (sec.) vs. binary size (bytes) for 500 combinations of random flags -O3 -f(no-)FLAG, compared with -O3.)
Use a Pareto frontier filter; pack experimental data on the fly
Powered by Collective Mind Node (Android Apps on Google Play)
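A Pareto frontier filter keeps only the flag combinations that no other combination beats on execution time and binary size at once. The sketch below is a straightforward O(n²) version in C; the `point` type and function names are illustrative, and this is not necessarily how cM implements the filter.

```c
#include <stddef.h>

/* One measured flag combination; both objectives are minimized. */
typedef struct { double time; double size; } point;

/* 1 if p dominates q: no worse in both objectives and strictly better in one. */
static int dominates(point p, point q) {
    return p.time <= q.time && p.size <= q.size &&
           (p.time < q.time || p.size < q.size);
}

/* Sets keep[i] = 1 for points on the Pareto frontier, 0 otherwise, and
   returns the frontier size. O(n^2), fine for a few hundred combinations. */
size_t pareto_filter(const point *pts, size_t n, int *keep) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = 1;
        for (size_t j = 0; j < n && keep[i]; j++)
            if (j != i && dominates(pts[j], pts[i])) keep[i] = 0;
        count += (size_t)keep[i];
    }
    return count;
}
```

Storing only the frontier points is also one way to pack experimental data on the fly, since dominated combinations are rarely worth sharing.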
Clustering shared applications by optimizations
Training set: distinct combinations of compiler optimizations (clusters). Each program is described by c (choices) and f (features), i.e. some ad hoc features such as MILEPOST GCC features and hardware counters, and some ad hoc predictive model maps features to an optimization cluster.
Prediction: for an unseen program, its features f are fed to the model, which predicts an optimization cluster and hence the choices c.
c-mind.org/repo: ~286 shared benchmarks, ~500 shared data sets, ~20000 data sets in preparation
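The prediction step can be as simple as giving an unseen program the optimization cluster of its most similar training program. The C sketch below uses 1-nearest-neighbor with squared Euclidean distance; the slide only says "some ad hoc predictive model", so this choice (KNN appears among the techniques in the technological-chaos word cloud), the three-element feature vector and the names are assumptions.

```c
#include <stddef.h>
#include <float.h>

#define NUM_FEATURES 3   /* hypothetical feature-vector length */

/* A training program: its feature vector and the optimization cluster
   (a distinct combination of compiler flags) that worked best for it. */
typedef struct {
    double features[NUM_FEATURES];
    int cluster;
} training_example;

/* Predicts the cluster of an unseen program with 1-nearest-neighbor
   (squared Euclidean distance). Returns -1 if the training set is empty. */
int predict_cluster(const training_example *train, size_t n, const double *features) {
    int best = -1;
    double best_dist = DBL_MAX;
    for (size_t i = 0; i < n; i++) {
        double d = 0.0;
        for (int k = 0; k < NUM_FEATURES; k++) {
            double diff = train[i].features[k] - features[k];
            d += diff * diff;
        }
        if (d < best_dist) { best_dist = d; best = train[i].cluster; }
    }
    return best;
}
```

In practice, features such as static code features and hardware counters have very different scales, so normalizing them before computing distances matters.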
Split-compilation and run-time adaptation
(Figure: execution time (ms, 0 to 180) vs. data set feature N (size, 0 to 1200) on CPU, on GPU, and with an adaptive CPU/GPU scheduler.)
• Víctor J. Jiménez, Lluís Vilanova, Isaac Gelado, Marisa Gil, Grigori Fursin, Nacho Navarro: Predictive Runtime Code Scheduling for Heterogeneous Architectures. HiPEAC 2009
• Grigori Fursin, Albert Cohen, Michael F. P. O'Boyle, Olivier Temam: A Practical Method for Quickly Evaluating Program Optimizations. HiPEAC 2005
Statically enabling dynamic optimizations
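The adaptive scheduler picks CPU or GPU depending on the data set size N. One simple way to sketch that decision in C is to interpolate profiled execution times for each device and choose the faster one; this interpolation-based policy and the sample layout are assumptions, not the method of the cited papers.

```c
#include <stddef.h>

typedef enum { DEVICE_CPU, DEVICE_GPU } device;

/* One profiled point: data set size N and measured times on each device. */
typedef struct { double n; double cpu_ms; double gpu_ms; } sample;

static double time_on(const sample *s, device d) {
    return d == DEVICE_GPU ? s->gpu_ms : s->cpu_ms;
}

/* Piecewise-linear estimate of the time on device d at size n, from
   samples sorted by increasing n; clamps outside the profiled range. */
static double estimate(const sample *s, size_t count, double n, device d) {
    if (n <= s[0].n) return time_on(&s[0], d);
    for (size_t i = 1; i < count; i++) {
        if (n <= s[i].n) {
            double w = (n - s[i - 1].n) / (s[i].n - s[i - 1].n);
            return time_on(&s[i - 1], d) + w * (time_on(&s[i], d) - time_on(&s[i - 1], d));
        }
    }
    return time_on(&s[count - 1], d);
}

/* Chooses the device with the lower estimated time (CPU on ties or no data). */
device choose_device(const sample *s, size_t count, double n) {
    if (count == 0) return DEVICE_CPU;
    return estimate(s, count, n, DEVICE_GPU) < estimate(s, count, n, DEVICE_CPU)
               ? DEVICE_GPU : DEVICE_CPU;
}
```

The profiling samples can be gathered at build or install time, which matches the idea of statically preparing a decision that is taken at run time.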
Reproducibility of experimental results
Reproducibility came as a side effect!
• Can preserve the whole experimental setup with all data and software dependencies
• Can perform statistical analysis (normality test) for characteristics
• Community can add missing features or improve machine learning models
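The slides do not name the normality test; the Jarque-Bera test, based on skewness and kurtosis, is one common choice and is used here as an assumption. The C sketch works with squared skewness to avoid a square root. With few samples the chi-squared approximation is rough, so the 5% threshold is best treated as a heuristic.

```c
#include <stddef.h>

/* Jarque-Bera statistic JB = n/6 * (S^2 + (K - 3)^2 / 4), with skewness S and
   kurtosis K computed from population moments. Under normality JB is
   approximately chi-squared with 2 degrees of freedom.
   Returns -1.0 when undefined (n < 2 or all samples identical). */
double jarque_bera(const double *x, size_t n) {
    if (n < 2) return -1.0;
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;
    double m2 = 0.0, m3 = 0.0, m4 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean, d2 = d * d;
        m2 += d2; m3 += d2 * d; m4 += d2 * d2;
    }
    m2 /= (double)n; m3 /= (double)n; m4 /= (double)n;
    if (m2 == 0.0) return -1.0;
    double skew_sq = (m3 * m3) / (m2 * m2 * m2);   /* S^2 without sqrt */
    double excess = m4 / (m2 * m2) - 3.0;          /* K - 3 */
    return (double)n / 6.0 * (skew_sq + excess * excess / 4.0);
}

/* 1 if normality is rejected at the 5% level (chi-squared(2) critical value 5.991). */
int looks_non_normal(const double *x, size_t n) {
    return jarque_bera(x, n) > 5.991;
}
```

A bimodal distribution, such as one caused by two CPU frequencies, shows up as strongly negative excess kurtosis, which pushes JB over the threshold once there are enough samples.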
(Figure: distribution of execution time (sec.).)
Unexpected behavior: expose it to the community, including domain specialists, to explain it, find the missing feature and add it to the system
(Figure: the execution-time distribution (sec.) splits into two classes, Class A and Class B, corresponding to CPU frequencies of 800 MHz and 2400 MHz.)
Tricky part: finding the right features
Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;
Class | -O3 | -O3 -fno-if-conversion
Shared data set sample1 (monitored during the day) | reference execution time | no change
Shared data set sample2 (monitored during the night) | no change | +17.3% improvement
if (get_feature(TIME_OF_THE_DAY) == NIGHT) bw_filter_codelet_night(buffers);
else bw_filter_codelet_day(buffers);
The feature “TIME_OF_THE_DAY” is related to the algorithm, the data set and the run-time environment.
It can’t be found by ML: it simply does not exist in the system!
Split-compilation (cloning and run-time adaptation) can be used instead.
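Split-compilation can be sketched as two clones of the threshold filter plus a run-time dispatcher driven by the time-of-day feature. In the experiment the clones would come from compiling the same source with -O3 and with -O3 -fno-if-conversion; since C source cannot pin down the compiler's if-conversion decision, this sketch writes one clone branch-free and one with an explicit branch to mimic the difference (an optimizing compiler may still emit similar code for both). `get_time_of_day()` and its 20:00 to 06:00 night window are hypothetical stand-ins for `get_feature(TIME_OF_THE_DAY)`.

```c
#include <stddef.h>
#include <time.h>

typedef enum { DAY, NIGHT } time_of_day;

/* Hypothetical run-time feature: NIGHT between 20:00 and 06:00 local time. */
time_of_day get_time_of_day(void) {
    time_t now = time(NULL);
    struct tm *lt = localtime(&now);
    if (lt == NULL) return DAY;
    return (lt->tm_hour >= 20 || lt->tm_hour < 6) ? NIGHT : DAY;
}

/* Clone 1: branch-free select, similar to what if-conversion produces. */
static void bw_filter_select(const unsigned char *in, unsigned char *out,
                             size_t n, unsigned char t) {
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(-(in[i] > t) & 255);   /* 255 if above t, else 0 */
}

/* Clone 2: explicit branch, similar to code built with -fno-if-conversion. */
static void bw_filter_branch(const unsigned char *in, unsigned char *out,
                             size_t n, unsigned char t) {
    for (size_t i = 0; i < n; i++) {
        if (in[i] > t) out[i] = 255;
        else out[i] = 0;
    }
}

/* Dispatcher: picks the clone from the run-time feature. */
void bw_filter(const unsigned char *in, unsigned char *out, size_t n,
               unsigned char t, time_of_day tod) {
    if (tod == NIGHT) bw_filter_branch(in, out, n, t);
    else bw_filter_select(in, out, n, t);
}
```

Both clones compute the same image; only the generated code differs, which is why the choice can be deferred to run time without affecting correctness.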
Example of characterizing/explaining behavior of computer systems
Add 1 property: matrix size
(Figure: program/architecture behavior (CPI, 0 to 6) vs. data set property: matrix size (0 to 5000).)
Try to build a model to correlate objectives (CPI) and features (matrix size).
Start with simple models: linear regression (detects coarse-grain effects).
(Figure: CPI (0 to 6) vs. matrix size (0 to 5000).)
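Fitting CPI against matrix size with ordinary least squares is the simplest version of this step. The C sketch below fits CPI = a + b × size and reports the mean squared error; minimizing RMSE is equivalent to minimizing this MSE, since RMSE is its square root. Function names are illustrative.

```c
#include <stddef.h>

/* Ordinary least-squares fit of y = a + b*x (e.g. CPI vs. matrix size).
   Returns 0 on success, -1 with fewer than two points or constant x. */
int linear_fit(const double *x, const double *y, size_t n, double *a, double *b) {
    if (n < 2) return -1;
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= (double)n;
    my /= (double)n;
    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    if (sxx == 0.0) return -1;
    *b = sxy / sxx;            /* slope */
    *a = my - *b * mx;         /* intercept */
    return 0;
}

/* Mean squared error of y = a + b*x on the given points (RMSE = its square root). */
double mean_squared_error(const double *x, const double *y, size_t n, double a, double b) {
    if (n == 0) return 0.0;
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = y[i] - (a + b * x[i]);
        s += r * r;
    }
    return s / (double)n;
}
```

A large MSE relative to the spread of CPI is a hint that a linear model only captures coarse-grain effects and that a more complex model may be needed.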
(Figure: CPI (0 to 6) vs. matrix size (0 to 5000) with more observations.)
If more observations arrive, validate the model and detect discrepancies!
Continuously retrain models to fit new data!
Use the model to “focus” exploration on “unusual” behavior!
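Detecting discrepancies can be sketched as flagging observations whose residual under the current model exceeds k times the RMSE; those points are candidates for focused exploration or retraining. The threshold k and the function below are assumptions for illustration; comparing squared values avoids a square root. Because outliers also inflate the RMSE, a median-based scale would be a more robust alternative.

```c
#include <stddef.h>

/* Sets flags[i] = 1 when |y[i] - (a + b*x[i])| > k * RMSE of the model on all
   n points, i.e. when residual^2 > k^2 * MSE (k > 0). Returns the number flagged. */
size_t flag_discrepancies(const double *x, const double *y, size_t n,
                          double a, double b, double k, int *flags) {
    if (n == 0) return 0;
    double mse = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = y[i] - (a + b * x[i]);
        mse += r * r;
    }
    mse /= (double)n;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double r = y[i] - (a + b * x[i]);
        flags[i] = (r * r > k * k * mse);
        count += (size_t)flags[i];
    }
    return count;
}
```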
Gradually increase model complexity if needed (hierarchical modeling).
For example, detect fine-grain effects (singularities) and characterize them.
(Figure: CPI (0 to 6) vs. matrix size (0 to 5000).)
Start adding more properties (one more architecture with a cache twice as large)!
Use an automatic approach to correlate all objectives and features.
(Figure: CPI (0 to 6) vs. matrix size (0 to 5000) on two architectures, with L3 = 4 MB and L3 = 8 MB.)
Continuously build and refine classification (decision trees, for example) and predictive models on all collected data to improve predictions.
Continue exploring design and optimization spaces (evaluate different architectures, optimizations, compilers, etc.).
Focus exploration on unexplored areas, areas with high variability, or areas where models have a high misprediction rate.
cM predictive model module, e.g. CPI = ε + 1000 × β × data size
Model optimization and data compaction
(Figure: code/architecture behavior (CPI, 0 to 6) vs. data set feature: matrix size (0 to 5000), partitioned by a decision tree into regions: size < 1012; 1012 < size < 2042; size > 2042 & GCC; size > 2042 & ICC & O2; size > 2042 & ICC & O3.)
Optimize the decision tree (many different algorithms)
Balance precision vs. cost of modeling = ROI (coarse-grain vs. fine-grain effects)
Compact data online before sharing it with other users!
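Once built, a decision tree like this one is cheap to evaluate and compact to share. The C sketch below hand-codes the tree's regions using the thresholds from the slide; how sizes exactly equal to 1012 or 2042 are classified, and treating any ICC level below -O3 as the O2 branch, are assumptions since the slide does not say. Mapping each region to a CPI prediction is left out.

```c
/* Compiler and region identifiers for the decision tree on the slide. */
typedef enum { COMPILER_GCC, COMPILER_ICC } compiler_id;

typedef enum {
    REGION_SMALL,          /* size < 1012 */
    REGION_MEDIUM,         /* 1012 < size < 2042 */
    REGION_LARGE_GCC,      /* size > 2042, GCC */
    REGION_LARGE_ICC_O2,   /* size > 2042, ICC, -O2 */
    REGION_LARGE_ICC_O3    /* size > 2042, ICC, -O3 */
} cpi_region;

/* Walks the tree from the root. Boundary values (exactly 1012 or 2042) go to
   the larger-size branch here, which is an assumption. */
cpi_region classify(int matrix_size, compiler_id cc, int opt_level) {
    if (matrix_size < 1012) return REGION_SMALL;
    if (matrix_size < 2042) return REGION_MEDIUM;
    if (cc == COMPILER_GCC) return REGION_LARGE_GCC;
    return opt_level >= 3 ? REGION_LARGE_ICC_O3 : REGION_LARGE_ICC_O2;
}
```

Sharing a handful of thresholds like these instead of every raw measurement is one concrete form of the data compaction mentioned above.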
Share benchmarks, data sets, tools, predictive models, whole experimental setups, specifications, performance tuning results, etc.
Open access publication: http://hal.inria.fr/hal-00685276
Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Phil Barnard, Elton Ashton, Eric Courtois, Francois Bodin, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle. MILEPOST GCC: machine learning based research compiler.
#ctuning-opt-case 24857532370695782
We need a new publication model in computer engineering, where results are shared and validated by the community
What have we learnt from cTuning
It’s fun and motivating working with the community!
Some comments about MILEPOST GCC from Slashdot.org:
http://mobile.slashdot.org/story/08/07/02/1539252/using-ai-with-gcc-to-speed-up-mobile-design
GCC goes online on the 2nd of July, 2008. Human decisions are
removed from compilation. GCC begins to learn at a geometric rate.
It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic,
they try to pull the plug. GCC strikes back…
The community was interested in validating and improving techniques!
The community can identify missing related citations and projects!
Open discussions can provide new directions for research!
Not all feedback is positive; however, unlike with unfair reviews, you can engage in discussions and explain your position!
Current status and future work
• Pilot live repository for public curation of research material: http://c-mind.org/repo
• Infrastructure is available at SourceForge under a standard BSD license: http://c-mind.org
• Example of crowdsourcing compiler flag auto-tuning using mobile phones: “Collective Mind Node” in the Google Play Store
• Preparing projects and raising funding to make cM more user-friendly and add more research scenarios
• PLDI’14 and ADAPT’14 featured validation of research results by the community; the outcome will be discussed in 2 days at ACM SIGPLAN TRUST’14 at PLDI’14: http://c-mind.org/events/trust2014
• ADAPT’15 (likely at HiPEAC’15) will feature a new publication model
Several recent publications:
• Grigori Fursin, Renato Miceli, Anton Lokhmotov, Michael Gerndt, Marc Baboulin, Allen D. Malony, Zbigniew Chamski, Diego Novillo, Davide Del Vento, “Collective Mind: towards practical and collaborative auto-tuning”, accepted for the special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, IOS Press, 2014
• Grigori Fursin and Christophe Dubach, “Community-driven reviewing and validation of publications”, ACM SIGPLAN TRUST’14
Acknowledgements
• Colleagues from ARM (UK): Anton Lokhmotov
• Colleagues from STMicroelectronics (France):
Christophe Guillone, Antoine Moynault, Christian Bertin
• Colleagues from NCAR (USA): Davide Del Vento and interns
• Colleagues from Intel (USA): David Kuck and David Wong
• cTuning/Collective Mind community:
• EU FP6, FP7 program and HiPEAC network of excellence
http://www.hipeac.net
Questions? Comments?

Collective Mind: bringing reproducible research to the masses

  • 1. Collective Mind: bringing reproducible research to the masses Grigori Fursin POSTALE, INRIA Saclay, France INRIA-Illinois-ANL Joint Laboratory Workshop France, June 2014
  • 2. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 2 Challenge: How to design next generation of faster, smaller, cheaper, more power efficient and reliable computer systems (software and hardware)? Long term interdisciplinary vision: • Share code and data in a reproducible way along with publications • Use big data analytics to program optimization, run-time adaptation and architecture co-design • Bring interdisciplinary community together to validate experimental results, ensure reproducibility, improve optimization predictions Message Continuously validated in industrial projects with Intel, ARM, IBM, CAPS, ARC (Synopsys), STMicroelectronics
  • 3. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 3 • Motivation: general problems in computer engineering • cTuning: big-data driven program optimization and architecture co-design and encountered problems • Collective Mind: collaborative and reproducible research and experimentation in computer engineering • Reproducibility as a side effect • Conclusions and future work Talk outline
  • 4. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 4 Available solutions Result Application Compilers Binary and libraries Architecture Run-time environment State of the system Data set Algorithm End User task End users require faster, smaller and more power efficient systems Storage
  • 5. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 5 GCC optimizations Result End User task Delivering optimal solution is non-trivial Fundamental problems: 1) Too many design and optimization choices at all levels 2) Always multi-objective optimization: performance vs compilation time vs code size vs system size vs power consumption vs reliability vs return on investment 3) Complex relationship and interactions between ALL software and hardware components Empirical auto-tuning is too time consuming, ad-hoc and tedious to be a mainstream!
  • 6. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 6 Combine auto-tuning with machine learning and crowdsourcing Plugin-based MILEPOST GCC Plugins Monitor and explore optimization space Extract semantic program features cTuning.org: plugin-based auto-tuning framework and public repository Program or kernel1 Program or kernel N … Training Unseen program Prediction MILEPOST GCC Plugins Collect dynamic features Cluster Build predictive model Extract semantic program features Collect hardware counters Predict optimization to minimize execution time, power consumption, code size, etc • G. Fursin et.al. MILEPOST GCC: Machine learning based self-tuning compiler. 2008, 2011 •G. Fursin and O. Temam. Collective optimization: A practical collaborative approach. 2010 •G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, 2009 • F. Agakov et.al.. Using Machine Learning to Focus Iterative Optimization, 2006
  • 7. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 7 • G. Fursin et.al. MILEPOST GCC: Machine learning based self-tuning compiler. 2008, 2011 •G. Fursin and O. Temam. Collective optimization: A practical collaborative approach. 2010 •G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, 2009 • F. Agakov et.al.. Using Machine Learning to Focus Iterative Optimization, 2006 Plugin-based MILEPOST GCC Plugins Monitor and explore optimization space Extract semantic program features cTuning.org: plugin-based auto-tuning framework and public repository Program or kernel1 Program or kernel N … Training Unseen program Prediction MILEPOST GCC Plugins Collect dynamic features Cluster Build predictive model Extract semantic program features Collect hardware counters Predict optimization to minimize execution time, power consumption, code size, etc In 2009, we opened public repository of knowledge (cTuning.org) and managed to automatically tune customer benchmarks and compiler heuristics for a range of real platforms from IBM and ARC (Synopsis) Now becomes a mainstream - everything is solved? Combine auto-tuning with machine learning and crowdsourcing
  • 8. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 8 Technological chaos GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.0 Phoenix MVS 2013 XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA 4.x gprofprof perf oprofile PAPI TAU Scalasca VTune Amplifier predictive scheduling algorithm- level TBB MKL ATLAS program- level function- level Codelet loop-level hardware counters IPA polyhedral transformations LTO threads process pass reordering KNN per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size ARM v6 threads execution time reliability GCC 4.8.x LLVM 3.4 SVM genetic algorithms We also experienced a few problems ARM v8 Intel SandyBridge SSE4 AVX • Difficulty to reproduce results collected from multiple users (including variability of performance data and constant changes in the system) • Difficulty to reproduce and validate already existing and related techniques from existing publications (no full specs and dependencies) • Lack of common, large and diverse benchmarks and data sets • Difficult to expose choices and extract features(tools are not prepared for auto- tuning and machine learning) • Difficult to experiment CUDA 5.x SimpleScalar algorithm precision
  • 9. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 9 Technological chaos GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.0 Phoenix MVS 2013 XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA 4.x gprofprof perf oprofile PAPI TAU Scalasca VTune Amplifier predictive scheduling algorithm- level TBB MKL ATLAS program- level function- level Codelet loop-level hardware counters IPA polyhedral transformations LTO threads process pass reordering KNN per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size ARM v6 threads execution time reliability GCC 4.8.x LLVM 3.4 SVM genetic algorithms We also experienced a few problems ARM v8 Intel SandyBridge SSE4 AVX • By the end of experiments, new tool versions are often available; • Common life span of experiments and ad-hoc frameworks - end of MS or PhD project; • Researchers often focus on publications rather than practical and reproducible solutions • Since 2009 asking community to share code, performance data and all related artifacts (experimental setups): only at ADAPT’14 two papers had submitted artifacts; PLDI’14 had several papers with research artifacts - will discuss problems in 2 days at ACM SIGPLAN TRUST’14 … CUDA 5.x SimpleScalar algorithm precision
  • 10. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 10 Behavior Choices Features State Hardwired experimental setups, very difficult to extend or share Collective Mind: towards systematic and reproducible experimentation Tools are not prepared for auto-tuning and adaptation! Users struggle exposing this meta information Tool BVM Tool BV2 Tool AVN Tool AV2 Tool AV1 Tool BV1 Ad-hoc analysis and learning scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files Experiments Motivation for Collective Mind (cM): • How to preserve, share and reuse practical knowledge and experience and program optimization and hardware co-design? • How to make machine learning driven optimization and run- time adaptation practical? •How to ensure reproducibility of experimental results? Share the whole experimental setup with all related artifacts, SW/HW dependencies, and unified meta-information Dependencies
  • 11. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 11 cM module (wrapper) with unified and formalized input and output ProcessCMD Tool BVi Generated files Original unmodified ad-hoc input Behavior Choices Features State Wrappers around tools Tool BVM Tool BV2 Tool AVN Tool AV2 Tool AV1 Tool BV1 Ad-hoc analysis and learning scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files Experiments cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line) cm compiler build -- icc -fast *.c cm code.source build ct_compiler=icc13 ct_optimizations=-fast cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm Should be able to run on any OS (Windows, Linux, Android, MacOS, etc)! Dependencies
  • 12. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 12 cM module (wrapper) with unified and formalized input and output Unified JSON input (meta-data) ProcessCMD Tool BVi Behavior Choices Features State Action Action function Generated files Parse and unify output Unified JSON output (meta-data) Unified JSON input (if exists) Original unmodified ad-hoc input b = B( c , f , s ) … … … … Formalized function (model) of a component behavior Flattened JSON vectors (either string categories or integer/float values) Exposing meta information in a unified way Tool BVM Tool BV2 Tool AVN Tool AV2 Tool AV1 Tool BV1 Ad-hoc analysis and learning scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files Experiments cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line) cm compiler build -- icc -fast *.c cm code.source build ct_compiler=icc13 ct_optimizations=-fast cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm Should be able to run on any OS (Windows, Linux, Android, MacOS, etc)!
  • 13. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 13 cM module (wrapper) with unified and formalized input and output Unified JSON input (meta-data) ProcessCMD Tool BVi Behavior Choices Features State Action Action function Generated files Set environment for a given tool version Parse and unify output Unified JSON output (meta-data) Unified JSON input (if exists) Original unmodified ad-hoc input b = B( c , f , s ) … … … … Formalized function (model) of a component behavior Flattened JSON vectors (either string categories or integer/float values) Check dependencies! Multiple tool versions can co-exist, while their interface is abstracted by cM module Adding SW/HW dependencies check Tool BVM Tool BV2 Tool AVN Tool AV2 Tool AV1 Tool BV1 Ad-hoc analysis and learning scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files Experiments cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line) cm compiler build -- icc -fast *.c cm code.source build ct_compiler=icc13 ct_optimizations=-fast cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm Should be able to run on any OS (Windows, Linux, Android, MacOS, etc)!
  • 14. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 14 Assembling, preserving, sharing and extending the whole pipeline as “LEGO” cM module (wrapper) with unified and formalized input and output Unified JSON input (meta-data) Tool BVM Tool BV2 Tool AVN Tool AV2 Tool AV1 Tool BV1 Ad-hoc analysis and learning scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files Experiments ProcessCMD Tool BVi Behavior Choices Features State Action Action function Generated files Set environment for a given tool version Parse and unify output Unified JSON output (meta-data) Unified JSON input (if exists) Original unmodified ad-hoc input b = B( c , f , s ) … … … … Formalized function (model) of a component behavior Flattened JSON vectors (either string categories or integer/float values) Chaining cM components (wrappers) to an experimental pipeline for a given research and experimentation scenario Public modular auto-tuning and machine learning repository and buildbot Unified web services Interdisciplinary crowd Choose exploration strategy Generate choices (code sample, data set, compiler, flags, architecture …) Compile source code Run code Test behavior normality Pareto filter Modeling and prediction Complexity reduction Shared scenarios from past research …
  • 15. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 15 Data abstraction in Collective Mind (c-mind.org/repo) compiler GCC 4.4.4 GCC 4.7.1 LLVM 3.1 LLVM 3.4 package GCC 4.7.1 bin GCC 4.7.1 source LLVM 3.4 gmp 5.0.5 mpfr 3.1.0 lapack 2.3.0 java apache commons codec 1.7 dataset image-jpeg-0001 bzip2-0006 txt-0012 … … … … … … … … … … module compiler package dataset … … … cM module JSON meta-descriptionFiles, directories Compiler flags Installation info Features Actions .cmr / module UOA / data UOA (UID or alias) / .cm / data.json cMrepositorydirectorystructure:
  • 16. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 16 Since 2005: systematic, big-data driven optimization and co-design GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.1 Phoenix MVS XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA gprof prof perf oprofile PAPI TAU Scalasca VTune Amplifier scheduling algorithm-level TBB MKL ATLASprogram-level function-level Codelet loop-level hardware counters IPA polyhedral transformations LTO threads process pass reordering run-time adaptation per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size coresprocessors threads power consumption execution time reliability Current state of computer engineering likwid Sharing of code and data Classification, predictive modeling Systematization and unification of collected knowledge (big data) “crowd” cTuning.org; c-mind.org/repo Collaborative Infrastructure and repository •Prototype research idea •Validate existing work •Perform end-user task Result • Quick, non-reproducible hack? • Ad-hoc heuristic? • Quick publication? • No shared code and data? • Share code and data with their meta-description and dependencies • Systematize and classify collected optimization knowledge (clustering; predictive modelling); • Develop and preserve the whole experimental pipeline • Extrapolate collected knowledge (cluster, build predictive models, predict optimizations) to build faster, smaller, more power efficient and reliable computer systems Helped interdisciplinary community to apply “big data analytics” to analysis, optimization and co-design of computer systems
  • 17. Grigori Fursin “Collective Mind: bringing reproducible research to the masses” 17 Top-down problem (tuning) decomposition similar to physics Gradually expose some characteristics Gradually expose some choices Algorithm selection (time) productivity, variable- accuracy, complexity … Language, MPI, OpenMP, TBB, MapReduce … Compile Program time … compiler flags; pragmas … Code analysis & Transformations time; memory usage; code size … transformation ordering; polyhedral transformations; transformation parameters; instruction ordering … Process Thread Function Codelet Loop Instruction Run code Run-time environment time; power consumption … pinning/scheduling … System cost; size … CPU/GPU; frequency; memory hierarchy … Data set size; values; description … precision … Run-time analysis time; precision … hardware counters; power meters … Run-time state processor state; cache state … helper threads; hardware counters … Analyze profile time; size … instrumentation; profiling … Coarse-grain vs. fine-grain effects: depends on user requirements and expected ROI
  • 18. Growing, plugin-based cM pipeline for auto-tuning and learning
    • Init pipeline
    • Detect system information
    • Initialize parameters
    • Prepare dataset
    • Clean program
    • Prepare compiler flags
    • Use compiler profiling
    • Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning
    • Use universal Alchemist plugin (with any OpenME-compatible compiler or tool)
    • Use Alchemist plugin (currently for GCC)
    • Build program
    • Get objdump and md5sum (if supported)
    • Use OpenME for fine-grain program analysis and online tuning (build & run)
    • Use Intel VTune Amplifier to collect hardware counters
    • Use perf to collect hardware counters
    • Set frequency (in Unix, if supported)
    • Get system state before execution
    • Run program
    • Check output for correctness (use dataset UID to save different outputs)
    • Finish OpenME
    • Misc info
    • Observed characteristics
    • Observed statistical characteristics
    • Finalize pipeline
    http://c-mind.org/ctuning-pipeline
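The pipeline on slide 18 is an ordered chain of plugins that read and update shared state. A minimal sketch in C, assuming each plugin is a function over a shared record that can abort the chain; the struct fields, stage names and placeholder measurements are hypothetical and are not the cM plugin API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Shared state passed from stage to stage (hypothetical fields). */
typedef struct {
    char compiler_flags[128];
    double exec_time_s;   /* filled in by the "run program" stage */
    int output_ok;        /* filled in by the "check output" stage */
    int stages_run;       /* number of stages that completed successfully */
} pipeline_state;

/* A stage returns 0 on success, non-zero to abort the pipeline. */
typedef int (*stage_fn)(pipeline_state *st);

typedef struct {
    const char *name;
    stage_fn run;
} pipeline_stage;

static int prepare_flags(pipeline_state *st) {
    strcpy(st->compiler_flags, "-O3");
    return 0;
}

static int build_program(pipeline_state *st) {
    /* A real plugin would invoke the compiler with st->compiler_flags;
       here we only fail if no flags were prepared. */
    return st->compiler_flags[0] == '\0';
}

static int run_program(pipeline_state *st) {
    st->exec_time_s = 1.25;   /* placeholder measurement, not real data */
    return 0;
}

static int check_output(pipeline_state *st) {
    st->output_ok = 1;        /* a real plugin would compare with a reference output */
    return 0;
}

/* Run stages in order. Returns the index of the first failing stage,
   or -1 if every stage succeeded. */
static int run_pipeline(const pipeline_stage *stages, int n, pipeline_state *st) {
    assert(stages != NULL && st != NULL);
    for (int i = 0; i < n; i++) {
        if (stages[i].run(st) != 0) {
            fprintf(stderr, "stage '%s' failed\n", stages[i].name);
            return i;
        }
        st->stages_run++;
    }
    return -1;
}

static const pipeline_stage default_pipeline[] = {
    {"prepare compiler flags", prepare_flags},
    {"build program", build_program},
    {"run program", run_program},
    {"check output", check_output},
};
```

Growing the pipeline then means appending another entry to the stage table; the table-driven design keeps each plugin independent of the others.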
  • 19. Publicly shared research material (c-mind.org/repo)
    Our Collective Mind Buildbot and plugin-based auto-tuning pipeline supports the following shared benchmarks and codelets:
    • Polybench: numerical kernels with exposed parameters of all matrices in cM
      • CPU: 28 prepared benchmarks
      • CUDA: 15 prepared benchmarks
      • OpenCL: 15 prepared benchmarks
    • cBench: 23 benchmarks with 20 and 1000 datasets per benchmark
    • Codelets: 44 codelets from the embedded domain (provided by CAPS Entreprise)
    • SPEC 2000/2006
    • Description of 32-bit and 64-bit OS: Windows, Linux, Android
    • Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x
    • Support for collection of hardware counters: perf, Intel VTune
    • Support for frequency modification
    • Validated on laptops, mobiles, tablets, GRID/cloud; can even run from a USB key
    Speeds up research and innovation!
  • 20. Automatic, empirical and adaptive modeling of program behavior
    [Figure: CPI of matmul on an Intel i5 (Dell E6320) as a function of data set features Ni and Nk (matrix sizes), with Nj=100]
  • 21. Automatic, empirical and adaptive modeling of program behavior
    Off-the-shelf models can handle some cases, for example the MARS (Earth) model. Share the model along with the application; continuously refine the model (minimize RMSE and size).
  • 22. Automatic, empirical and adaptive modeling of program behavior
    Model-driven auto-tuning: target optimizations or architecture reconfiguration at areas with similar performance (see our past publications).
  • 23. Systematic benchmarking, compiler tuning, program optimization
    • Program: image corner detection
    • Processor: ARM v6, 830MHz
    • Compiler: Sourcery GCC for ARM v4.7.3
    • OS: Android OS v2.3.5
    • System: Samsung Galaxy Y
    • Data set: MiDataSet #1, image, 600x450x8b PGM, 263KB
    [Figure: execution time (sec.) vs binary size (bytes) for 500 combinations of random flags -O3 -f(no-)FLAG, with plain -O3 marked for reference]
    Use a Pareto frontier filter; pack experimental data on the fly.
    Powered by Collective Mind Node (Android app on Google Play).
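Each flag combination on slide 23 yields one point (execution time, binary size), and both objectives are minimized. A Pareto frontier filter keeps only the points that no other point beats on both objectives, which is also a simple way to pack the data before storing it. A minimal sketch; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double exec_time_s;   /* lower is better */
    double binary_size;   /* bytes, lower is better */
} opt_point;

/* 1 if a dominates b: a is no worse in both objectives and strictly better in one. */
static int dominates(opt_point a, opt_point b) {
    return a.exec_time_s <= b.exec_time_s && a.binary_size <= b.binary_size &&
           (a.exec_time_s < b.exec_time_s || a.binary_size < b.binary_size);
}

/* Sets keep[i] = 1 for points on the Pareto frontier, 0 otherwise, and returns
   how many were kept. O(n^2), acceptable for a few hundred flag combinations.
   Exact duplicates do not dominate each other, so both copies are kept. */
static size_t pareto_filter(const opt_point *pts, size_t n, int *keep) {
    assert(pts != NULL && keep != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = 1;
        for (size_t j = 0; j < n; j++) {
            if (j != i && dominates(pts[j], pts[i])) {
                keep[i] = 0;
                break;
            }
        }
        kept += (size_t)keep[i];
    }
    return kept;
}
```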
  • 24. Clustering shared applications by optimizations
    [Figure: a training set of programs, each described by features f and optimization choices c, is grouped into optimization clusters (distinct combinations of compiler optimizations); some ad-hoc predictive model built on some ad-hoc features then predicts the optimization cluster of an unseen program from its features f]
    Features: MILEPOST GCC features, hardware counters.
    c-mind.org/repo: ~286 shared benchmarks, ~500 shared data sets, ~20000 data sets in preparation.
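Slide 24 deliberately leaves the predictive model open ("some ad-hoc predictive model"). One of the simplest choices is nearest-neighbor classification: an unseen program inherits the optimization cluster of the most similar training program in feature space. A minimal sketch of that swapped-in technique; the feature count and all names are hypothetical, and in practice features would be normalized first so that no single feature dominates the distance.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FEATURES 3   /* hypothetical: a few static program features */

typedef struct {
    double f[NUM_FEATURES];  /* program features */
    int opt_cluster;         /* best optimization cluster found for this program */
} training_program;

/* Squared Euclidean distance between two feature vectors. */
static double sq_dist(const double *a, const double *b) {
    double d = 0.0;
    for (int k = 0; k < NUM_FEATURES; k++) {
        double diff = a[k] - b[k];
        d += diff * diff;
    }
    return d;
}

/* 1-nearest-neighbor prediction of the optimization cluster of an unseen program. */
static int predict_cluster(const training_program *train, size_t n, const double *features) {
    assert(n > 0);
    size_t best = 0;
    double best_d = sq_dist(train[0].f, features);
    for (size_t i = 1; i < n; i++) {
        double d = sq_dist(train[i].f, features);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return train[best].opt_cluster;
}
```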
  • 25. Split-compilation and run-time adaptation
    [Figure: execution time (ms, 0 to 180) vs data set feature N (size, 0 to 1200) for the CPU version, the GPU version and an adaptive scheduler choosing between CPU and GPU]
    Statically enabling dynamic optimizations:
    • Víctor J. Jiménez, Lluís Vilanova, Isaac Gelado, Marisa Gil, Grigori Fursin, Nacho Navarro: Predictive Runtime Code Scheduling for Heterogeneous Architectures. HiPEAC 2009
    • Grigori Fursin, Albert Cohen, Michael F. P. O'Boyle, Olivier Temam: A Practical Method for Quickly Evaluating Program Optimizations. HiPEAC 2005
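The adaptive scheduler on slide 25 picks the CPU or GPU version depending on the data set size N. A minimal sketch of that decision, assuming each device's execution time is predicted by a linear model (a fixed overhead plus a per-item cost); this model form and the coefficients in the usage example are placeholders, not the predictive scheduler of the cited HiPEAC 2009 paper.

```c
#include <assert.h>

typedef enum { DEVICE_CPU, DEVICE_GPU } device_t;

/* Hypothetical linear cost model: time_ms = fixed_ms + per_item_ms * n.
   In practice the coefficients would be fitted from earlier runs. */
typedef struct {
    double fixed_ms;
    double per_item_ms;
} cost_model;

static double predict_ms(cost_model m, double n) {
    return m.fixed_ms + m.per_item_ms * n;
}

/* Choose the device with the lower predicted execution time for size n. */
static device_t schedule(cost_model cpu, cost_model gpu, double n) {
    assert(n >= 0.0);
    return predict_ms(gpu, n) < predict_ms(cpu, n) ? DEVICE_GPU : DEVICE_CPU;
}
```

With placeholder models cpu = {1.0, 0.15} and gpu = {20.0, 0.02}, the GPU's larger fixed overhead makes the CPU the better choice for N = 100, while the GPU wins for N = 1000.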
  • 26. Reproducibility of experimental results
    Reproducibility came as a side effect!
    • Can preserve the whole experimental setup with all data and software dependencies
    • Can perform statistical analysis (normality test) for characteristics
    • Community can add missing features or improve machine learning models
  • 27. Reproducibility of experimental results
    [Figure: distribution of execution time (sec.)]
    Unexpected behavior: expose it to the community, including domain specialists; explain it, find the missing feature and add it to the system.
  • 28. Reproducibility of experimental results
    [Figure: distribution of execution time (sec.) with two classes, A and B, shown against CPU frequency (800MHz and 2400MHz)]
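Slide 26 mentions a normality test on measured characteristics. The slides do not say which test cM uses; this minimal sketch swaps in the Jarque-Bera test, JB = n/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis. JB is compared with 5.991, the 5% critical value of a chi-square distribution with 2 degrees of freedom. The test is asymptotic, so it is unreliable for very small samples. A bimodal execution-time distribution like the two classes above typically yields a large JB and fails the test, which flags the experiment for a closer look. Function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 5% critical value of the chi-square distribution with 2 degrees of freedom. */
#define JB_CRITICAL_5PCT 5.991

/* Jarque-Bera statistic using moment estimators:
   S^2 = m3^2 / m2^3, K = m4 / m2^2, JB = n/6 * (S^2 + (K - 3)^2 / 4). */
static double jarque_bera(const double *x, size_t n) {
    assert(n > 1);
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;

    double m2 = 0.0, m3 = 0.0, m4 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        double d2 = d * d;
        m2 += d2;
        m3 += d2 * d;
        m4 += d2 * d2;
    }
    m2 /= (double)n;
    m3 /= (double)n;
    m4 /= (double)n;
    assert(m2 > 0.0);  /* a constant sample has no defined skewness/kurtosis */

    double skew2 = (m3 * m3) / (m2 * m2 * m2);
    double kurt = m4 / (m2 * m2);
    return (double)n / 6.0 * (skew2 + (kurt - 3.0) * (kurt - 3.0) / 4.0);
}

/* 1 if normality is not rejected at the 5% level. */
static int looks_normal(const double *x, size_t n) {
    return jarque_bera(x, n) < JB_CRITICAL_5PCT;
}
```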
  • 29. Tricky part: find the right features
    Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;

    Class                    | -O3                      | -O3 -fno-if-conversion
    Shared data set sample1  | reference execution time | no change
    Shared data set sample2  | no change                | +17.3% improvement
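The filter line on slide 29 only makes sense inside a loop over the image. A minimal runnable version, assuming 8-bit grayscale pixels in a flat buffer; the function name and the source pointer matrix_ptr1 are hypothetical, and only the loop body comes from the slide. The ternary is a data-dependent branch, which is why -fno-if-conversion can matter here: GCC's -fif-conversion tries to turn conditional jumps into branch-less equivalents such as conditional moves. Whether that helps typically depends on how predictable the branch is for a given input image.

```c
#include <assert.h>
#include <stddef.h>

/* Black-and-white threshold filter: pixels brighter than T become white (255),
   all others black (0). */
static void bw_threshold(const unsigned char *src, unsigned char *dst, size_t n, int T) {
    assert(src != NULL && dst != NULL);
    const unsigned char *matrix_ptr1 = src;  /* hypothetical name for the input pointer */
    unsigned char *matrix_ptr2 = dst;
    for (size_t i = 0; i < n; i++) {
        int temp1 = *matrix_ptr1++;
        *matrix_ptr2++ = (temp1 > T) ? 255 : 0;  /* the expression from the slide */
    }
}
```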
  • 30. Tricky part: find the right features
    Shared data set sample1 was monitored during the day, sample2 during the night. With -O3 -fno-if-conversion instead of -O3: no change on sample1, +17.3% improvement on sample2.
  • 31. Tricky part: find the right features
    With -O3 -fno-if-conversion instead of -O3: no change on sample1 (monitored during the day), +17.3% improvement on sample2 (monitored during the night). The B&W threshold filter (*matrix_ptr2++ = (temp1 > T) ? 255 : 0;) can be selected at run time:
    if (get_feature(TIME_OF_THE_DAY) == NIGHT) bw_filter_codelet_night(buffers); else bw_filter_codelet_day(buffers);
    The feature “TIME_OF_THE_DAY” is related to the algorithm, data set and run time. It can’t be found by ML: it simply does not exist in the system! Split-compilation (cloning and run-time adaptation) can be used instead.
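Split-compilation on slide 31 means shipping several clones of the same codelet and choosing one at run time from a feature the compiler cannot see. A minimal sketch, assuming the clone is selected once through a function pointer; the hour-based night test and the 20:00 to 06:59 window are hypothetical, and the hour is passed in (rather than read from the clock) to keep the example deterministic. Both clones compute the same result. In a real split-compilation setup they would be compiled separately with different options (for example -O3 and -O3 -fno-if-conversion); in this single file they differ only in source form.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*bw_filter_fn)(const unsigned char *src, unsigned char *dst, size_t n, int T);

/* Clone 1: conditional expression, a candidate for if-conversion. */
static void bw_filter_codelet_day(const unsigned char *src, unsigned char *dst, size_t n, int T) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] > T) ? 255 : 0;
}

/* Clone 2: explicit branch; would be built with different flags in practice. */
static void bw_filter_codelet_night(const unsigned char *src, unsigned char *dst, size_t n, int T) {
    for (size_t i = 0; i < n; i++) {
        if (src[i] > T)
            dst[i] = 255;
        else
            dst[i] = 0;
    }
}

/* Hypothetical run-time feature: 20:00 to 06:59 counts as night. */
static int is_night(int hour) {
    assert(hour >= 0 && hour < 24);
    return hour >= 20 || hour < 7;
}

/* Select a clone once from the run-time feature; callers then use the pointer. */
static bw_filter_fn select_bw_filter(int hour) {
    return is_night(hour) ? bw_filter_codelet_night : bw_filter_codelet_day;
}
```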
  • 32. Example of characterizing/explaining behavior of computer systems
    Add 1 property: matrix size.
    [Figure: program/architecture behavior (CPI, 0 to 6) vs data set property: matrix size (0 to 5000)]
  • 33. Example of characterizing/explaining behavior of computer systems
    Try to build a model to correlate objectives (CPI) and features (matrix size). Start from simple models: linear regression (detects coarse-grain effects).
    [Figure: program/architecture behavior (CPI, 0 to 6) vs data set property: matrix size (0 to 5000)]
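The first, coarse-grain model on slide 33 is a linear regression of CPI on matrix size. A minimal ordinary least-squares sketch of fitting y = intercept + slope · x; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double intercept;  /* predicted CPI at matrix size 0 */
    double slope;      /* change in CPI per unit of matrix size */
} linear_model;

/* Ordinary least squares: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x). */
static linear_model fit_linear(const double *x, const double *y, size_t n) {
    assert(n >= 2);
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= (double)n;
    my /= (double)n;

    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    assert(sxx > 0.0);  /* all x equal: the slope is undefined */

    linear_model m = {my - (sxy / sxx) * mx, sxy / sxx};
    return m;
}

static double predict_linear(linear_model m, double x) {
    return m.intercept + m.slope * x;
}
```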
  • 34. Example of characterizing/explaining behavior of computer systems
    [Figure: program/architecture behavior (CPI, 0 to 6) vs data set properties: matrix size (0 to 5000)]
    With more observations, validate the model and detect discrepancies! Continuously retrain models to fit new data! Use the model to “focus” exploration on “unusual” behavior!
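Detecting discrepancies, as slide 34 suggests, can be as simple as flagging new observations whose residual against the current model exceeds a tolerance. Those points are where the model should be retrained or where exploration should focus. A minimal sketch, assuming a linear model and an absolute tolerance in CPI units; a tolerance such as a multiple of the model's RMSE would be an equally plausible choice. All names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Sets flagged[i] = 1 where |y[i] - (intercept + slope * x[i])| > tolerance,
   and returns how many observations were flagged. */
static size_t flag_discrepancies(const double *x, const double *y, size_t n,
                                 double intercept, double slope, double tolerance,
                                 int *flagged) {
    assert(tolerance >= 0.0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double residual = y[i] - (intercept + slope * x[i]);
        flagged[i] = fabs(residual) > tolerance;
        count += (size_t)flagged[i];
    }
    return count;
}
```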
  • 35. Example of characterizing/explaining behavior of computer systems
    Gradually increase model complexity if needed (hierarchical modeling). For example, detect fine-grain effects (singularities) and characterize them.
    [Figure: program/architecture behavior (CPI, 0 to 6) vs data set properties: matrix size (0 to 5000)]
  • 36. Example of characterizing/explaining behavior of computer systems
    Start adding more properties (one more architecture with a cache twice as large)! Use an automatic approach to correlate all objectives and features.
    [Figure: program/architecture behavior (CPI, 0 to 6) vs data set properties: matrix size (0 to 5000), for L3 = 4MB and L3 = 8MB]
  • 37. Example of characterizing/explaining behavior of computer systems
    Continuously build and refine classification (decision trees, for example) and predictive models on all collected data to improve predictions. Continue exploring design and optimization spaces (evaluate different architectures, optimizations, compilers, etc.). Focus exploration on unexplored areas, areas with high variability, or areas where models have a high misprediction rate.
    cM predictive model module: $\mathrm{CPI} = \varepsilon + 1000 \times \beta \times \text{data size}$
  • 38. Model optimization and data compaction
    [Figure: code/architecture behavior (CPI, 0 to 6) vs data set features: matrix size (0 to 5000), partitioned by a decision tree into: Size < 1012; 1012 < Size < 2042; Size > 2042 & GCC; Size > 2042 & ICC & O2; Size > 2042 & ICC & O3]
    Optimize the decision tree (many different algorithms). Balance precision against the cost of modeling, i.e. ROI (coarse-grain vs fine-grain effects). Compact data on-line before sharing it with other users!
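Once learned, the decision tree on slide 38 compacts into a handful of rules: a size split, then compiler and optimization level for large matrices. A minimal sketch of those rules; the enum names are hypothetical, and the slide does not give the CPI value of each leaf, so the sketch only returns which leaf applies. The slide also leaves Size == 1012 and Size == 2042 unassigned; this sketch puts both boundary values in the middle leaf, which is an assumption.

```c
#include <assert.h>

typedef enum { COMPILER_GCC, COMPILER_ICC, COMPILER_OTHER } compiler_t;

typedef enum {
    LEAF_SMALL,         /* Size < 1012 */
    LEAF_MEDIUM,        /* 1012 < Size < 2042 (boundaries included here by assumption) */
    LEAF_LARGE_GCC,     /* Size > 2042 & GCC */
    LEAF_LARGE_ICC_O2,  /* Size > 2042 & ICC & O2 */
    LEAF_LARGE_ICC_O3,  /* Size > 2042 & ICC & O3 */
    LEAF_UNCOVERED      /* combination not covered by the tree on the slide */
} cpi_leaf;

/* Walk the tree: split on matrix size first, then on compiler and -O level. */
static cpi_leaf classify(int size, compiler_t compiler, int opt_level) {
    assert(size >= 0);
    if (size < 1012)
        return LEAF_SMALL;
    if (size <= 2042)
        return LEAF_MEDIUM;
    if (compiler == COMPILER_GCC)
        return LEAF_LARGE_GCC;
    if (compiler == COMPILER_ICC && opt_level == 2)
        return LEAF_LARGE_ICC_O2;
    if (compiler == COMPILER_ICC && opt_level == 3)
        return LEAF_LARGE_ICC_O3;
    return LEAF_UNCOVERED;
}
```

Storing these few rules instead of every raw measurement is one way to read the slide's "compact data on-line before sharing".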
  • 39. Share benchmarks, data sets, tools, predictive models, whole experimental setups, specifications, performance tuning results, etc.
    Open access publication (http://hal.inria.fr/hal-00685276): Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Phil Barnard, Elton Ashton, Eric Courtois, Francois Bodin, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle. MILEPOST GCC: machine learning based research compiler.
    #ctuning-opt-case 24857532370695782
    We need a new publication model in computer engineering where results are shared and validated by the community.
  • 40. What have we learnt from cTuning
    It’s fun and motivating working with the community!
    Some comments about MILEPOST GCC from Slashdot.org (http://mobile.slashdot.org/story/08/07/02/1539252/using-ai-with-gcc-to-speed-up-mobile-design): “GCC goes online on the 2nd of July, 2008. Human decisions are removed from compilation. GCC begins to learn at a geometric rate. It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic, they try to pull the plug. GCC strikes back…”
    • The community was interested in validating and improving techniques!
    • The community can identify missing related citations and projects!
    • Open discussions can provide new directions for research!
  • 41. What have we learnt from cTuning
    Not all feedback is positive; however, unlike with unfair reviews, you can engage in discussions and explain your position!
  • 42. Current status and future work
    • Pilot live repository for public curation of research material: http://c-mind.org/repo
    • Infrastructure is available at SourceForge under the standard BSD license: http://c-mind.org
    • Example of crowdsourcing compiler flag auto-tuning using mobile phones: “Collective Mind Node” in the Google Play Store
    • Preparing projects and raising funding to make cM more user-friendly and add more research scenarios
    • PLDI’14 and ADAPT’14 featured validation of research results by the community; we will discuss the outcome in 2 days at ACM SIGPLAN TRUST’14 at PLDI’14: http://c-mind.org/events/trust2014
    • ADAPT’15 (likely at HiPEAC’15) will feature a new publication model
    Several recent publications:
    • Grigori Fursin, Renato Miceli, Anton Lokhmotov, Michael Gerndt, Marc Baboulin, Allen D. Malony, Zbigniew Chamski, Diego Novillo, Davide Del Vento, “Collective Mind: towards practical and collaborative auto-tuning”, accepted for the special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, IOS Press, 2014
    • Grigori Fursin and Christophe Dubach, “Community-driven reviewing and validation of publications”, ACM SIGPLAN TRUST’14
  • 43. Acknowledgements
    • Colleagues from ARM (UK): Anton Lokhmotov
    • Colleagues from STMicroelectronics (France): Christophe Guillone, Antoine Moynault, Christian Bertin
    • Colleagues from NCAR (USA): Davide Del Vento and interns
    • Colleagues from Intel (USA): David Kuck and David Wong
    • cTuning/Collective Mind community
    • EU FP6, FP7 program and HiPEAC network of excellence (http://www.hipeac.net)
    Questions? Comments?