SlideShare uma empresa Scribd logo
1 de 44
Baixar para ler offline
Collective Mind
a collaborative curation tool for program optimization
Grigori Fursin
INRIA, France
SEA 2014, Boulder, USA
April 2014
Towards collaborative, systematic and reproducible
design and optimization of computer systems
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 2
• Interdisciplinary background
• Back to basics: major problems in computer engineering
• Collaborative and Reproducible Research and Development:
 cTuning.org framework and repository
 Exposed problems during practical usage
 Collective Mind framework and repository
 Collaborative research usage scenarios
 Collaborative curation for program optimization
 Changing R&D methodology and publication model
• Conclusions and future work
Outline
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 3
Back to 1993: physics, electronics, machine learning
Semiconductor neuron -
base of neural accelerators
and possible future neuro computers
Modeling and understanding
brain functions
My problem
with modeling:
• Slow
• Unreliable
• Costly
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 4
Available solutions
Result
Application
Compilers
Binary and libraries
Architecture
Run-time environment
State of the system Data set
Algorithm
End
User
task
Back to basics: task pipeline (workflow)
I needed help from computer scientists
to find the best solutions for my tasks!
User requirements:
minimize all costs (characteristics)
(execution time, power consumption, price,
size, faults, etc)
guarantee real-time constraints
(bandwidth, QoS, etc)
Service/application providers
(cloud computing, supercomputers, mobile systems)
Hardware and software designers
(processors, memory, interconnects,
languages, compilers, run-time systems)
Storage
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 5
GCC optimizations
Result
End
User
task
Back to basics: ever rising complexity of computer systems
Deliver universal and optimal
optimization heuristic is often
impossible!
1) Too many design and optimization choices
at all levels
2) Always multi-objective optimization:
performance vs compilation time vs code
size vs system size vs power consumption vs
reliability vs return on investment
3) Complex relationship and interactions
between ALL software and hardware
components
4) Users and developers often have to resort
to empirical auto-tuning
Empirical auto-tuning is too time consuming,
ad-hoc and tedious to be a mainstream!
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 6
• Can we leverage their experience and computational resources?
• Can we build common research and experimentation infrastructure
connected to a public repository of knowledge?
• Can we extrapolate collected knowledge using machine learning to
predict optimal program optimizations, hardware designs and run-time
adaptation scenarios?
Many end users and service providers run numerous applications
on different architectures with different data sets, run-time systems, compilers
and optimizations!
cTuning.org: combine auto-tuning with machine learning and crowdsourcing
Proposed and developed in the MILEPOST/cTuning project (2006-2009) to automate and crowdsource
training of a machine learning based self-tuning compiler for reconfigurable processors (SW/HW co-design).
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 7
cTuning.org: plugin-based auto-tuning and learning framework
cTuning fixed workflow
Program
cTuning wrapper: extract program
semantic (static) features using
MILEPOST GCC with ICI
cTuning run wrapper
cTuning third-party analysis plugins:
profile and collect hardware counters
(dynamic features) using gprof, perf,
PAPI,etc
Data set
cTuning auto-tuning plugin: explore
design and optimization space
cTuning compile wrapper
Pareto frontier plugin: balance
exec.time, code size, compilation time,
power consumption, etc
Machine learning plugins: KNN, SVM …
Common open-source cTuning CC framework (http://cTuning.org/ctuning-cc)
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 8
cTuning fixed workflow
cTuning.org: plugin-based auto-tuning and learning framework
Program
cTuning third-party analysis plugins:
profile and collect hardware counters
(dynamic features) using gprof, perf,
PAPI,etc OS and platform shared
descriptions (txt)
Data set
cTuning auto-tuning plugin: explore
design and optimization space
Shared benchmarks,
codelets, kernels from SVN
(cTuning.org/cbench)
Plugins with strategies:
random, hill climbing,
probabilistic, genetic …
Multiple shared compiler
descriptions for GCC, ICC,
Open64, LLVM, PathScale,
XLC… (txt)
Shared datasets from SVN
(cTuning.org/cdatasets)
Pareto frontier plugin: balance
exec.time, code size, compilation time,
power consumption, etc
Machine learning plugins: KNN, SVM …
Shared material in
various formats from
multiple sources
`
Common open-source cTuning CC framework (http://cTuning.org/ctuning-cc)
cTuning wrapper: extract program
semantic (static) features using
MILEPOST GCC with ICI
cTuning run wrapper
cTuning compile wrapper
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 9
cTuning.org: plugin-based auto-tuning and learning framework
cTuning fixed workflow
Program
cTuning third-party analysis plugins:
profile and collect hardware counters
(dynamic features) using gprof, perf,
PAPI,etc OS and platform shared
descriptions (txt)
Data set
cTuning auto-tuning plugin: explore
design and optimization space
Shared benchmarks,
codelets, kernels from SVN
(cTuning.org/cbench)
Plugins with strategies:
random, hill climbing,
probabilistic, genetic …
Multiple shared compiler
descriptions for GCC, ICC,
Open64, LLVM, PathScale,
XLC… (txt)
Shared datasets from SVN
(cTuning.org/cdatasets)
Pareto frontier plugin: balance
exec.time, code size, compilation time,
power consumption, etc
Machine learning plugins: KNN, SVM …
Shared material in
various formats from
multiple sources
`
cTuning.org
cTuningunifiedphp-basedwebservices
MySQL cTuning
database:
optimization
statistics from
multiple users
Interdisciplinary
crowd
Collaboratively
searching for best
optimizations and
learning techniques
Common open-source cTuning CC framework (http://cTuning.org/ctuning-cc)
cTuning wrapper: extract program
semantic (static) features using
MILEPOST GCC with ICI
cTuning run wrapper
cTuning compile wrapper
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 10
cTuning.org: plugin-based auto-tuning and learning framework
cTuning fixed workflow
Program
cTuning third-party analysis plugins:
profile and collect hardware counters
(dynamic features) using gprof, perf,
PAPI,etc OS and platform shared
descriptions (txt)
Data set
cTuning auto-tuning plugin: explore
design and optimization space
Shared benchmarks,
codelets, kernels from SVN
(cTuning.org/cbench)
Plugins with strategies:
random, hill climbing,
probabilistic, genetic …
Multiple shared compiler
descriptions for GCC, ICC,
Open64, LLVM, PathScale,
XLC… (txt)
Shared datasets from SVN
(cTuning.org/cdatasets)
Pareto frontier plugin: balance
exec.time, code size, compilation time,
power consumption, etc
Machine learning plugins: KNN, SVM …
Shared material in
various formats from
multiple sources
cTuning.org
cTuningunifiedphp-basedwebservices
MySQL cTuning
database:
optimization
statistics from
multiple users
Interdisciplinary
crowd
Collaboratively
searching for best
optimizations and
learning techniques
Common open-source cTuning CC framework (http://cTuning.org/ctuning-cc)
`
cTuning wrapper: extract program
semantic (static) features using
MILEPOST GCC with ICI
cTuning run wrapper
cTuning compile wrapper
In 2009, we managed to automatically tune
customer benchmarks and compiler heuristics
for a range of real platforms
from IBM and ARC (Synopsis)
Everything is solved?
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 11
Technological chaos
GCC 4.1.x
GCC 4.2.x
GCC 4.3.x
GCC 4.4.x
GCC 4.5.x
GCC 4.6.x
GCC 4.7.x
ICC 10.1
ICC 11.0
ICC 11.1
ICC 12.0
ICC 12.1
LLVM 2.6
LLVM 2.7
LLVM 2.8
LLVM 2.9
LLVM 3.0
Phoenix
MVS 2013
XLC
Open64
Jikes
Testarossa
OpenMP MPI
HMPP
OpenCL
CUDA 4.x
gprofprof
perf
oprofile
PAPI
TAU
Scalasca
VTune
Amplifier
predictive
scheduling
algorithm-
level TBB
MKL
ATLAS
program-
level
function-
level
Codelet
loop-level
hardware
counters
IPA
polyhedral
transformations
LTO threads
process
pass
reordering
KNN
per phase
reconfiguration
cache size
frequency
bandwidth
HDD size
TLB ISA
memory size
ARM v6
threads
execution time
reliability
GCC 4.8.x
LLVM 3.4
SVM
genetic
algorithms
cTuning practical usage exposed a few problems
ARM v8
Intel SandyBridge
SSE4
AVX
Many thanks to international academic
and industrial users for their feedback
http://cTuning.org/lab/people
Thanks to Davide Del Vento
and his interns from UCAR
CUDA 5.x
SimpleScalar
algorithm precision
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 12
cTuning practical usage exposed a few problems
• Ever changing environment
(SW and HW choices, features, tools and libraries
versions, interfaces, data formats …)
• Difficult to reproduce experimental results
from different users
• Difficult to keep cTuning up-to-date - by the
end of development cycle, new software and
hardware is often available
• Complex deployment and maintenance
• Complex, manual configuration of the
framework, repository, plugins, hardware and
all dependencies
• Many used languages (C, C++, PHP, Java) and
data exchange formats (txt, csv, xls, …)
• Different versions of benchmarks, data sets,
plugins, models, compilers, third-party tools,
etc are scattered around many directories and
disks
Technological chaos
GCC 4.1.x
GCC 4.2.x
GCC 4.3.x
GCC 4.4.x
GCC 4.5.x
GCC 4.6.x
GCC 4.7.x
ICC 10.1
ICC 11.0
ICC 11.1
ICC 12.0
ICC 12.1
LLVM 2.6
LLVM 2.7
LLVM 2.8
LLVM 2.9
LLVM 3.0
Phoenix
MVS 2013
XLC
Open64
Jikes
Testarossa
OpenMP MPI
HMPP
OpenCL
CUDA 4.x
gprofprof
perf
oprofile
PAPI
TAU
Scalasca
VTune
Amplifier
predictive
scheduling
algorithm-
level TBB
MKL
ATLAS
program-
level
function-
level
Codelet
loop-level
hardware
counters
IPA
polyhedral
transformations
LTO threads
process
pass
reordering
KNN
per phase
reconfiguration
cache size
frequency
bandwidth
HDD size
TLB ISA
memory size
ARM v6
threads
execution time
reliability
GCC 4.8.x
LLVM 3.4
SVM
genetic
algorithms
ARM v8
Intel SandyBridge
SSE4
AVX
CUDA 5.x
SimpleScalar
algorithm precision
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 13
cTuning practical usage exposed a few problems
• Difficult or impossible to reproduce and reuse
techniques from existing publications
• No culture of sharing code and data in
computer engineering unlike other sciences
• Impossible to publish negative results or just
validation of existing works
• Academia focuses more and more on quantity
(number of publications and citations) rather
than quality and reproducibility of results
• No simple and scalable mechanism to preserve,
share and reuse all heterogeneous experimental
material
• Centralized, MySQL-based repository got
quickly saturated when collecting and
processing data from many users
• MySQL repository is difficult to adapt to
continuous changes in the system
• Difficult to preserve different versions of
benchmarks, data sets, plugins and tools, and
make them co-exist using SVN repositories
Technological chaos
GCC 4.1.x
GCC 4.2.x
GCC 4.3.x
GCC 4.4.x
GCC 4.5.x
GCC 4.6.x
GCC 4.7.x
ICC 10.1
ICC 11.0
ICC 11.1
ICC 12.0
ICC 12.1
LLVM 2.6
LLVM 2.7
LLVM 2.8
LLVM 2.9
LLVM 3.0
Phoenix
MVS 2013
XLC
Open64
Jikes
Testarossa
OpenMP MPI
HMPP
OpenCL
CUDA 4.x
gprofprof
perf
oprofile
PAPI
TAU
Scalasca
VTune
Amplifier
predictive
scheduling
algorithm-
level TBB
MKL
ATLAS
program-
level
function-
level
Codelet
loop-level
hardware
counters
IPA
polyhedral
transformations
LTO threads
process
pass
reordering
KNN
per phase
reconfiguration
cache size
frequency
bandwidth
HDD size
TLB ISA
memory size
ARM v6
threads
execution time
reliability
GCC 4.8.x
LLVM 3.4
SVM
genetic
algorithms
ARM v8
Intel SandyBridge
SSE4
AVX
CUDA 5.x
SimpleScalar
algorithm precision
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 14
cTuning practical usage exposed a few problems
• Difficult or impossible to reproduce and reuse
techniques from existing publications
• No culture of sharing code and data in
computer engineering unlike other sciences
• Impossible to publish negative results or just
validation of existing works
• Academia focuses more and more on quantity
(number of publications and citations) rather
than quality and reproducibility of results
• No simple and scalable mechanism to preserve,
share and reuse all heterogeneous experimental
material
• Centralized, MySQL-based repository got
quickly saturated when collecting and
processing data from many users
• MySQL repository is difficult to adapt to
continuous changes in the system
• Difficult to preserve different versions of
benchmarks, data sets, plugins and tools, and
make them co-exist using SVN repositories
Technological chaos
GCC 4.1.x
GCC 4.2.x
GCC 4.3.x
GCC 4.4.x
GCC 4.5.x
GCC 4.6.x
GCC 4.7.x
ICC 10.1
ICC 11.0
ICC 11.1
ICC 12.0
ICC 12.1
LLVM 2.6
LLVM 2.7
LLVM 2.8
LLVM 2.9
LLVM 3.0
Phoenix
MVS 2013
XLC
Open64
Jikes
Testarossa
OpenMP MPI
HMPP
OpenCL
CUDA 4.x
gprofprof
perf
oprofile
PAPI
TAU
Scalasca
VTune
Amplifier
predictive
scheduling
algorithm-
level TBB
MKL
ATLAS
program-
level
function-
level
Codelet
loop-level
hardware
counters
IPA
polyhedral
transformations
LTO threads
process
pass
reordering
KNN
per phase
reconfiguration
cache size
frequency
bandwidth
HDD size
TLB ISA
memory size
ARM v6
threads
execution time
reliability
GCC 4.8.x
LLVM 3.4
SVM
genetic
algorithms
ARM v8
Intel SandyBridge
SSE4
AVX
CUDA 5.x
SimpleScalar
algorithm precision
We face typical “big data” problem!
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 15
Collective Mind: attempt to fix cTuning problems
cTuning fixed workflow
Program
cTuning third-party analysis plugins:
profile and collect hardware counters
(dynamic features) using gprof, perf,
PAPI,etc OS and platform shared
descriptions (txt)
Data set
cTuning auto-tuning plugin: explore
design and optimization space
Shared benchmarks,
codelets, kernels from SVN
(cTuning.org/cbench)
Plugins with strategies:
random, hill climbing,
probabilistic, genetic …
Multiple shared compiler
descriptions for GCC, ICC,
Open64, LLVM, PathScale,
XLC… (txt)
Shared datasets from SVN
(cTuning.org/cdatasets)
Pareto frontier plugin: balance
exec.time, code size, compilation time,
power consumption, etc
Machine learning plugins: KNN, SVM …
Shared material in
various formats from
multiple sources
cTuning.org
cTuningunifiedphp-basedwebservices
MySQL cTuning
database:
optimization
statistics from
multiple users
Interdisciplinary
crowd
Collaboratively
searching for best
optimizations and
learning techniques
Common open-source Collective Mind framework (http://c-mind.org)
`
cTuning wrapper: extract program
semantic (static) features using
MILEPOST GCC with ICI
cTuning run wrapper
cTuning compile wrapper
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 16
Collective Mind: attempt to fix cTuning problems
Collective Mind flexible pipeline
Program
cTuning third-party analysis plugins:
profile and collect hardware counters
(dynamic features) using gprof, perf,
PAPI,etc OS and platform shared
descriptions (txt)
Data set
cTuning auto-tuning plugin: explore
design and optimization space
Shared benchmarks,
codelets, kernels from SVN
(cTuning.org/cbench)
Plugins with strategies:
random, hill climbing,
probabilistic, genetic …
Multiple shared compiler
descriptions for GCC, ICC,
Open64, LLVM, PathScale,
XLC… (txt)
Shared datasets from SVN
(cTuning.org/cdatasets)
Pareto frontier plugin: balance
exec.time, code size, compilation time,
power consumption, etc
Machine learning plugins: KNN, SVM …
Shared material in
various formats from
multiple sources
cTuning.org
cTuningunifiedphp-basedwebservices
MySQL cTuning
database:
optimization
statistics from
multiple users
Interdisciplinary
crowd
Collaboratively
searching for best
optimizations and
learning techniques
`
cTuning wrapper: extract program
semantic (static) features using
MILEPOST GCC with ICI
cTuning run wrapper
cTuning compile wrapper
Unify plugins and
tool wrappers:
cm <plugin_name>
params -- orig. CMD
Use high-level,
portable language
Common open-source Collective Mind framework (http://c-mind.org)
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 17
Collective Mind: attempt to fix cTuning problems
Collective Mind flexible pipeline
Program
cTuning third-party analysis plugins:
profile and collect hardware counters
(dynamic features) using gprof, perf,
PAPI,etc OS and platform shared
descriptions (txt)
Data set
cTuning auto-tuning plugin: explore
design and optimization space
Shared benchmarks,
codelets, kernels from SVN
(cTuning.org/cbench)
Plugins with strategies:
random, hill climbing,
probabilistic, genetic …
Multiple shared compiler
descriptions for GCC, ICC,
Open64, LLVM, PathScale,
XLC… (txt)
Shared datasets from SVN
(cTuning.org/cdatasets)
Pareto frontier plugin: balance
exec.time, code size, compilation time,
power consumption, etc
Machine learning plugins: KNN, SVM …
Shared material in
various formats from
multiple sources
c-mind.org/repo
cMunifiedwebservices
MySQL cTuning
database:
optimization
statistics from
multiple users
Interdisciplinary
crowd
Collaborative
curation of all
research material
`
cTuning wrapper: extract program
semantic (static) features using
MILEPOST GCC with ICI
cTuning run wrapper
cTuning compile wrapper
Unify plugins and
tool wrappers:
cm <plugin_name>
params -- orig. CMD
Use high-level,
portable language
New repository
which can keep
both collections of
files and meta-
description in
some unified
format
Common open-source Collective Mind framework (http://c-mind.org)
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 18
JSON: extensible and human-readable
format
file data.json = {
"characteristics":{
"execution times": ["10.3","10.1","13.3"],
"code size": "131938", ...},
"choices":{
"os":"linux", "os version":"2.6.32-5-amd64",
"compiler":"gcc", "compiler version":"4.6.3",
"compiler_flags":"-O3 -fno-if-conversion",
"platform":{"processor":"intel xeon e5520",
"l2":"8192“, ...}, ...},
"features":{
"semantic features": {"number_of_bb": "24", ...},
"hardware counters": {"cpi": "1.4" ...}, ... }
"state":{
"frequency":"2.27", ...}}
Can be directly indexed using web-based,
distributed ElasticSearch (Apache2 license)
Python: high-level, portable, popular
language with JIT, JSON support and
many statistical and machine learning
libraries
import json
f=file('data.json')
array=json.loads(f.read())
f.close()
et=array[‘characteristics’][‘execution_times’]
for tm in et:
print ‘Execution time:’, tm
Collective Mind: which technology to use?
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 19
Behavior
Choices
Features
State
Hardwired experimental setups, very difficult to change, scale or share
Convert ad-hoc experiments to cM python modules
Meta description that
should be exposed in
the information flow
for auto-tuning and
machine learning
Run-time2
CompilerN
Compiler2
Compiler1 Run-time1 Ad-hoc
analysis
scripts
Ad-hoc
tuning
scripts
Collection of
CSV, XLS, TXT
and other files
Offline tuning
User
program,
dataset
Online tuning
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 20
cM module (wrapper) with unified and formalized input and output
ProcessCMD
Tool BVi Generated files
Original
unmodified
ad-hoc
input
Convert ad-hoc experiments to cM python modules
Behavior
Choices
Features
State
Run-time2
CompilerN
Compiler2
Compiler1 Run-time1 Ad-hoc
analysis
scripts
Ad-hoc
tuning
scripts
Collection of
CSV, XLS, TXT
and other files
User
program,
dataset
Online tuning
Offline tuning
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 21
cM module (wrapper) with unified and formalized input and output
Unified JSON
input (meta-data)
ProcessCMD
Tool BVi
Behavior
Choices
Features
State
Action
Action function
Generated files
Parse
and unify
output
Unified
JSON
output
(meta-data)
Unified
JSON input
(if exists)
Original
unmodified
ad-hoc
input
b = B( c , f , s )
… … … …
Formalized function (model)
of a component behavior
Flattened JSON vectors
(either string categories
or integer/float values)
Convert ad-hoc experiments to cM python modules
Run-time2
CompilerN
Compiler2
Compiler1 Run-time1 Ad-hoc
analysis
scripts
Ad-hoc
tuning
scripts
Collection of
CSV, XLS, TXT
and other files
User
program,
dataset
Online tuning
Offline tuning
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 22
cM module (wrapper) with unified and formalized input and output
Unified JSON
input (meta-data)
ProcessCMD
Tool BVi
Behavior
Choices
Features
State
Action
Action function
Generated files
Set
environment
for a given
tool version
Parse
and unify
output
Unified
JSON
output
(meta-data)
Unified
JSON input
(if exists)
Original
unmodified
ad-hoc
input
b = B( c , f , s )
… … … …
Formalized function (model)
of a component behavior
Flattened JSON vectors
(either string categories
or integer/float values)
Convert ad-hoc experiments to cM python modules
Multiple tool versions
can co-exist, while their
interface is abstracted
by cM module
Run-time2
CompilerN
Compiler2
Compiler1 Run-time1 Ad-hoc
analysis
scripts
Ad-hoc
tuning
scripts
Collection of
CSV, XLS, TXT
and other files
User
program,
dataset
Online tuning
Offline tuning
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 23
cM module (wrapper) with unified and formalized input and output
Unified JSON
input (meta-data)
ProcessCMD
Tool BVi
Behavior
Choices
Features
State
Action
Action function
Generated files
Set
environment
for a given
tool version
Parse
and unify
output
Unified
JSON
output
(meta-data)
Unified
JSON input
(if exists)
Original
unmodified
ad-hoc
input
b = B( c , f , s )
… … … …
Formalized function (model)
of a component behavior
Flattened JSON vectors
(either string categories
or integer/float values)
cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line)
cm compiler build -- icc -fast *.c
cm code.source build ct_compiler=icc13 ct_optimizations=-fast
cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm
Should be able to run on any OS (Windows, Linux, Android, MacOS, etc)!
Convert ad-hoc experiments to cM python modules
Run-time2
CompilerN
Compiler2
Compiler1 Run-time1 Ad-hoc
analysis
scripts
Ad-hoc
tuning
scripts
Collection of
CSV, XLS, TXT
and other files
User
program,
dataset
Online tuning
Offline tuning
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 24
Data abstraction in Collective Mind
compiler GCC 4.4.4
GCC 4.7.1
LLVM 3.1
LLVM 3.4
cM module JSON meta-description
Compiler
flags
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 25
Data abstraction in Collective Mind
compiler GCC 4.4.4
GCC 4.7.1
LLVM 3.1
LLVM 3.4
package GCC 4.7.1 bin
GCC 4.7.1 source
LLVM 3.4
gmp 5.0.5
mpfr 3.1.0
lapack 2.3.0
java apache commons codec 1.7
…
…
…
…
…
…
…
cM module JSON meta-descriptionFiles, directories
Compiler
flags
Installation
info
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 26
Data abstraction in Collective Mind
compiler GCC 4.4.4
GCC 4.7.1
LLVM 3.1
LLVM 3.4
package GCC 4.7.1 bin
GCC 4.7.1 source
LLVM 3.4
gmp 5.0.5
mpfr 3.1.0
lapack 2.3.0
java apache commons codec 1.7
dataset image-jpeg-0001
bzip2-0006
txt-0012
…
…
…
…
…
…
…
…
…
…
module compiler
package
dataset
…
…
…
cM module JSON meta-descriptionFiles, directories
Compiler
flags
Installation
info
Features
Actions
.cmr / module UOA / data UOA (UID or alias) / .cm / data.json
cMrepositorydirectorystructure:
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 27
Data abstraction in Collective Mind
compiler GCC 4.4.4
GCC 4.7.1
LLVM 3.1
LLVM 3.4
package GCC 4.7.1 bin
GCC 4.7.1 source
LLVM 3.4
gmp 5.0.5
mpfr 3.1.0
lapack 2.3.0
java apache commons codec 1.7
dataset image-jpeg-0001
bzip2-0006
txt-0012
…
…
…
…
…
…
…
…
…
…
module compiler
package
dataset
…
…
…
cM module JSON meta-descriptionFiles, directories
Compiler
flags
Installation
info
Features
Actions
.cmr / module UOA / data UOA (UID or alias) / .cm / data.json
cMrepositorydirectorystructure:
Now can reference and find any data by CID
(similar to DOI):
<module UOA : data UOA>
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 28
Data abstraction in Collective Mind
compiler GCC 4.4.4
GCC 4.7.1
LLVM 3.1
LLVM 3.4
package GCC 4.7.1 bin
GCC 4.7.1 source
LLVM 3.4
gmp 5.0.5
mpfr 3.1.0
lapack 2.3.0
java apache commons codec 1.7
dataset image-jpeg-0001
bzip2-0006
txt-0012
…
…
…
…
…
…
…
…
…
…
module compiler
package
dataset
…
…
…
cM module JSON meta-descriptionFiles, directories
Compiler
flags
Installation
info
Features
Actions
Dependenciesbetweendataandmodules
.cmr / module UOA / data UOA (UID or alias) / .cm / data.json
cMrepositorydirectorystructure:
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 29
Gradually adding specification (agile development)
cTuning experiment module data.json
{
"characteristics":{
"execution times": ["10.3","10.1","13.3"],
"code size": "131938", ...},
"choices":{
"os":"linux", "os version":"2.6.32-5-amd64",
"compiler":"gcc", "compiler version":"4.6.3",
"compiler_flags":"-O3 -fno-if-conversion",
"platform":{"processor":"intel xeon e5520",
"l2":"8192“, ...}, ...},
"features":{
"semantic features": {"number_of_bb": "24", ...},
"hardware counters": {"cpi": "1.4" ...}, ... }
"state":{
"frequency":"2.27", ...}
}
cM flattened JSON key
##characteristics#execution_times@1
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 30
Gradually adding specification (agile development)
cTuning experiment module data.json
{
"characteristics":{
"execution times": ["10.3","10.1","13.3"],
"code size": "131938", ...},
"choices":{
"os":"linux", "os version":"2.6.32-5-amd64",
"compiler":"gcc", "compiler version":"4.6.3",
"compiler_flags":"-O3 -fno-if-conversion",
"platform":{"processor":"intel xeon e5520",
"l2":"8192“, ...}, ...},
"features":{
"semantic features": {"number_of_bb": "24", ...},
"hardware counters": {"cpi": "1.4" ...}, ... }
"state":{
"frequency":"2.27", ...}
}
cM flattened JSON key
##characteristics#execution_times@1
"flattened_json_key”:{
"type": "text”|"integer" | “float" | "dict" | "list”
| "uid",
"characteristic": "yes" | "no",
"feature": "yes" | "no",
"state": "yes" | "no",
"has_choice": "yes“ | "no",
"choices": [ list of strings if categorical
choice],
"explore_start": "start number if numerical
range",
"explore_stop": "stop number if numerical
range",
"explore_step": "step if numerical range",
"can_be_omitted" : "yes" | "no"
...
}
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 31
TransparentlyindexedbyElasticSearch
Gradually adding, extending and improving modules and data
Applications, benchmarks,
codelets, kernels
Auto-tuning compilers
Binaries and libraries
Processors
Operating Systems
Data sets
Statistical and data
mining plugins
Compiler machine
learning plugins
Third-party tools, libraries
code.source
ctuning.compiler
code
processor
os
dataset
math.statistics.r
math.model
algorithm
Category cM module Module actions
common*, transform
common*,
detect_host_processor
common*, detect_host_family
common*, create
common*, analyze
common*, build, predict, fit,
detect_representative_points
common*, installpackage
High-level algorithms
common*, build
common*, compile_program
common*, run
All data
…
…
…
…
…
…
…
…
…
…
* add, list, view, copy, move, search
Meta-description
…
…
…
…
…
…
…
…
…
…
.cmr / module UOA (UID or alias) / data UOA / .cm / data.json
cM repository
directory structure:
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 32
cM interface: only one main function!
Simple and minimalistic high-level cM interface - one function (!)
(python dictionary) output = cm_kernel.access ( (python dictionary) input )
Load data:
r = cm_kernel.access({‘cm_run_module_uoa’:’dataset’,
‘cm_action’:’load’,
‘ cm_data_uoa’:’image-jpeg-0001’})
if r[‘cm_return’]>0: return r
d=r[‘cm_data_obj’][‘cfg’]
Call R machine learning module:
r = cm_kernel.access({‘cm_run_module_uoa’:’ cm math.model.r’,
‘cm_action’:’ build’,
‘model_name’:’earth’, …})
if r[‘cm_return’]>0: return r
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 33
Assembling, preserving, sharing and extending the whole pipeline as “LEGO”
cM tool wrapper with unified and formalized input and output
Unified JSON
input (meta-data)
Tool BVM
Tool BV2
Tool AVN
Tool AV2
Tool AV1 Tool BV1 Ad-hoc
analysis and
learning scripts
Ad-hoc
tuning scripts
Collection of
CSV, XLS, TXT
and other files
Experiments
ProcessCMD
Tool BVi
Behavior
Choices
Features
State
Action
Action function
Generated files
Set
environment
for a given
tool version
Parse
and unify
output
Unified
JSON
output
(meta-data)
Unified
JSON input
(if exists)
Original
unmodified
ad-hoc
input
b = B( c , f , s )
… … … …
Formalized function (model)
of a component behavior
Flattened JSON vectors
(either string categories
or integer/float values)
Chaining cM components (wrappers) to an experimental pipeline for a given research and experimentation scenario
Public modular auto-tuning and machine
learning repository and buildbot
Unified
web services Interdisciplinary crowd
Choose
exploration
strategy
Generate choices (code
sample, data set, compiler,
flags, architecture …)
Compile
source
code
Run
code
Test
behavior
normality
Pareto
filter
Modeling
and
prediction
Complexity
reduction
Shared scenarios from past research
…
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 34
Gradually expose
some characteristics
Gradually expose
some choices and features
Compile Program time … compiler flags; pragmas …
Run code Run-time
environment
time; CPI, power
consumption …
pinning/scheduling …
System cost; architecture; frequency; cache size…
Data set size; values; description … precision …
Analyze profile time; size … instrumentation; profiling …
Start coarse-grain decomposition of a system (detect coarse-grain effects first). Add universal learning modules.
Top-down decomposition and learning of computer systems
I now gradually convert to Collective Mind and validate
past research techniques with the great help of volunteers
(sadly such work is usually not funded, can’t be easily published,
and is often considered as a waste of time by academic community)
Methodology similar to physics:
start from coarse-grain and gradually move to fine-grain level!
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 35
Growing, plugin-based cM pipeline for auto-tuning and learning
•Init pipeline
•Detected system information
•Initialize parameters
•Prepare dataset
•Clean program
•Prepare compiler flags
•Use compiler profiling
•Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning
•Use universal Alchemist plugin (with any OpenME-compatible compiler or tool)
•Use Alchemist plugin (currently for GCC)
•Build program
•Get objdump and md5sum (if supported)
•Use OpenME for fine-grain program analysis and online tuning (build & run)
•Use 'Intel VTune Amplifier' to collect hardware counters
•Use 'perf' to collect hardware counters
•Set frequency (in Unix, if supported)
•Get system state before execution
•Run program
•Check output for correctness (use dataset UID to save different outputs)
•Finish OpenME
•Misc info
•Observed characteristics
•Observed statistical characteristics
•Finalize pipeline
http://c-mind.org/ctuning-pipeline
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 36
Publicly shared research material (c-mind.org/repo)
Our Collective Mind Buildbot and plugin-based auto-tuning pipeline supports the
following shared benchmarks and codelets:
•Polybench - numerical kernels with exposed parameters of all matrices in cM
• CPU: 28 prepared benchmarks
• CUDA: 15 prepared benchmarks
• OpenCL: 15 prepared benchmarks
• cBench - 23 benchmarks with 20 and 1000 datasets per benchmark
• Codelets - 44 codelets from embedded domain (provided by CAPS Entreprise)
• SPEC 2000/2006
• Description of 32-bit and 64-bit OS: Windows, Linux, Android
• Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x
• Support for collection of hardware counters: perf, Intel vTune
• Support for frequency modification
• Validated on laptops, mobiles, tables, GRID/cloud - can work even from the USB key
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 37
Implemented scenario: multi-objective compiler auto-tuning using mobile phones
Program: image corner detection Processor: ARM v6, 830MHz
Compiler: Sourcery GCC for ARM v4.7.3 OS: Android OS v2.3.5
System: Samsung Galaxy Y Data set: MiDataSet #1, image, 600x450x8b PGM, 263KB
500 combinations of random flags -O3 -f(no-)FLAG
Binarysize(bytes)
Execution time (sec.)
Use Pareto
frontier filter;
Pack
experimental
data on the fly
-O3
Powered by Collective Mind Node (Android Apps on Google Play)
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 38
Collective Mind experimental pipelines can now survive changes in the system!
Modules and data can evolve with the evolution of the technology!
From agile development to agile research!
Community shares, validates and improves benchmarks, data sets, tools, design
and optimization choices, features, characteristics, models, …
Collective Mind agile research and development
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 39
Reproducibility of experimental results
Reproducibility came as a side effect!
• Can preserve the whole experimental setup with all data and software dependencies
• Can perform statistical analysis (normality test) for characteristics
• Community can add missing features or improve machine learning models
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 40
Execution time (sec.)
Distribution
Unexpected behavior - expose to the community including domain specialists,
explain, find missing feature and add to the system
Reproducibility of experimental results
Reproducibility came as a side effect!
• Can preserve the whole experimental setup with all data and software dependencies
• Can perform statistical analysis (normality test) for characteristics
• Community can add missing features or improve machine learning models
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 41
Execution time (sec.)
Distribution
Class A Class B
800MHz CPU Frequency 2400MHz
Unexpected behavior - expose to the community including domain specialists,
explain, find missing feature and add to the system
Reproducibility of experimental results
Reproducibility came as a side effect!
• Can preserve the whole experimental setup with all data and software dependencies
• Can perform statistical analysis (normality test) for characteristics
• Community can add missing features or improve machine learning models
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 42
My dream: systematic, collaborative and reproducible computer engineering
GCC 4.1.x
GCC 4.2.x
GCC 4.3.x
GCC 4.4.x
GCC 4.5.x
GCC 4.6.x
GCC 4.7.x
ICC 10.1
ICC 11.0
ICC 11.1
ICC 12.0
ICC 12.1
LLVM 2.6
LLVM 2.7
LLVM 2.8
LLVM 2.9
LLVM 3.1
Phoenix
MVS XLC
Open64
Jikes
Testarossa
OpenMP
MPI
HMPP
OpenCL
CUDA
gprof
prof
perf
oprofile
PAPI
TAU
Scalasca
VTune
Amplifier
scheduling
algorithm-level
TBB
MKL
ATLASprogram-level
function-level
Codelet
loop-level
hardware
counters
IPA
polyhedral
transformations
LTO
threads
process pass reordering
run-time adaptation
per phase
reconfiguration
cache size
frequency
bandwidth
HDD size
TLB
ISA
memory size
coresprocessors
threads
power consumption
execution time reliability
Current state of computer engineering
likwid
Sharing of
code and data
Classification,
predictive
modeling
Systematization and unification
of collected knowledge
(big data)
“crowd”
c-mind.org/repo
Collaborative Infrastructure and repository
•Prototype research idea
•Validate existing work
•Perform end-user task
Result
• Quick, non-reproducible hack?
• Ad-hoc heuristic?
• Quick publication?
• No shared code and data?
• Share code, data with their meta-description and
dependencies
• Systematize and classify collected optimization
knowledge
• Develop and preserve the whole experimental
pipeline
• Validate experimental results by the community
• Extrapolate collected knowledge to build faster,
smaller, more power efficient and reliable
computer systems
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 43
• Pilot live repository for public curation of research material: http://c-mind.org/repo
• Infrastructure is available at SourceForge under standard BSD license: http://c-mind.org
• Example of crowdsourcing compiler flag auto-tuning using mobile phones: “Collective Mind Node” in
Google Play Store
• Several publications under submission
• Raising funding to make cM more user friendly and add more research scenarios
Current status and future work
Grigori Fursin, “Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing,
big data and machine learning”, INRIA Tech. report No 00850880, August 2013
http://hal.inria.fr/hal-00850880 http://arxiv.org/abs/1308.2410
Education Academic research
New publication model where research material and
experimental results are shared, validated and reused
by the community
http://ctuning.org/reproducibility
• ACM SIGPLAN TRUST 2014 @ PLDI 2014
http://c-mind.org/events/trust2014
• Panel at ADAPT 2014 @ HiPEAC 2014
http://adapt-workshop.org
(this year we attempted to reproduce some submitted
articles)
• Systematizing, validating, sharing past research
techniques on auto-tuning and machine learning
through cM pipeline
• Optimal feature and model selection
• Finding representative benchmarks and data sets
• Run-time adaptation and ML
Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 44
Acknowledgements
• Colleagues from NCAR (USA): Davide Del Vento et.al.
• Colleagues from STMicroelectronics (France):
Christophe Guillone, Antoine Moynault, Christian Bertin
• Colleagues from ARM (UK): Anton Lokhmotov
• Colleagues from Intel (USA): David Kuck and David Wong
• cTuning community:
http://cTuning.org/lab/people
• EU FP6, FP7 program and HiPEAC network of excellence
http://www.hipeac.net
• INRIA fellowship (2012-2016)
Thank you for your attention!
Questions and comments: grigori.fursin@inria.fr

Mais conteúdo relacionado

Semelhante a Collective Mind: a collaborative curation tool for program optimization

Collective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesCollective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the masses
Grigori Fursin
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...
Grigori Fursin
 
Ujan Sengupta Resume
Ujan Sengupta ResumeUjan Sengupta Resume
Ujan Sengupta Resume
Ujan Sengupta
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Bhaskar Mitra
 
Presentation
PresentationPresentation
Presentation
butest
 

Semelhante a Collective Mind: a collaborative curation tool for program optimization (20)

Collective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesCollective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the masses
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...
 
Enabling open and reproducible computer systems research: the good, the bad a...
Enabling open and reproducible computer systems research: the good, the bad a...Enabling open and reproducible computer systems research: the good, the bad a...
Enabling open and reproducible computer systems research: the good, the bad a...
 
Cluster Setup Manual Using Ubuntu and MPICH
Cluster Setup Manual Using Ubuntu and MPICHCluster Setup Manual Using Ubuntu and MPICH
Cluster Setup Manual Using Ubuntu and MPICH
 
Ujan Sengupta Resume
Ujan Sengupta ResumeUjan Sengupta Resume
Ujan Sengupta Resume
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpc
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
Deep Learning with CNTK
Deep Learning with CNTKDeep Learning with CNTK
Deep Learning with CNTK
 
Panel at acm_sigplan_trust2014
Panel at acm_sigplan_trust2014Panel at acm_sigplan_trust2014
Panel at acm_sigplan_trust2014
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
Reproducibility in artificial intelligence
Reproducibility in artificial intelligenceReproducibility in artificial intelligence
Reproducibility in artificial intelligence
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine Learning
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...
 
Sambhab_Mohapatra
Sambhab_MohapatraSambhab_Mohapatra
Sambhab_Mohapatra
 
Presentation
PresentationPresentation
Presentation
 
The Impact of Compiler Auto-Optimisation on Arm-based HPC Microarchitectures
The Impact of Compiler Auto-Optimisation on Arm-based HPC MicroarchitecturesThe Impact of Compiler Auto-Optimisation on Arm-based HPC Microarchitectures
The Impact of Compiler Auto-Optimisation on Arm-based HPC Microarchitectures
 
Software component reuse repository
Software component reuse repositorySoftware component reuse repository
Software component reuse repository
 
Resume
ResumeResume
Resume
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Collective Mind: a collaborative curation tool for program optimization

  • 1. Collective Mind a collaborative curation tool for program optimization Grigori Fursin INRIA, France SEA 2014, Boulder, USA April 2014 Towards collaborative, systematic and reproducible design and optimization of computer systems
  • 2. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 2 • Interdisciplinary background • Back to basics: major problems in computer engineering • Collaborative and Reproducible Research and Development:  cTuning.org framework and repository  Exposed problems during practical usage  Collective Mind framework and repository  Collaborative research usage scenarios  Collaborative curation for program optimization  Changing R&D methodology and publication model • Conclusions and future work Outline
  • 3. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 3 Back to 1993: physics, electronics, machine learning Semiconductor neuron - base of neural accelerators and possible future neuro computers Modeling and understanding brain functions My problem with modeling: • Slow • Unreliable • Costly
  • 4. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 4 Available solutions Result Application Compilers Binary and libraries Architecture Run-time environment State of the system Data set Algorithm End User task Back to basics: task pipeline (workflow) I needed help from computer scientists to find the best solutions for my tasks! User requirements: minimize all costs (characteristics) (execution time, power consumption, price, size, faults, etc) guarantee real-time constraints (bandwidth, QoS, etc) Service/application providers (cloud computing, supercomputers, mobile systems) Hardware and software designers (processors, memory, interconnects, languages, compilers, run-time systems) Storage
  • 5. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 5 GCC optimizations Result End User task Back to basics: ever rising complexity of computer systems Deliver universal and optimal optimization heuristic is often impossible! 1) Too many design and optimization choices at all levels 2) Always multi-objective optimization: performance vs compilation time vs code size vs system size vs power consumption vs reliability vs return on investment 3) Complex relationship and interactions between ALL software and hardware components 4) Users and developers often have to resort to empirical auto-tuning Empirical auto-tuning is too time consuming, ad-hoc and tedious to be a mainstream!
  • 6. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 6 • Can we leverage their experience and computational resources? • Can we build common research and experimentation infrastructure connected to a public repository of knowledge? • Can we extrapolate collected knowledge using machine learning to predict optimal program optimizations, hardware designs and run-time adaptation scenarios? Many end users and service providers run numerous applications on different architectures with different data sets, run-time systems, compilers and optimizations! cTuning.org: combine auto-tuning with machine learning and crowdsourcing Proposed and developed in the MILEPOST/cTuning project (2006-2009) to automate and crowdsource training of a machine learning based self-tuning compiler for reconfigurable processors (SW/HW co-design).
  • 7. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 7 cTuning.org: plugin-based auto-tuning and learning framework cTuning fixed workflow Program cTuning wrapper: extract program semantic (static) features using MILEPOST GCC with ICI cTuning run wrapper cTuning third-party analysis plugins: profile and collect hardware counters (dynamic features) using gprof, perf, PAPI,etc Data set cTuning auto-tuning plugin: explore design and optimization space cTuning compile wrapper Pareto frontier plugin: balance exec.time, code size, compilation time, power consumption, etc Machine learning plugins: KNN, SVM … Common open-source cTuning CC framework (http://cTuning.org/ctuning-cc)
  • 8. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 8 cTuning fixed workflow cTuning.org: plugin-based auto-tuning and learning framework Program cTuning third-party analysis plugins: profile and collect hardware counters (dynamic features) using gprof, perf, PAPI,etc OS and platform shared descriptions (txt) Data set cTuning auto-tuning plugin: explore design and optimization space Shared benchmarks, codelets, kernels from SVN (cTuning.org/cbench) Plugins with strategies: random, hill climbing, probabilistic, genetic … Multiple shared compiler descriptions for GCC, ICC, Open64, LLVM, PathScale, XLC… (txt) Shared datasets from SVN (cTuning.org/cdatasets) Pareto frontier plugin: balance exec.time, code size, compilation time, power consumption, etc Machine learning plugins: KNN, SVM … Shared material in various formats from multiple sources ` Common open-source cTuning CC framework (http://cTuning.org/ctuning-cc) cTuning wrapper: extract program semantic (static) features using MILEPOST GCC with ICI cTuning run wrapper cTuning compile wrapper
  • 9. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 9 cTuning.org: plugin-based auto-tuning and learning framework cTuning fixed workflow Program cTuning third-party analysis plugins: profile and collect hardware counters (dynamic features) using gprof, perf, PAPI,etc OS and platform shared descriptions (txt) Data set cTuning auto-tuning plugin: explore design and optimization space Shared benchmarks, codelets, kernels from SVN (cTuning.org/cbench) Plugins with strategies: random, hill climbing, probabilistic, genetic … Multiple shared compiler descriptions for GCC, ICC, Open64, LLVM, PathScale, XLC… (txt) Shared datasets from SVN (cTuning.org/cdatasets) Pareto frontier plugin: balance exec.time, code size, compilation time, power consumption, etc Machine learning plugins: KNN, SVM … Shared material in various formats from multiple sources ` cTuning.org cTuningunifiedphp-basedwebservices MySQL cTuning database: optimization statistics from multiple users Interdisciplinary crowd Collaboratively searching for best optimizations and learning techniques Common open-source cTuning CC framework (http://cTuning.org/ctuning-cc) cTuning wrapper: extract program semantic (static) features using MILEPOST GCC with ICI cTuning run wrapper cTuning compile wrapper
  • 10. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 10 cTuning.org: plugin-based auto-tuning and learning framework cTuning fixed workflow Program cTuning third-party analysis plugins: profile and collect hardware counters (dynamic features) using gprof, perf, PAPI,etc OS and platform shared descriptions (txt) Data set cTuning auto-tuning plugin: explore design and optimization space Shared benchmarks, codelets, kernels from SVN (cTuning.org/cbench) Plugins with strategies: random, hill climbing, probabilistic, genetic … Multiple shared compiler descriptions for GCC, ICC, Open64, LLVM, PathScale, XLC… (txt) Shared datasets from SVN (cTuning.org/cdatasets) Pareto frontier plugin: balance exec.time, code size, compilation time, power consumption, etc Machine learning plugins: KNN, SVM … Shared material in various formats from multiple sources cTuning.org cTuningunifiedphp-basedwebservices MySQL cTuning database: optimization statistics from multiple users Interdisciplinary crowd Collaboratively searching for best optimizations and learning techniques Common open-source cTuning CC framework (http://cTuning.org/ctuning-cc) ` cTuning wrapper: extract program semantic (static) features using MILEPOST GCC with ICI cTuning run wrapper cTuning compile wrapper In 2009, we managed to automatically tune customer benchmarks and compiler heuristics for a range of real platforms from IBM and ARC (Synopsis) Everything is solved?
  • 11. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 11 Technological chaos GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.0 Phoenix MVS 2013 XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA 4.x gprofprof perf oprofile PAPI TAU Scalasca VTune Amplifier predictive scheduling algorithm- level TBB MKL ATLAS program- level function- level Codelet loop-level hardware counters IPA polyhedral transformations LTO threads process pass reordering KNN per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size ARM v6 threads execution time reliability GCC 4.8.x LLVM 3.4 SVM genetic algorithms cTuning practical usage exposed a few problems ARM v8 Intel SandyBridge SSE4 AVX Many thanks to international academic and industrial users for their feedback http://cTuning.org/lab/people Thanks to Davide Del Vento and his interns from UCAR CUDA 5.x SimpleScalar algorithm precision
  • 12. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 12 cTuning practical usage exposed a few problems • Ever changing environment (SW and HW choices, features, tools and libraries versions, interfaces, data formats …) • Difficult to reproduce experimental results from different users • Difficult to keep cTuning up-to-date - by the end of development cycle, new software and hardware is often available • Complex deployment and maintenance • Complex, manual configuration of the framework, repository, plugins, hardware and all dependencies • Many used languages (C, C++, PHP, Java) and data exchange formats (txt, csv, xls, …) • Different versions of benchmarks, data sets, plugins, models, compilers, third-party tools, etc are scattered around many directories and disks Technological chaos GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.0 Phoenix MVS 2013 XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA 4.x gprofprof perf oprofile PAPI TAU Scalasca VTune Amplifier predictive scheduling algorithm- level TBB MKL ATLAS program- level function- level Codelet loop-level hardware counters IPA polyhedral transformations LTO threads process pass reordering KNN per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size ARM v6 threads execution time reliability GCC 4.8.x LLVM 3.4 SVM genetic algorithms ARM v8 Intel SandyBridge SSE4 AVX CUDA 5.x SimpleScalar algorithm precision
  • 13. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 13 cTuning practical usage exposed a few problems • Difficult or impossible to reproduce and reuse techniques from existing publications • No culture of sharing code and data in computer engineering unlike other sciences • Impossible to publish negative results or just validation of existing works • Academia focuses more and more on quantity (number of publications and citations) rather than quality and reproducibility of results • No simple and scalable mechanism to preserve, share and reuse all heterogeneous experimental material • Centralized, MySQL-based repository got quickly saturated when collecting and processing data from many users • MySQL repository is difficult to adapt to continuous changes in the system • Difficult to preserve different versions of benchmarks, data sets, plugins and tools, and make them co-exist using SVN repositories Technological chaos GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.0 Phoenix MVS 2013 XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA 4.x gprofprof perf oprofile PAPI TAU Scalasca VTune Amplifier predictive scheduling algorithm- level TBB MKL ATLAS program- level function- level Codelet loop-level hardware counters IPA polyhedral transformations LTO threads process pass reordering KNN per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size ARM v6 threads execution time reliability GCC 4.8.x LLVM 3.4 SVM genetic algorithms ARM v8 Intel SandyBridge SSE4 AVX CUDA 5.x SimpleScalar algorithm precision
  • 14. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 14 cTuning practical usage exposed a few problems • Difficult or impossible to reproduce and reuse techniques from existing publications • No culture of sharing code and data in computer engineering unlike other sciences • Impossible to publish negative results or just validation of existing works • Academia focuses more and more on quantity (number of publications and citations) rather than quality and reproducibility of results • No simple and scalable mechanism to preserve, share and reuse all heterogeneous experimental material • Centralized, MySQL-based repository got quickly saturated when collecting and processing data from many users • MySQL repository is difficult to adapt to continuous changes in the system • Difficult to preserve different versions of benchmarks, data sets, plugins and tools, and make them co-exist using SVN repositories Technological chaos GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.0 Phoenix MVS 2013 XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA 4.x gprofprof perf oprofile PAPI TAU Scalasca VTune Amplifier predictive scheduling algorithm- level TBB MKL ATLAS program- level function- level Codelet loop-level hardware counters IPA polyhedral transformations LTO threads process pass reordering KNN per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size ARM v6 threads execution time reliability GCC 4.8.x LLVM 3.4 SVM genetic algorithms ARM v8 Intel SandyBridge SSE4 AVX CUDA 5.x SimpleScalar algorithm precision We face typical “big data” problem!
  • 15. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 15 Collective Mind: attempt to fix cTuning problems cTuning fixed workflow Program cTuning third-party analysis plugins: profile and collect hardware counters (dynamic features) using gprof, perf, PAPI,etc OS and platform shared descriptions (txt) Data set cTuning auto-tuning plugin: explore design and optimization space Shared benchmarks, codelets, kernels from SVN (cTuning.org/cbench) Plugins with strategies: random, hill climbing, probabilistic, genetic … Multiple shared compiler descriptions for GCC, ICC, Open64, LLVM, PathScale, XLC… (txt) Shared datasets from SVN (cTuning.org/cdatasets) Pareto frontier plugin: balance exec.time, code size, compilation time, power consumption, etc Machine learning plugins: KNN, SVM … Shared material in various formats from multiple sources cTuning.org cTuningunifiedphp-basedwebservices MySQL cTuning database: optimization statistics from multiple users Interdisciplinary crowd Collaboratively searching for best optimizations and learning techniques Common open-source Collective Mind framework (http://c-mind.org) ` cTuning wrapper: extract program semantic (static) features using MILEPOST GCC with ICI cTuning run wrapper cTuning compile wrapper
  • 16. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 16 Collective Mind: attempt to fix cTuning problems Collective Mind flexible pipeline Program cTuning third-party analysis plugins: profile and collect hardware counters (dynamic features) using gprof, perf, PAPI,etc OS and platform shared descriptions (txt) Data set cTuning auto-tuning plugin: explore design and optimization space Shared benchmarks, codelets, kernels from SVN (cTuning.org/cbench) Plugins with strategies: random, hill climbing, probabilistic, genetic … Multiple shared compiler descriptions for GCC, ICC, Open64, LLVM, PathScale, XLC… (txt) Shared datasets from SVN (cTuning.org/cdatasets) Pareto frontier plugin: balance exec.time, code size, compilation time, power consumption, etc Machine learning plugins: KNN, SVM … Shared material in various formats from multiple sources cTuning.org cTuningunifiedphp-basedwebservices MySQL cTuning database: optimization statistics from multiple users Interdisciplinary crowd Collaboratively searching for best optimizations and learning techniques ` cTuning wrapper: extract program semantic (static) features using MILEPOST GCC with ICI cTuning run wrapper cTuning compile wrapper Unify plugins and tool wrappers: cm <plugin_name> params -- orig. CMD Use high-level, portable language Common open-source Collective Mind framework (http://c-mind.org)
  • 17. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 17 Collective Mind: attempt to fix cTuning problems Collective Mind flexible pipeline Program cTuning third-party analysis plugins: profile and collect hardware counters (dynamic features) using gprof, perf, PAPI,etc OS and platform shared descriptions (txt) Data set cTuning auto-tuning plugin: explore design and optimization space Shared benchmarks, codelets, kernels from SVN (cTuning.org/cbench) Plugins with strategies: random, hill climbing, probabilistic, genetic … Multiple shared compiler descriptions for GCC, ICC, Open64, LLVM, PathScale, XLC… (txt) Shared datasets from SVN (cTuning.org/cdatasets) Pareto frontier plugin: balance exec.time, code size, compilation time, power consumption, etc Machine learning plugins: KNN, SVM … Shared material in various formats from multiple sources c-mind.org/repo cMunifiedwebservices MySQL cTuning database: optimization statistics from multiple users Interdisciplinary crowd Collaborative curation of all research material ` cTuning wrapper: extract program semantic (static) features using MILEPOST GCC with ICI cTuning run wrapper cTuning compile wrapper Unify plugins and tool wrappers: cm <plugin_name> params -- orig. CMD Use high-level, portable language New repository which can keep both collections of files and meta- description in some unified format Common open-source Collective Mind framework (http://c-mind.org)
  • 18. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 18 JSON: extensible and human-readable format file data.json = { "characteristics":{ "execution times": ["10.3","10.1","13.3"], "code size": "131938", ...}, "choices":{ "os":"linux", "os version":"2.6.32-5-amd64", "compiler":"gcc", "compiler version":"4.6.3", "compiler_flags":"-O3 -fno-if-conversion", "platform":{"processor":"intel xeon e5520", "l2":"8192“, ...}, ...}, "features":{ "semantic features": {"number_of_bb": "24", ...}, "hardware counters": {"cpi": "1.4" ...}, ... } "state":{ "frequency":"2.27", ...}} Can be directly indexed using web-based, distributed ElasticSearch (Apache2 license) Python: high-level, portable, popular language with JIT, JSON support and many statistical and machine learning libraries import json f=file('data.json') array=json.loads(f.read()) f.close() et=array[‘characteristics’][‘execution_times’] for tm in et: print ‘Execution time:’, tm Collective Mind: which technology to use?
  • 19. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 19 Behavior Choices Features State Hardwired experimental setups, very difficult to change, scale or share Convert ad-hoc experiments to cM python modules Meta description that should be exposed in the information flow for auto-tuning and machine learning Run-time2 CompilerN Compiler2 Compiler1 Run-time1 Ad-hoc analysis scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files Offline tuning User program, dataset Online tuning
  • 20. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 20 cM module (wrapper) with unified and formalized input and output ProcessCMD Tool BVi Generated files Original unmodified ad-hoc input Convert ad-hoc experiments to cM python modules Behavior Choices Features State Run-time2 CompilerN Compiler2 Compiler1 Run-time1 Ad-hoc analysis scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files User program, dataset Online tuning Offline tuning
  • 21. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 21 cM module (wrapper) with unified and formalized input and output Unified JSON input (meta-data) ProcessCMD Tool BVi Behavior Choices Features State Action Action function Generated files Parse and unify output Unified JSON output (meta-data) Unified JSON input (if exists) Original unmodified ad-hoc input b = B( c , f , s ) … … … … Formalized function (model) of a component behavior Flattened JSON vectors (either string categories or integer/float values) Convert ad-hoc experiments to cM python modules Run-time2 CompilerN Compiler2 Compiler1 Run-time1 Ad-hoc analysis scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files User program, dataset Online tuning Offline tuning
  • 22. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 22 cM module (wrapper) with unified and formalized input and output Unified JSON input (meta-data) ProcessCMD Tool BVi Behavior Choices Features State Action Action function Generated files Set environment for a given tool version Parse and unify output Unified JSON output (meta-data) Unified JSON input (if exists) Original unmodified ad-hoc input b = B( c , f , s ) … … … … Formalized function (model) of a component behavior Flattened JSON vectors (either string categories or integer/float values) Convert ad-hoc experiments to cM python modules Multiple tool versions can co-exist, while their interface is abstracted by cM module Run-time2 CompilerN Compiler2 Compiler1 Run-time1 Ad-hoc analysis scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files User program, dataset Online tuning Offline tuning
  • 23. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 23 cM module (wrapper) with unified and formalized input and output Unified JSON input (meta-data) ProcessCMD Tool BVi Behavior Choices Features State Action Action function Generated files Set environment for a given tool version Parse and unify output Unified JSON output (meta-data) Unified JSON input (if exists) Original unmodified ad-hoc input b = B( c , f , s ) … … … … Formalized function (model) of a component behavior Flattened JSON vectors (either string categories or integer/float values) cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line) cm compiler build -- icc -fast *.c cm code.source build ct_compiler=icc13 ct_optimizations=-fast cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm Should be able to run on any OS (Windows, Linux, Android, MacOS, etc)! Convert ad-hoc experiments to cM python modules Run-time2 CompilerN Compiler2 Compiler1 Run-time1 Ad-hoc analysis scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files User program, dataset Online tuning Offline tuning
  • 24. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 24 Data abstraction in Collective Mind compiler GCC 4.4.4 GCC 4.7.1 LLVM 3.1 LLVM 3.4 cM module JSON meta-description Compiler flags
  • 25. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 25 Data abstraction in Collective Mind compiler GCC 4.4.4 GCC 4.7.1 LLVM 3.1 LLVM 3.4 package GCC 4.7.1 bin GCC 4.7.1 source LLVM 3.4 gmp 5.0.5 mpfr 3.1.0 lapack 2.3.0 java apache commons codec 1.7 … … … … … … … cM module JSON meta-descriptionFiles, directories Compiler flags Installation info
  • 26. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 26 Data abstraction in Collective Mind compiler GCC 4.4.4 GCC 4.7.1 LLVM 3.1 LLVM 3.4 package GCC 4.7.1 bin GCC 4.7.1 source LLVM 3.4 gmp 5.0.5 mpfr 3.1.0 lapack 2.3.0 java apache commons codec 1.7 dataset image-jpeg-0001 bzip2-0006 txt-0012 … … … … … … … … … … module compiler package dataset … … … cM module JSON meta-descriptionFiles, directories Compiler flags Installation info Features Actions .cmr / module UOA / data UOA (UID or alias) / .cm / data.json cMrepositorydirectorystructure:
  • 27. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 27 Data abstraction in Collective Mind compiler GCC 4.4.4 GCC 4.7.1 LLVM 3.1 LLVM 3.4 package GCC 4.7.1 bin GCC 4.7.1 source LLVM 3.4 gmp 5.0.5 mpfr 3.1.0 lapack 2.3.0 java apache commons codec 1.7 dataset image-jpeg-0001 bzip2-0006 txt-0012 … … … … … … … … … … module compiler package dataset … … … cM module JSON meta-descriptionFiles, directories Compiler flags Installation info Features Actions .cmr / module UOA / data UOA (UID or alias) / .cm / data.json cMrepositorydirectorystructure: Now can reference and find any data by CID (similar to DOI): <module UOA : data UOA>
  • 28. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 28 Data abstraction in Collective Mind compiler GCC 4.4.4 GCC 4.7.1 LLVM 3.1 LLVM 3.4 package GCC 4.7.1 bin GCC 4.7.1 source LLVM 3.4 gmp 5.0.5 mpfr 3.1.0 lapack 2.3.0 java apache commons codec 1.7 dataset image-jpeg-0001 bzip2-0006 txt-0012 … … … … … … … … … … module compiler package dataset … … … cM module JSON meta-descriptionFiles, directories Compiler flags Installation info Features Actions Dependenciesbetweendataandmodules .cmr / module UOA / data UOA (UID or alias) / .cm / data.json cMrepositorydirectorystructure:
  • 29. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 29 Gradually adding specification (agile development) cTuning experiment module data.json { "characteristics":{ "execution times": ["10.3","10.1","13.3"], "code size": "131938", ...}, "choices":{ "os":"linux", "os version":"2.6.32-5-amd64", "compiler":"gcc", "compiler version":"4.6.3", "compiler_flags":"-O3 -fno-if-conversion", "platform":{"processor":"intel xeon e5520", "l2":"8192“, ...}, ...}, "features":{ "semantic features": {"number_of_bb": "24", ...}, "hardware counters": {"cpi": "1.4" ...}, ... } "state":{ "frequency":"2.27", ...} } cM flattened JSON key ##characteristics#execution_times@1
  • 30. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 30 Gradually adding specification (agile development) cTuning experiment module data.json { "characteristics":{ "execution times": ["10.3","10.1","13.3"], "code size": "131938", ...}, "choices":{ "os":"linux", "os version":"2.6.32-5-amd64", "compiler":"gcc", "compiler version":"4.6.3", "compiler_flags":"-O3 -fno-if-conversion", "platform":{"processor":"intel xeon e5520", "l2":"8192“, ...}, ...}, "features":{ "semantic features": {"number_of_bb": "24", ...}, "hardware counters": {"cpi": "1.4" ...}, ... } "state":{ "frequency":"2.27", ...} } cM flattened JSON key ##characteristics#execution_times@1 "flattened_json_key”:{ "type": "text”|"integer" | “float" | "dict" | "list” | "uid", "characteristic": "yes" | "no", "feature": "yes" | "no", "state": "yes" | "no", "has_choice": "yes“ | "no", "choices": [ list of strings if categorical choice], "explore_start": "start number if numerical range", "explore_stop": "stop number if numerical range", "explore_step": "step if numerical range", "can_be_omitted" : "yes" | "no" ... }
  • 31. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 31 TransparentlyindexedbyElasticSearch Gradually adding, extending and improving modules and data Applications, benchmarks, codelets, kernels Auto-tuning compilers Binaries and libraries Processors Operating Systems Data sets Statistical and data mining plugins Compiler machine learning plugins Third-party tools, libraries code.source ctuning.compiler code processor os dataset math.statistics.r math.model algorithm Category cM module Module actions common*, transform common*, detect_host_processor common*, detect_host_family common*, create common*, analyze common*, build, predict, fit, detect_representative_points common*, installpackage High-level algorithms common*, build common*, compile_program common*, run All data … … … … … … … … … … * add, list, view, copy, move, search Meta-description … … … … … … … … … … .cmr / module UOA (UID or alias) / data UOA / .cm / data.json cM repository directory structure:
  • 32. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 32 cM interface: only one main function! Simple and minimalistic high-level cM interface - one function (!) (python dictionary) output = cm_kernel.access ( (python dictionary) input ) Load data: r = cm_kernel.access({‘cm_run_module_uoa’:’dataset’, ‘cm_action’:’load’, ‘ cm_data_uoa’:’image-jpeg-0001’}) if r[‘cm_return’]>0: return r d=r[‘cm_data_obj’][‘cfg’] Call R machine learning module: r = cm_kernel.access({‘cm_run_module_uoa’:’ cm math.model.r’, ‘cm_action’:’ build’, ‘model_name’:’earth’, …}) if r[‘cm_return’]>0: return r
  • 33. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 33 Assembling, preserving, sharing and extending the whole pipeline as “LEGO” cM tool wrapper with unified and formalized input and output Unified JSON input (meta-data) Tool BVM Tool BV2 Tool AVN Tool AV2 Tool AV1 Tool BV1 Ad-hoc analysis and learning scripts Ad-hoc tuning scripts Collection of CSV, XLS, TXT and other files Experiments ProcessCMD Tool BVi Behavior Choices Features State Action Action function Generated files Set environment for a given tool version Parse and unify output Unified JSON output (meta-data) Unified JSON input (if exists) Original unmodified ad-hoc input b = B( c , f , s ) … … … … Formalized function (model) of a component behavior Flattened JSON vectors (either string categories or integer/float values) Chaining cM components (wrappers) to an experimental pipeline for a given research and experimentation scenario Public modular auto-tuning and machine learning repository and buildbot Unified web services Interdisciplinary crowd Choose exploration strategy Generate choices (code sample, data set, compiler, flags, architecture …) Compile source code Run code Test behavior normality Pareto filter Modeling and prediction Complexity reduction Shared scenarios from past research …
  • 34. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 34 Gradually expose some characteristics Gradually expose some choices and features Compile Program time … compiler flags; pragmas … Run code Run-time environment time; CPI, power consumption … pinning/scheduling … System cost; architecture; frequency; cache size… Data set size; values; description … precision … Analyze profile time; size … instrumentation; profiling … Start coarse-grain decomposition of a system (detect coarse-grain effects first). Add universal learning modules. Top-down decomposition and learning of computer systems I now gradually convert to Collective Mind and validate past research techniques with the great help of volunteers (sadly such work is usually not funded, can’t be easily published, and is often considered as a waste of time by academic community) Methodology similar to physics: start from coarse-grain and gradually move to fine-grain level!
  • 35. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 35 Growing, plugin-based cM pipeline for auto-tuning and learning •Init pipeline •Detected system information •Initialize parameters •Prepare dataset •Clean program •Prepare compiler flags •Use compiler profiling •Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning •Use universal Alchemist plugin (with any OpenME-compatible compiler or tool) •Use Alchemist plugin (currently for GCC) •Build program •Get objdump and md5sum (if supported) •Use OpenME for fine-grain program analysis and online tuning (build & run) •Use 'Intel VTune Amplifier' to collect hardware counters •Use 'perf' to collect hardware counters •Set frequency (in Unix, if supported) •Get system state before execution •Run program •Check output for correctness (use dataset UID to save different outputs) •Finish OpenME •Misc info •Observed characteristics •Observed statistical characteristics •Finalize pipeline http://c-mind.org/ctuning-pipeline
  • 36. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 36 Publicly shared research material (c-mind.org/repo) Our Collective Mind Buildbot and plugin-based auto-tuning pipeline supports the following shared benchmarks and codelets: •Polybench - numerical kernels with exposed parameters of all matrices in cM • CPU: 28 prepared benchmarks • CUDA: 15 prepared benchmarks • OpenCL: 15 prepared benchmarks • cBench - 23 benchmarks with 20 and 1000 datasets per benchmark • Codelets - 44 codelets from embedded domain (provided by CAPS Entreprise) • SPEC 2000/2006 • Description of 32-bit and 64-bit OS: Windows, Linux, Android • Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x • Support for collection of hardware counters: perf, Intel vTune • Support for frequency modification • Validated on laptops, mobiles, tables, GRID/cloud - can work even from the USB key
  • 37. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 37 Implemented scenario: multi-objective compiler auto-tuning using mobile phones Program: image corner detection Processor: ARM v6, 830MHz Compiler: Sourcery GCC for ARM v4.7.3 OS: Android OS v2.3.5 System: Samsung Galaxy Y Data set: MiDataSet #1, image, 600x450x8b PGM, 263KB 500 combinations of random flags -O3 -f(no-)FLAG Binarysize(bytes) Execution time (sec.) Use Pareto frontier filter; Pack experimental data on the fly -O3 Powered by Collective Mind Node (Android Apps on Google Play)
  • 38. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 38 Collective Mind experimental pipelines can now survive changes in the system! Modules and data can evolve with the evolution of the technology! From agile development to agile research! Community shares, validates and improves benchmarks, data sets, tools, design and optimization choices, features, characteristics, models, … Collective Mind agile research and development
  • 39. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 39 Reproducibility of experimental results Reproducibility came as a side effect! • Can preserve the whole experimental setup with all data and software dependencies • Can perform statistical analysis (normality test) for characteristics • Community can add missing features or improve machine learning models
  • 40. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 40 Execution time (sec.) Distribution Unexpected behavior - expose to the community including domain specialists, explain, find missing feature and add to the system Reproducibility of experimental results Reproducibility came as a side effect! • Can preserve the whole experimental setup with all data and software dependencies • Can perform statistical analysis (normality test) for characteristics • Community can add missing features or improve machine learning models
  • 41. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 41 Execution time (sec.) Distribution Class A Class B 800MHz CPU Frequency 2400MHz Unexpected behavior - expose to the community including domain specialists, explain, find missing feature and add to the system Reproducibility of experimental results Reproducibility came as a side effect! • Can preserve the whole experimental setup with all data and software dependencies • Can perform statistical analysis (normality test) for characteristics • Community can add missing features or improve machine learning models
  • 42. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 42 My dream: systematic, collaborative and reproducible computer engineering GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.1 Phoenix MVS XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA gprof prof perf oprofile PAPI TAU Scalasca VTune Amplifier scheduling algorithm-level TBB MKL ATLASprogram-level function-level Codelet loop-level hardware counters IPA polyhedral transformations LTO threads process pass reordering run-time adaptation per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size coresprocessors threads power consumption execution time reliability Current state of computer engineering likwid Sharing of code and data Classification, predictive modeling Systematization and unification of collected knowledge (big data) “crowd” c-mind.org/repo Collaborative Infrastructure and repository •Prototype research idea •Validate existing work •Perform end-user task Result • Quick, non-reproducible hack? • Ad-hoc heuristic? • Quick publication? • No shared code and data? • Share code, data with their meta-description and dependencies • Systematize and classify collected optimization knowledge • Develop and preserve the whole experimental pipeline • Validate experimental results by the community • Extrapolate collected knowledge to build faster, smaller, more power efficient and reliable computer systems
  • 43. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 43 • Pilot live repository for public curation of research material: http://c-mind.org/repo • Infrastructure is available at SourceForge under standard BSD license: http://c-mind.org • Example of crowdsourcing compiler flag auto-tuning using mobile phones: “Collective Mind Node” in Google Play Store • Several publications under submission • Raising funding to make cM more user friendly and add more research scenarios Current status and future work Grigori Fursin, “Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning”, INRIA Tech. report No 00850880, August 2013 http://hal.inria.fr/hal-00850880 http://arxiv.org/abs/1308.2410 Education Academic research New publication model where research material and experimental results are shared, validated and reused by the community http://ctuning.org/reproducibility • ACM SIGPLAN TRUST 2014 @ PLDI 2014 http://c-mind.org/events/trust2014 • Panel at ADAPT 2014 @ HiPEAC 2014 http://adapt-workshop.org (this year we attempted to reproduce some submitted articles) • Systematizing, validating, sharing past research techniques on auto-tuning and machine learning through cM pipeline • Optimal feature and model selection • Finding representative benchmarks and data sets • Run-time adaptation and ML
  • 44. Grigori Fursin “Collective Mind: a collaborative curation tool for program optimization” SEA’14, April 2014, Boulder, CO, USA 44 Acknowledgements • Colleagues from NCAR (USA): Davide Del Vento et.al. • Colleagues from STMicroelectronics (France): Christophe Guillone, Antoine Moynault, Christian Bertin • Colleagues from ARM (UK): Anton Lokhmotov • Colleagues from Intel (USA): David Kuck and David Wong • cTuning community: http://cTuning.org/lab/people • EU FP6, FP7 program and HiPEAC network of excellence http://www.hipeac.net • INRIA fellowship (2012-2016) Thank you for your attention! Questions and comments: grigori.fursin@inria.fr