SlideShare uma empresa Scribd logo
1 de 69
Baixar para ler offline
Collective Mind infrastructure and repository
to crowdsource auto-tuning (c-mind.org/repo)
Grigori Fursin
INRIA, France
HPSC 2013, Taiwan
March 2013
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 2 / 66
• Collective Mind approach combined with social networking, expert
knowledge and predictive modeling
• Collective Mind framework basics
• Plugin-based type-free and schema-free infrastructure
• Unified web interfaces (similar to WEB 2.0 concept)
• Portable file (json) based repository
• Auto-tuning and predictive modeling scenarios
• Demo
• Conclusions and future works
Background
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 3 / 66
Solution
Motivation: back to basics
Decision
(depends on user
requirements)
Result
Available choices
(solutions)
User requirements:
most common:
minimize all costs
(time, power consumption,
price, size, faults, etc)
guarantee real-time constraints
(bandwidth, QOS, etc)
Service/application
providers
(supercomputing,
cloud computing,
mobile systems)
Should provide choices
and help with decisions
Hardware and
software designers
End users
Task
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 4 / 66
Solutions
Challenges
Task
Result
GCC 4.1.x
GCC 4.2.x
GCC 4.3.x
GCC 4.4.x
GCC 4.5.x
GCC 4.6.x
GCC 4.7.x
ICC 10.1
ICC 11.0
ICC 11.1
ICC 12.0
ICC 12.1
LLVM 2.6
LLVM 2.7
LLVM 2.8
LLVM 2.9
LLVM 3.0
Phoenix
MVS XLC
Open64
Jikes
Testarossa
OpenMP MPI
HMPP
OpenCL
CUDA
gprof
prof
perf
oprofile
PAPI
TAU
Scalasca
VTune
Amplifierscheduling
algorithm-
level TBB
MKL
ATLASprogram-
level
function-
level
Codelet
loop-level
hardware
counters
IPA
polyhedral
transformations
LTO
threadsprocess
pass
reordering
run-time
adaptation
per phase
reconfiguration
cache size
frequency
bandwidth
HDD size
TLB
ISA
memory size
cores
processors
threads
power
consumptionexecution time
reliability
Clean up this mess!
Simplify analysis, tuning and
modelling of computer systems
for non-computer engineers
Bring together researchers from
interdisciplinary communities
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 5 / 66
Treat
computer
system as a
black box
Task
Result
Understanding computer systems’ behavior: a physicist’s approach
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 6 / 66
Treat
computer
system as a
black box
Task
Result
Understanding computer systems’ behavior: a physicist’s approach
Expose
object
information flow
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 7 / 66
Treat
computer
system as a
black box
Task
Result
Expose
object
information flow
Object
wrapper and
repository:
observe
behavior and
keep history
information flow
expose characteristics
Observe system
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 8 / 66
Treat
computer
system as a
black box
Task
Result
Gradually expose properties, characteristics, choices
Expose
object
expose and
explore choices
information flow
Object
wrapper and
repository:
observe
behavior and
keep history
(choices,
properties,
characteristics,
system state,
data)
information flow
expose characteristics
Universal Learning
Module
expose system
state
expose
properties
expose
characteristics
set
requirements
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 9 / 66
Treat
computer
system as a
black box
Task
Result
Classify, build models, predict behavior
Expose
object
information flow
expose system
state
Object
wrapper and
repository:
observe
behavior and
keep history
(choices,
properties,
characteristics,
system state,
data)
history
(experience)
information flow
expose
properties
expose characteristics
expose
characteristics
continuously build,
validate or refine
classification and
predictive model
on the fly
expose and
explore choices
set
requirements
Universal Learning
Module
Evolutionary approach, not revolutionary, i.e. do not rebuild existing SW/HW stack from
scratch but wrap up existing tools and techniques, and gradually clean up the mess!
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 10 / 66
Unified
web
interface
Unified
web
interface
Unified
web
interface
Unified
web
interface
Expose and exchange
optimization knowledge and
models in a unified way
through http (WEB 2.0)
Use distributed, noSQL
repository to store highly
heterogeneous “big” data
Transparently crowdsource learning of a behavior of any existing mobile,
cluster, cloud computer system
Extrapolate collective knowledge to build faster and more power efficient computer systems
Build self-tuning machines using agent-based models
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 11 / 66
Treat
computer
system as a
black box
Task
Result
Gradual decomposition, parameterization, observation
and exploration of a system
Application
Compilers and auxiliary tools
Binary and libraries
Architecture
Run-time environment
State of the system
Data set
Algorithm
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 12 / 66
Treat
computer
system as a
black box
Task
Result
Gradual decomposition, parameterization, observation
and exploration of a system
Application
Compilers and auxiliary tools
Binary and libraries
Architecture
Run-time environment
State of the system
Data set
Algorithm Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
cTuning3 framework aka Collective Mind
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 13 / 66
Treat
computer
system as a
black box
Task
Result
Gradual top-down decomposition, parameterization,
observation and exploration of a system
Application
Compilers and auxiliary tools
Binary and libraries
Architecture
Run-time environment
State of the system
Data set
Algorithm Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Light-weightinterfacetoconnectmodules,dataandmodels
cTuning3 framework aka Collective Mind
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 14 / 66
Example of characterizing/explaining behavior of computer systems
Gradually expose
some characteristics
Gradually expose
some properties/choices
Compile Program time … compiler flags; pragmas …
Run code Run-time
environment
time; CPI, power
consumption …
pinning/scheduling …
System cost; architecture; frequency; cache size…
Data set size; values; description … precision …
Analyze profile time; size … instrumentation; profiling …
Start coarse-grain decomposition of a system (detect coarse-grain effects first). Add universal learning modules.
Combine expert knowledge with automatic detection!
Start from coarse-grain and gradually move to fine-grain level!
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 15 / 66
0
1
2
3
4
5
6
Program/architecturebehavior:CPI
How we can explain the following observations for some piece of code (“codelet object”)?
(LU-decomposition codelet, Intel Nehalem)
Example of characterizing/explaining behavior of computer systems
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 16 / 66
Add 1 property: matrix size
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Program/architecturebehavior:CPI
Dataset property: matrix size
Example of characterizing/explaining behavior of computer systems
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 17 / 66
Try to build a model to correlate objectives (CPI) and features (matrix size).
Start from simple models: linear regression (detect coarse grain effects)
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Program/architecturebehavior:CPI
Dataset property: matrix size
Example of characterizing/explaining behavior of computer systems
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 18 / 66
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Program/architecturebehavior:CPI
Dataset properties: matrix size
If more observations, validate model and detect discrepancies!
Continuously retrain models to fit new data!
Use model to “focus” exploration on “unusual” behavior!
Example of characterizing/explaining behavior of computer systems
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 19 / 66
Gradually increase model complexity if needed (hierarchical modeling).
For example, detect fine-grain effects (singularities) and characterize them.
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Program/architecturebehavior:CPI
Dataset properties: matrix size
Example of characterizing/explaining behavior of computer systems
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 20 / 66
Start adding more properties (one more architecture with twice bigger cache)!
Use automatic approach to correlate all objectives and features.
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Program/architecturebehavior:CPI
Dataset properties: matrix size
L3 = 4Mb
L3 = 8Mb
Example of characterizing/explaining behavior of computer systems
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 21 / 66
Continuously build and refine
classification (decision trees for
example) and predictive models on all
collected data to improve predictions.
Continue exploring design and
optimization spaces
(evaluate different architectures,
optimizations, compilers, etc.)
Focus exploration on unexplored
areas, areas with high variability
or with high mispredict rate of models
β
εcM predictive model module
CPI = ε + 1000 × β × data size
Example of characterizing/explaining behavior of computer systems
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 22 / 66
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
0
1
2
3
4
5
6
Dataset features: matrix size
Code/architecturebehavior:CPI
Size < 1012
1012 < Size < 2042
Size > 2042 & GCC
Size > 2042 & ICC & O2
Size > 2042 & ICC & O3
Optimize decision tree (many different algorithms)
Balance precision vs cost of modeling = ROI (coarse-grain vs fine-grain effects)
Compact data on-line before sharing with other users!
Model optimization and data compaction
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 23 / 66
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Code/architecturebehavior:CPI
Dataset features: matrix size
Collaboratively and continuously add expert advices or automatic optimizations.
Extensible and collaborative advice system
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 24 / 66
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Code/architecturebehavior:CPI
Dataset features: matrix size
Collaboratively and continuously add expert advices or automatic optimizations.
Automatically characterize problem (extract all
possible features: hardware counters, semantic
features, static features, state of the system, etc)
Add manual analysis if needed
Extensible and collaborative advice system
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 25 / 66
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Code/architecturebehavior:CPI
Dataset features: matrix size
cM advice system:
Possible problem:
Cache conflict misses degrade performance
Collaboratively and continuously add expert advices or automatic optimizations.
Extensible and collaborative advice system
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 26 / 66
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Code/architecturebehavior:CPI
Dataset features: matrix size
cM advice system:
Possible problem:
Cache conflict misses degrade performance
Possible solution:
Array padding (A[N,N] -> A[N,N+1])
Effect:
~30% execution time improvement
Collaboratively and continuously add expert advices or automatic optimizations.
Extensible and collaborative expert system
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 27 / 66
0
50
100
150
200
250
300
350
400
450
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Executiontime,sec
Dataset features: matrix size
Add dynamic memory characterization through semantically non-equivalent modifications.
For example, convert all array accesses to scalars to detect balance between CPU/memory accesses.
Intentionally change/break semantics to observe reaction in terms of performance/power etc!
for
for
for
addr = a[0,0]
load … [addr+index1]…
mulss … [addr+index2]…
subss … [addr+index3]…
store … [addr+index4]…
addr + = 4
System reaction to code changes: physicist’s view
Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound
on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 28 / 66
0
50
100
150
200
250
300
350
400
450
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Executiontime,sec
Dataset features: matrix size
for
for
for
addr = a[0,0]
load … [addr+index1]…
mulss … [addr+index2]…
subss … [addr+index3]…
store … [addr+index4]…
addr + = 0
Add dynamic memory characterization through semantically non-equivalent modifications.
For example, convert all array accesses to scalars to detect balance between CPU/memory accesses.
Intentionally change/break semantics to observe reaction in terms of performance/power etc!
Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound
on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004
System reaction to code changes: physicist’s view
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 29 / 66
0
50
100
150
200
250
300
350
400
450
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Executiontime,sec
Dataset features: matrix size
Expert or automatic advices based on additional information in the repository!
Focus optimizations to speed up search: which/where?
Advice:
Small gap (arithmetic dominates):
• Focus on ILP optimizations
• Run on complex out-of-order
core
• Increase processor frequency to
speed up application
Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound
on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004
System reaction to code changes: physicist’s view
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 30 / 66
0
50
100
150
200
250
300
350
400
450
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Executiontime,sec
Dataset features: matrix size
Well-knowngapbetweenarithmetic
anddataaccesses
Advice:
Small gap (arithmetic dominates):
• Focus on ILP optimizations
• Run on complex out-of-order
core
• Increase processor frequency to
speed up application
Expert or automatic advices based on additional information in the repository!
Focus optimizations to speed up search: which/where?
Advice:
Big gap (data accesses dominate):
• Focus on memory optimizations
• Run on simple core
• Decrease processor frequency to
save power
Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound
on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004
System reaction to code changes: physicist’s view
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 31 / 66
0
50
100
150
200
250
300
350
400
450
0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000
Executiontime,sec
Dataset features: matrix size
Well-knowngapbetweenarithmetic
anddataaccesses
Advice:
Small gap (arithmetic dominates):
• Focus on ILP optimizations
• Run on complex out-of-order
core
• Increase processor frequency to
speed up application
Expert or automatic advices based on additional information in the repository!
Focus optimizations to speed up search: which/where?
Advice:
Big gap (data accesses dominate):
• Focus on memory optimizations
• Run on simple core
• Decrease processor frequency to
save power
Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound
on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004
System reaction to code changes: physicist’s view
Can add/remove
instructions,
Can add/remove
threads, etc …
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 32 / 66
cd [application_directory]
make CC=icc CC_OPTS=-fast
or
icc -fast *.c
time ./a.out < [my_dataset] > [output]
record “-fast”, execution time
Implementation in open-source cTuning1 framework
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 33 / 66
Implementation in open-source cTuning1 framework
cd [application_directory]
make CC=icc CC_OPTS=-fast
or
icc -fast *.c
time ./a.out < [my_dataset] > [output]
record “-fast”, execution time
ccc-comp build=“make” compiler=icc opts=“-fast”
ccc-comp compiler=icc opts=“-fast”
ccc-run prog=./a.out cmd=“< [my dataset]”
ccc-time <cmd>
ccc-record-stats <file_with_stats>
• Low level platform-dependent plugins in C
• Communication through text file or directly through MySQL database
• High level platform-independent exploration or analysis plugins in PHP
• Web services at cTuning.org as plugins in PHP
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 34 / 66
Compilation abstraction
supports multiple compilers and
languages
ccc-comp
Compiler(s)
Multiple user architectures
Execution abstraction
supports multiple architectures,
remote SSH execution, profiling
ccc-run Low-level execution
abstraction
timing, profiling,
hardware counters
ccc-time
Application
Interactive Compilation
Interface
fine-grain analysis and
optimization
Multiple
datasets
(from
cBench)
Implementation in open-source cTuning1 framework
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 35 / 66
High-level optimization tools
compiler and platform independent
- perform iterative optimization:
compiler flags or ICI passes/parameters
[random, one off, one by one, focused]
- user optimization plugins
ccc-run-*
Compilation abstraction
supports multiple compilers and
languages
ccc-comp
Compiler(s)
Multiple user architectures
High-level machine learning tools
- build ML models
- extract program static/dynamic features
- predict “good” program optimizations
ccc-ml-*
Execution abstraction
supports multiple architectures,
remote SSH execution, profiling
ccc-run Low-level execution
abstraction
timing, profiling,
hardware counters
ccc-time
Application
Interactive Compilation
Interface
fine-grain analysis and
optimization
Multiple
datasets
(from
cBench)
Implementation in open-source cTuning1 framework
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 36 / 66
High-level optimization tools
compiler and platform independent
- perform iterative optimization:
compiler flags or ICI passes/parameters
[random, one off, one by one, focused]
- user optimization plugins
ccc-run-*
Compilation abstraction
supports multiple compilers and
languages
ccc-comp
Communication with COD
ccc-db-send-stats-*
Compiler(s)
Multiple user architectures
Collective Optimization Database (COD) MySQL
Common database: keep info about all
architectures, environments, programs,
compilers, etc
Optimization database: keep info about
interesting optimization cases from the community
and expert advices/knowledge
High-level machine learning tools
- build ML models
- extract program static/dynamic features
- predict “good” program optimizations
ccc-ml-*
COD tools
- administration db/admin
- optimization analysis db/analysis
Execution abstraction
supports multiple architectures,
remote SSH execution, profiling
ccc-run
Communication with COD
ccc-db-send-stats*
Low-level execution
abstraction
timing, profiling,
hardware counters
ccc-time
Application
Interactive Compilation
Interface
fine-grain analysis and
optimization
Multiple
datasets
(from
cBench)
CCC configuration
configure all tools, COD, architecture,
compiler, benchmarks, etc
ccc-configure-*
Implementation in open-source cTuning1 framework
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 37 / 66
High-level optimization tools
compiler and platform independent
- perform iterative optimization:
compiler flags or ICI passes/parameters
[random, one off, one by one, focused]
- user optimization plugins
ccc-run-*
Compilation abstraction
supports multiple compilers and
languages
ccc-comp
Communication with COD
ccc-db-send-stats-*
Compiler(s)
Multiple user architectures
Collective Optimization Database (COD) MySQL
Common database: keep info about all
architectures, environments, programs,
compilers, etc
Optimization database: keep info about
interesting optimization cases from the community
and expert advices/knowledge
High-level machine learning tools
- build ML models
- extract program static/dynamic features
- predict “good” program optimizations
ccc-ml-*
COD tools
- administration db/admin
- optimization analysis db/analysis
Execution abstraction
supports multiple architectures,
remote SSH execution, profiling
ccc-run
Communication with COD
ccc-db-send-stats*
Low-level execution
abstraction
timing, profiling,
hardware counters
ccc-time
Application
Interactive Compilation
Interface
fine-grain analysis and
optimization
Multiple
datasets
(from
cBench)
Web access to COD
http://ctuning.org
CCC configuration
configure all tools, COD, architecture,
compiler, benchmarks, etc
ccc-configure-*
Implementation in open-source cTuning1 framework
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 38 / 66
MySQL-based Collective Optimization Database
Platforms
unique PLATFORM_ID
Compilers
unique COMPILER_ID
Runtime environments
unique RE_ID
Programs
unique PROGRAM_ID
Datasets
unique DATASET_ID
Platform features
unique PLATFORM_FEATURE_ID
Global platform optimization flags
unique OPT_PLATFORM_ID
Global optimization flags
unique OPT_ID
Optimization passes
unique OPT_PASSES_ID
Compilation info
unique COMPILE_ID
Execution info
unique RUN_ID
unique RUN_ID_ASSOCIATE
Program passes
associated COMPILE_ID
Program features
associated COMPILE_ID
Common Optimization Database (shared among all users)
Local or shared databases with optimization cases
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 39 / 66
Problems with cTuning1
• Difficult to extend (C, various hardwired components, need to
change schema and types in MySQL)
• No convenient way of sharing modules, benchmarks, data sets,
models (manual, csv files, emails, etc)
• Problems with repository scalability
• Complex, hardwired interfaces
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 40 / 66
cd [application_directory]
make CC=icc CC_OPTS=-fast
or
icc -fast *.c
time ./a.out < [my_dataset] > [output]
record “-fast”, execution time
cTuning3 aka Collective Mind framework basics
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 41 / 66
cd [application_directory]
make CC=icc CC_OPTS=-fast
or
icc -fast *.c
time ./a.out < [my_dataset] > [output]
record “-fast”, execution time
cTuning3 aka Collective Mind framework basics
cM kernel
code.source build
Universal
cM FE
cM
plugins
(modules)
code run
compiler build
…
E nd-users or
cM developers
CMD
python python
or any other
language
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 42 / 66
cd [application_directory]
make CC=icc CC_OPTS=-fast
or
icc -fast *.c
time ./a.out < [my_dataset] > [output]
record “-fast”, execution time
cTuning3 aka Collective Mind framework basics
cM kernel
code.source build
Universal
cM FE
cM
plugins
(modules)
code run
compiler build
…
E nd-users or
cM developers
CMD
python python
or any other
language
cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line)
cm code.source build ct_compiler=icc13 ct_optimizations=-fast
cm compiler build -- icc -fast *.c
cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm
Should be able to run on any OS (Windows, Linux, Android, MacOS, etc)!
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 43 / 66
cTuning3 aka Collective Mind framework basics
Simple and minimalistic high-level cM interface - one function (!)
should be easy to connect to any language if needed
schema and type-free (only strings) -
easily extended when needed for research (agile methodology)!
(python dictionary) output = cm_kernel.access ( (python dictionary) input )
Input: {
cm_run_module_uoa - cM plugin name (or some UID)
cm_action - cM plugin action (function)
parameters - (module and action dependent)
}
Output: {
cm_return - if 0, success
if >0, error
if <0, warning
cm_error - if cm_return>0, error message
parameters - (module and action dependent)
}
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 44 / 66
Application
Compilers and auxiliary tools
Binary and libraries
Architecture
Run-time environment
State of the system
Data set
Algorithm Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Collective Mind Repository basics
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 45 / 66
Application
Compilers and auxiliary tools
Binary and libraries
Architecture
Run-time environment
State of the system
Data set
Algorithm Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
Repo/models
.cmr / module UID or alias (cM UOA) / data UID or alias (cm UOA)
Repository root First level directory Second level directory
Very flexible and
portable
Easy to access,
edit and move
data
Can be per
application,
experiment,
architecture, etc
Can be easily
shared (through
web, SVN, GIT,
FTP)
Collective Mind Repository basics
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 46 / 66
Schema-free extensible data meta-representation
cM uses JSON as internal data representation format:
JSON or JavaScript Object Notation, is a text-based open standard designed for
human-readable data interchange (from Wikipedia)
• very intuitive to use and modify
• nearly native for python and php; simple libraries for Java, C, C++, …
• easy to index with powerful indexing services (cM uses ElasticSearch)
cM records input and output of the module for reproducibility!
Data is referenced by CID:
(Repository UID:) Module UID: Data UID
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 47 / 66
Example of JSON entry ctuning.compiler:icc-12.x-linux
{
"all_compiler_flags_desc": {
"##base_flag": {
"type": "text“
"desc_text": "compiler flag: -O3",
"field_size": "7",
"has_choice": "yes",
"choice": [
"-O0", "-O1", "-O2", "-Os", "-O3", "-fast"
],
"default_value": "-O3",
"explorable": "yes",
"explore_level": "1",
"explore_type": "fixed",
"forbid_disable_at_random": "yes"
},
…
}
…
}
Schema-free extensible data representation
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 48 / 66
Treat
computer
system as a
black box
Task
Result
Execute code
Input dictionary that can
be represented as JSON
{“run_cmd”:’image’,
“binary_name”:”a.out”,
“run_set_env”:[],
“run_output_files”:[],
“work_dir”:””,
“run_time_out”:””,
“code_deps”:[],
“run_os_uoa”:””,
…}
Function
(for example
“run”)
invoke object,
observe
behavior and
keep history
python module that has a UID and alias
(for example, “code” / 688dee2e7014f1fb)
Universal modules/functions
Output dictionary that can
be represented as JSON
{“cm_return”:’’,
“failed_run”:””,
“run_time_by_module”:””,
…}
gradually
describe input
and output
(choices,
characteristics,
etc)
Save
input/output
and related files
Index input and
output for fast
search
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 49 / 66
cM repository (.cmr)
cM data
Web
Command line
and scripts
cM kernel
Programs
Tools
Systems
OpenME
cM interface
php
shell (cm)
C
C++
Fortran
Java
End user
research and
development
scenarios
End user
instrumented
and adaptive
programs,
tools and
systems
cM kernel
in Python*
•native support
for other languages
maybe added later
access
load data
load module
create entry
delete entry
list entries
auth user
…
kernel
core
repo
user
module
web
code
code.source
os
package
dataset
…
…
…
…
…
…
…
…
…
…
…
…
Collective Mind overall structure
• Gradually add more modules, interfaces and data
depending on user/project/company needs
• Gradually add more parameters
• Gradually expose choices, properties, characteristics
Community
or workgroup
cmx - user-
friendly CMD
extension
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 50 / 66
Academia:
public, open-source modules and data
Collaborative, reproducible experiments: research LEGO
code.source code system
os dataset package
ctuning.processor
ctuning.compiler
choice
os.script
Industry:
proprietary modules and data
code.source code system
os dataset package
ctuning.processor
ctuning.compiler
choice
os.script
•Continuously adding “basic blocks” (modules)
•Adding tools, applications, datasets
•Gradually stabilize interfaces
Users can start connecting modules and data together to prepare
experimental pipelines with various observation, characterization,
auto-tuning and predictive scenarios!
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 51 / 66
Experimental pipelines for auto-tuning and modeling
•Init pipeline
•Detected system information
•Initialize parameters
•Prepare dataset
•Clean program
•Prepare compiler flags
•Use compiler profiling
•Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning
•Use universal Alchemist plugin (with any OpenME-compatible compiler or tool)
•Use Alchemist plugin (currently for GCC)
•Build program
•Get objdump and md5sum (if supported)
•Use OpenME for fine-grain program analysis and online tuning (build & run)
•Use 'Intel VTune Amplifier' to collect hardware counters
•Use 'perf' to collect hardware counters
•Set frequency (in Unix, if supported)
•Get system state before execution
•Run program
•Check output for correctness (use dataset UID to save different outputs)
•Finish OpenME
•Misc info
•Observed characteristics
•Observed statistical characteristics
•Finalize pipeline
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 52 / 66
Currently prepared experiments
•Polybench - numerical kernels with exposed parameters of all matrices in cM
• CPU: 28 prepared benchmarks
• CUDA: 15 prepared benchmarks
• OpenCL: 15 prepared benchmarks
• cBench - 23 benchmarks with 20 and 1000 datasets per benchmark
• Codelets - 44 codelets from embedded domain (provided by CAPS Entreprise)
• SPEC 2000/2006
• Description of 32-bit and 64-bit OS: Windows, Linux, Android
• Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x
• Support for collection of hardware counters: perf, Intel vTune
• Support for frequency modification
• Validated on laptops, mobiles, tables, GRID/cloud - can work even from the USB key
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 53 / 66
Visualize and analyze optimization spaces
Program: cBench: susan corners Processor: ARM v6, 830MHz
Compiler: Sourcery GCC for ARM v4.6.1 OS: Android OS v2.3.5
System: Samsung Galaxy Y Data set: MiDataSet #1, image, 600x450x8b PGM, 263KB
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 54 / 66
Gradually increase granularity and complexity
Gradually expose
some characteristics
Gradually expose
some choices
Algorithm
selection
(time) productivity, variable-
accuracy, complexity …
Language, MPI, OpenMP, TBB, MapReduce …
Compile Program time … compiler flags; pragmas …
Code analysis &
Transformations
time;
memory usage;
code size …
transformation ordering;
polyhedral transformations;
transformation parameters;
instruction ordering …
Process
Thread
Function
Codelet
Loop
Instruction
Run code Run-time
environment
time; power consumption … pinning/scheduling …
System cost; size … CPU/GPU; frequency; memory hierarchy …
Data set size; values; description … precision …
Run-time
analysis
time; precision … hardware counters; power meters …
Run-time state processor state; cache state
…
helper threads; hardware counters …
Analyze profile time; size … instrumentation; profiling …
Coarse-grain vs. fine-grain effects: depends on user requirements and expected ROI
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 55 / 66
Interactive compilers, tools and applications
Application
Source-to-source
transformation tools
binary
execution
Binary transformation
tools
Production Compilers
Traditional compilation,
analysis and optimization
Often internal compiler
decisions are not known or
there is no precise control
even through pragmas.
Interference with internal
compiler optimizations
complicates program
analysis and
characterization.
Current pragma based auto-
tuning frameworks are very
complex.
What about finer-
grain level?
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 56 / 66
Detect optimization
flags
Optimization
manager
Pass 1
GCC Data Layer
AST, CFG, CF, etc
Compiler
...
Pass N
OpenME - interactive plugin and event-based
interface to “open up” applications and tools
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 57 / 66
Detect optimization
flags
Optimization
manager
Event
Pass N
Event
Pass 1
GCC Data Layer
AST, CFG, CF, etc
Feature
Event
Open
ME
Compiler with OpenME
...
OpenME - interactive plugin and event-based
interface to “open up” applications and tools
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 58 / 66
Detect optimization
flags
Optimization
manager
Event
Pass N
Event
Pass 1
GCC Data Layer
AST, CFG, CF, etc
Feature
Event
Open
ME
...
OpenME Plugins
cM modules
Selecting pass
sequences
Extracting static
program features
<Dynamically linked
shared libraries>
...
OpenME - interactive plugin and event-based
interface to “open up” applications and tools
Compiler with OpenME
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 59 / 66
Detect optimization
flags
Optimization
manager
Event
Pass N
Event
Pass 1
GCC Data Layer
AST, CFG, CF, etc
Feature
Event
Open
ME
...
OpenME Plugins
cM modules
Selecting pass
sequences
Extracting static
program features
<Dynamically linked
shared libraries>
...
We collaborated with Google
and Mozilla to move similar
framework to mainline GCC
so that everyone can use it for
research.
Now available in GCC >=4.6
OpenME - interactive plugin and event-based
interface to “open up” applications and tools
Compiler with OpenME
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 60 / 66
Add cM wrapper (compiler)
Application
Source-to-source
transformation tools
Production Compiler with OpenME
binary
execution
Binary transformation
tools
Very simple plugin framework
for any compiler or tool
Full control over optimization
decisions!
Remove interference between
different tools
cM framework
OpenME - interactive plugin and event-based
interface to “open up” applications and tools
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 61 / 66
Example of OpenME for LLVM 3.2
tools/clang/tools/driver/cc1_main.cpp
#include "openme.h“
…
int cc1_main(const char **ArgBegin, const char **ArgEnd,
const char *Argv0, void *MainAddr) {
openme_init("UNI_ALCHEMIST_USE", "UNI_ALCHEMIST_PLUGINS", NULL, 0);
…
// Execute the frontend actions.
Success = ExecuteCompilerInvocation(Clang.get());
openme_callback("ALC_FINISH", NULL);
…
}
OpenME: 3 functions only!
• openme_init(…) - initialize/load plugin
• openme_callback(char* event_name, void* params) - call event
• openme_finish(…) - finalize (if needed)
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 62 / 66
Example of OpenME for LLVM 3.2
lib/Transforms/Scalar/LoopUnrollPass.cpp
#include <cJSON.h>
#include "openme.h“
…
bool LoopUnroll::runOnLoop(Loop *L, LPPassManager &LPM) {
struct alc_unroll {
const char *func_name;
const char *loop_name;
cJSON *json;
int factor;
} alc_unroll;
…
alc_unroll.func_name=(Header->getParent()->getName()).data();
alc_unroll.loop_name=(Header->getName()).data();
openme_callback("ALC_TRANSFORM_UNROLL_INIT", &alc_unroll);
…
// Unroll the loop.
alc_unroll.factor=Count;
openme_callback("ALC_TRANSFORM_UNROLL", &alc_unroll);
Count=alc_unroll.factor;
if (!UnrollLoop(L, Count, TripCount, UnrollRuntime, TripMultiple, LI, &LPM))
return false;
…
}
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 63 / 66
Example of OpenME for LLVM 3.2
Alchemist plugin (.so/dll object) - in development
for online/interactive analysis, tuning and adaptation
#include <cJSON.h>
#include <openme.h>
int openme_plugin_init(struct openme_info *ome_info) {
…
openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_INIT", alc_transform_unroll_init);
openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL", alc_transform_unroll);
openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_FEATURES", alc_transform_unroll_features);
openme_register_callback(ome_info, "ALC_FINISH", alc_finish);
…
}
extern void alc_transform_unroll_init(struct alc_unroll *alc_unroll){
…
}
extern void alc_transform_unroll(struct alc_unroll *alc_unroll) {
…
}
…
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 64 / 66
Example of OpenME for OpenCL/CUDA C application
• 2mm.c / 2mm.cu
…
#ifdef OPENME
#include <openme.h>
#endif
…
int main(void) {
…
#ifdef OPENME
openme_init(NULL,NULL,NULL,0);
openme_callback("PROGRAM_START", NULL);
#endif
…
#ifdef OPENME
openme_callback("ACC_KERNEL_START", NULL);
#endif
cl_launch_kernel();
or
mm2Cuda(A, B, C, D, E, E_outputFromGpu);
#ifdef OPENME
openme_callback("ACC_KERNEL_END", NULL);
#endif
…
…
#ifdef OPENME
openme_callback("KERNEL_START", NULL);
#endif
mm2_cpu(A, B, C, D, E);
#ifdef OPENME
openme_callback("KERNEL_END", NULL);
#endif
#ifdef OPENME
openme_callback("PROGRAM_END", NULL);
#endif
…
}
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 65 / 66
Example of OpenME for Fortran application
• matmul.F
PROGRAM MATMULPROG
…
INTEGER*8 OBJ, OPENME_CREATE_OBJ_F
CALL OPENME_INIT_F(""//CHAR(0), ""//CHAR(0), ""//CHAR(0), 0)
CALL OPENME_CALLBACK_F("PROGRAM_START"//CHAR(0))
…
CALL OPENME_CALLBACK_F("KERNEL_START"//CHAR(0));
DO I=1, I_REPEAT
CALL MATMUL
END DO
CALL OPENME_CALLBACK_F("KERNEL_END"//CHAR(0));
…
CALL OPENME_CALLBACK_F("PROGRAM_END"//CHAR(0))
END
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 66 / 66
1) Prepare pre-release around May/June 2013 (BSD-style license) - ASK for preview!
2) Reproduce my past published research within new framework:
• Add “classical” classification and predictive models
• Add various exploration strategies (random, focused)
• Add run-time adaptation scenarios (CUDA/OpenCL scheduling, pinning, etc)
• Add co-design scenarios
3) Use framework for analysis and auto-tuning of industrial applications
4) Help to customize framework for industrial usages (consulting)
5) Applying for new funding (academic and industrial)
6) Continue virtual collaborative cTuning Lab to build community:
• Public repository to share applications, datasets, models at cTuning.org:
• New publication model for reproducible research
• Community R&D discussion
http://groups.google.com/group/collective-mind
• Collect data from Android mobiles
Next steps
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 67 / 66
Acknowledgements
• PhD students and postdocs (my Intel Exascale team)
Abdul Wahid Memon, Pablo Oliveira, Yuriy Kashnikov
• Colleague from NCAR, USA
Davide Del Vento and his colleagues/interns
• Colleagues from IBM, CAPS, ARC (Synopsys), Intel, Google, ARM, ST
• Colleagues from Intel (USA)
David Kuck and David Wong
• cTuning community:
• EU FP6, FP7 program and HiPEAC network of excellence
http://www.hipeac.net
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 68 / 66
Main references
• Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and
optimization of computing systems. Proceedings of the GCC Summit’09, Montreal, Canada, June
2009
• Grigori Fursin and Olivier Temam. Collective Optimization: A Practical Collaborative Approach.
ACM Transactions on Architecture and Code Optimization (TACO), December 2010, Volume 7,
Number 4, pages 20-49
• Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea
Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard,
Elton Ashton, Edwin Bonilla, John Thomson, Chris Williams, Michael O'Boyle. MILEPOST GCC:
machine learning enabled self-tuning compiler. International Journal of Parallel Programming
(IJPP), June 2011, Volume 39, Issue 3, pages 296-327
• Yang Chen, Shuangde Fang, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Olivier Temam and
Chengyong Wu. Deconstructing iterative optimization. ACM Transactions on Architecture and
Code Optimization (TACO), October 2012, Volume 9, Number 3
• Yang Chen, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Liang Peng, Olivier Temam,
Chengyong Wu. Evaluating Iterative Optimization across 1000 Data Sets. PLDI'10
• Victor Jimenez, Isaac Gelado, Lluis Vilanova, Marisa Gil, Grigori Fursin and Nacho Navarro.
Predictive runtime code scheduling for heterogeneous architectures. HiPEAC’09
Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 69 / 66
Main references
• Lianjie Luo, Yang Chen, Chengyong Wu, Shun Long and Grigori Fursin. Finding representative
sets of optimizations for adaptive multiversioning applications. SMART'09 co-located with
HiPEAC'09
• Grigori Fursin, John Cavazos, Michael O'Boyle and Olivier Temam. MiDataSets: Creating The
Conditions For A More Realistic Evaluation of Iterative Optimization. HiPEAC’07
• F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J. Thomson, M. Toussaint
and C.K.I. Williams. Using Machine Learning to Focus Iterative Optimization. CGO’06
•Grigori Fursin, Albert Cohen, Michael O'Boyle and Oliver Temam. A Practical Method For
Quickly Evaluating Program Optimizations. HiPEAC’05
•Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for
Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3),
pages 271-292, 2004
• Grigori Fursin. Iterative Compilation and Performance Prediction for Numerical Applications.
Ph.D. thesis, University of Edinburgh, Edinburgh, UK, January 2004

Mais conteúdo relacionado

Mais procurados

Introduction to Apache Mahout
Introduction to Apache MahoutIntroduction to Apache Mahout
Introduction to Apache MahoutAman Adhikari
 
Recent Trends in Incremental Clustering: A Review
Recent Trends in Incremental Clustering: A ReviewRecent Trends in Incremental Clustering: A Review
Recent Trends in Incremental Clustering: A ReviewIOSRjournaljce
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text Paul Groth
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
 
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...iosrjce
 
Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...ijnlc
 
2016 Cytoscape 3.3 Tutorial
2016 Cytoscape 3.3 Tutorial2016 Cytoscape 3.3 Tutorial
2016 Cytoscape 3.3 TutorialAlexander Pico
 
A Survey on Trajectory Data Mining
A Survey on Trajectory Data MiningA Survey on Trajectory Data Mining
A Survey on Trajectory Data MiningCSCJournals
 
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...IJCSIS Research Publications
 
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATACOMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATAcscpconf
 
Combined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex dataCombined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex datacsandit
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYcscpconf
 
A Survey On Ontology Agent Based Distributed Data Mining
A Survey On Ontology Agent Based Distributed Data MiningA Survey On Ontology Agent Based Distributed Data Mining
A Survey On Ontology Agent Based Distributed Data MiningEditor IJMTER
 

Mais procurados (17)

Introduction to Apache Mahout
Introduction to Apache MahoutIntroduction to Apache Mahout
Introduction to Apache Mahout
 
Mahout
MahoutMahout
Mahout
 
Recent Trends in Incremental Clustering: A Review
Recent Trends in Incremental Clustering: A ReviewRecent Trends in Incremental Clustering: A Review
Recent Trends in Incremental Clustering: A Review
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
 
mahout introduction
mahout  introductionmahout  introduction
mahout introduction
 
Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...
 
2016 Cytoscape 3.3 Tutorial
2016 Cytoscape 3.3 Tutorial2016 Cytoscape 3.3 Tutorial
2016 Cytoscape 3.3 Tutorial
 
Apache Mahout
Apache MahoutApache Mahout
Apache Mahout
 
A Survey on Trajectory Data Mining
A Survey on Trajectory Data MiningA Survey on Trajectory Data Mining
A Survey on Trajectory Data Mining
 
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
 
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATACOMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
 
Combined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex dataCombined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex data
 
Bx044461467
Bx044461467Bx044461467
Bx044461467
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
 
A Survey On Ontology Agent Based Distributed Data Mining
A Survey On Ontology Agent Based Distributed Data MiningA Survey On Ontology Agent Based Distributed Data Mining
A Survey On Ontology Agent Based Distributed Data Mining
 

Destaque

ระบบผู้เชี่ยวชาญด้านผักสวนครัว
ระบบผู้เชี่ยวชาญด้านผักสวนครัวระบบผู้เชี่ยวชาญด้านผักสวนครัว
ระบบผู้เชี่ยวชาญด้านผักสวนครัวkwanAtchariya
 
The national farmers information service (NAFIS) - Extension services through...
The national farmers information service (NAFIS) - Extension services through...The national farmers information service (NAFIS) - Extension services through...
The national farmers information service (NAFIS) - Extension services through...IAALD Community
 
Cyber extension communication
Cyber extension communicationCyber extension communication
Cyber extension communicationAtika Rusli
 
EXPERT SYSTEM FOR EFFECTIVE EXTENSION SERVICE by sabanna.k.meti(M.Sc agri) u...
 EXPERT SYSTEM FOR EFFECTIVE EXTENSION SERVICE by sabanna.k.meti(M.Sc agri) u... EXPERT SYSTEM FOR EFFECTIVE EXTENSION SERVICE by sabanna.k.meti(M.Sc agri) u...
EXPERT SYSTEM FOR EFFECTIVE EXTENSION SERVICE by sabanna.k.meti(M.Sc agri) u...Meti Sagar
 

Destaque (11)

ระบบผู้เชี่ยวชาญด้านผักสวนครัว
ระบบผู้เชี่ยวชาญด้านผักสวนครัวระบบผู้เชี่ยวชาญด้านผักสวนครัว
ระบบผู้เชี่ยวชาญด้านผักสวนครัว
 
Expert system
Expert systemExpert system
Expert system
 
The national farmers information service (NAFIS) - Extension services through...
The national farmers information service (NAFIS) - Extension services through...The national farmers information service (NAFIS) - Extension services through...
The national farmers information service (NAFIS) - Extension services through...
 
Expert systems in agriculture
Expert systems in agricultureExpert systems in agriculture
Expert systems in agriculture
 
Cyber extension communication
Cyber extension communicationCyber extension communication
Cyber extension communication
 
EXPERT SYSTEM FOR EFFECTIVE EXTENSION SERVICE by sabanna.k.meti(M.Sc agri) u...
 EXPERT SYSTEM FOR EFFECTIVE EXTENSION SERVICE by sabanna.k.meti(M.Sc agri) u... EXPERT SYSTEM FOR EFFECTIVE EXTENSION SERVICE by sabanna.k.meti(M.Sc agri) u...
EXPERT SYSTEM FOR EFFECTIVE EXTENSION SERVICE by sabanna.k.meti(M.Sc agri) u...
 
Expert Systems
Expert SystemsExpert Systems
Expert Systems
 
6.expert systems
6.expert systems6.expert systems
6.expert systems
 
3D Animation
3D Animation3D Animation
3D Animation
 
Rural development ppt
Rural development pptRural development ppt
Rural development ppt
 
Topic 8 expert system
Topic 8 expert systemTopic 8 expert system
Topic 8 expert system
 

Semelhante a Collective Mind infrastructure and repository to crowdsource auto-tuning (c-mind.org/repo)

PSK Method for Solving Type-1 and Type-3 Fuzzy Transportation Problems
PSK Method for Solving Type-1 and Type-3 Fuzzy Transportation Problems PSK Method for Solving Type-1 and Type-3 Fuzzy Transportation Problems
PSK Method for Solving Type-1 and Type-3 Fuzzy Transportation Problems Navodaya Institute of Technology
 
Classifier Model using Artificial Neural Network
Classifier Model using Artificial Neural NetworkClassifier Model using Artificial Neural Network
Classifier Model using Artificial Neural NetworkAI Publications
 
Recommendation system using unsupervised machine learning algorithm & assoc
Recommendation system using unsupervised machine learning algorithm & assocRecommendation system using unsupervised machine learning algorithm & assoc
Recommendation system using unsupervised machine learning algorithm & associjerd
 
Costomization of recommendation system using collaborative filtering algorith...
Costomization of recommendation system using collaborative filtering algorith...Costomization of recommendation system using collaborative filtering algorith...
Costomization of recommendation system using collaborative filtering algorith...eSAT Publishing House
 
Metron seas collaboration
Metron seas collaborationMetron seas collaboration
Metron seas collaborationikekala
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data setsIjripublishers Ijri
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupSri Ambati
 
Data clustering and optimization techniques
Data clustering and optimization techniquesData clustering and optimization techniques
Data clustering and optimization techniquesSpyros Ktenas
 
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient TechniquesData Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient TechniquesIJAEMSJORNAL
 
Open domain question answering system using semantic role labeling
Open domain question answering system using semantic role labelingOpen domain question answering system using semantic role labeling
Open domain question answering system using semantic role labelingeSAT Publishing House
 
Converged IT and Data Commons
Converged IT and Data CommonsConverged IT and Data Commons
Converged IT and Data CommonsSimon Twigger
 
Materialized View Generation Using Apriori Algorithm
Materialized View Generation Using Apriori AlgorithmMaterialized View Generation Using Apriori Algorithm
Materialized View Generation Using Apriori Algorithmijdms
 
An effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentAn effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentijdpsjournal
 
Concrete meta research - how to collect, manage, and read papers?
Concrete meta research - how to collect, manage, and read papers?Concrete meta research - how to collect, manage, and read papers?
Concrete meta research - how to collect, manage, and read papers?Tao He
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...theijes
 

Semelhante a Collective Mind infrastructure and repository to crowdsource auto-tuning (c-mind.org/repo) (20)

PSK Method for Solving Type-1 and Type-3 Fuzzy Transportation Problems
PSK Method for Solving Type-1 and Type-3 Fuzzy Transportation Problems PSK Method for Solving Type-1 and Type-3 Fuzzy Transportation Problems
PSK Method for Solving Type-1 and Type-3 Fuzzy Transportation Problems
 
Classifier Model using Artificial Neural Network
Classifier Model using Artificial Neural NetworkClassifier Model using Artificial Neural Network
Classifier Model using Artificial Neural Network
 
Recommendation system using unsupervised machine learning algorithm & assoc
Recommendation system using unsupervised machine learning algorithm & assocRecommendation system using unsupervised machine learning algorithm & assoc
Recommendation system using unsupervised machine learning algorithm & assoc
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Costomization of recommendation system using collaborative filtering algorith...
Costomization of recommendation system using collaborative filtering algorith...Costomization of recommendation system using collaborative filtering algorith...
Costomization of recommendation system using collaborative filtering algorith...
 
Metron seas collaboration
Metron seas collaborationMetron seas collaboration
Metron seas collaboration
 
H017625354
H017625354H017625354
H017625354
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data sets
 
Final proj 2 (1)
Final proj 2 (1)Final proj 2 (1)
Final proj 2 (1)
 
UCIAD overview
UCIAD overviewUCIAD overview
UCIAD overview
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
Data clustering and optimization techniques
Data clustering and optimization techniquesData clustering and optimization techniques
Data clustering and optimization techniques
 
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient TechniquesData Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
 
Open domain question answering system using semantic role labeling
Open domain question answering system using semantic role labelingOpen domain question answering system using semantic role labeling
Open domain question answering system using semantic role labeling
 
Converged IT and Data Commons
Converged IT and Data CommonsConverged IT and Data Commons
Converged IT and Data Commons
 
Resume
ResumeResume
Resume
 
Materialized View Generation Using Apriori Algorithm
Materialized View Generation Using Apriori AlgorithmMaterialized View Generation Using Apriori Algorithm
Materialized View Generation Using Apriori Algorithm
 
An effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentAn effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded content
 
Concrete meta research - how to collect, manage, and read papers?
Concrete meta research - how to collect, manage, and read papers?Concrete meta research - how to collect, manage, and read papers?
Concrete meta research - how to collect, manage, and read papers?
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
 

Mais de Grigori Fursin

Accelerating open science and AI with automated, portable, customizable and r...
Accelerating open science and AI with automated, portable, customizable and r...Accelerating open science and AI with automated, portable, customizable and r...
Accelerating open science and AI with automated, portable, customizable and r...Grigori Fursin
 
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...Grigori Fursin
 
Enabling open and reproducible computer systems research: the good, the bad a...
Enabling open and reproducible computer systems research: the good, the bad a...Enabling open and reproducible computer systems research: the good, the bad a...
Enabling open and reproducible computer systems research: the good, the bad a...Grigori Fursin
 
CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...
CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...
CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...Grigori Fursin
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...Grigori Fursin
 
Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...Grigori Fursin
 
Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15Grigori Fursin
 
Collective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesCollective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesGrigori Fursin
 
Panel at acm_sigplan_trust2014
Panel at acm_sigplan_trust2014Panel at acm_sigplan_trust2014
Panel at acm_sigplan_trust2014Grigori Fursin
 
Collective Mind: a collaborative curation tool for program optimization
Collective Mind: a collaborative curation tool for program optimizationCollective Mind: a collaborative curation tool for program optimization
Collective Mind: a collaborative curation tool for program optimizationGrigori Fursin
 

Mais de Grigori Fursin (10)

Accelerating open science and AI with automated, portable, customizable and r...
Accelerating open science and AI with automated, portable, customizable and r...Accelerating open science and AI with automated, portable, customizable and r...
Accelerating open science and AI with automated, portable, customizable and r...
 
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
 
Enabling open and reproducible computer systems research: the good, the bad a...
Enabling open and reproducible computer systems research: the good, the bad a...Enabling open and reproducible computer systems research: the good, the bad a...
Enabling open and reproducible computer systems research: the good, the bad a...
 
CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...
CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...
CGO/PPoPP'17 Artifact Evaluation Discussion (enabling open and reproducible r...
 
CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...CK: from ad hoc computer engineering to collaborative and reproducible data s...
CK: from ad hoc computer engineering to collaborative and reproducible data s...
 
Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...
 
Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15Artifact Evaluation Experience CGO'15 / PPoPP'15
Artifact Evaluation Experience CGO'15 / PPoPP'15
 
Collective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the massesCollective Mind: bringing reproducible research to the masses
Collective Mind: bringing reproducible research to the masses
 
Panel at acm_sigplan_trust2014
Panel at acm_sigplan_trust2014Panel at acm_sigplan_trust2014
Panel at acm_sigplan_trust2014
 
Collective Mind: a collaborative curation tool for program optimization
Collective Mind: a collaborative curation tool for program optimizationCollective Mind: a collaborative curation tool for program optimization
Collective Mind: a collaborative curation tool for program optimization
 

Último

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Último (20)

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

Collective Mind infrastructure and repository to crowdsource auto-tuning (c-mind.org/repo)

  • 1. Collective Mind infrastructure and repository to crowdsource auto-tuning (c-mind.org/repo) Grigori Fursin INRIA, France HPSC 2013, Taiwan March 2013
  • 2. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 2 / 66 • Collective Mind approach combined with social networking, expert knowledge and predictive modeling • Collective Mind framework basics • Plugin-based type-free and schema-free infrastructure • Unified web interfaces (similar to WEB 2.0 concept) • Portable file (json) based repository • Auto-tuning and predictive modeling scenarios • Demo • Conclusions and future works Background
  • 3. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 3 / 66 Solution Motivation: back to basics Decision (depends on user requirements) Result Available choices (solutions) User requirements: most common: minimize all costs (time, power consumption, price, size, faults, etc) guarantee real-time constraints (bandwidth, QOS, etc) Service/application providers (supercomputing, cloud computing, mobile systems) Should provide choices and help with decisions Hardware and software designers End users Task
  • 4. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 4 / 66 Solutions Challenges Task Result GCC 4.1.x GCC 4.2.x GCC 4.3.x GCC 4.4.x GCC 4.5.x GCC 4.6.x GCC 4.7.x ICC 10.1 ICC 11.0 ICC 11.1 ICC 12.0 ICC 12.1 LLVM 2.6 LLVM 2.7 LLVM 2.8 LLVM 2.9 LLVM 3.0 Phoenix MVS XLC Open64 Jikes Testarossa OpenMP MPI HMPP OpenCL CUDA gprof prof perf oprofile PAPI TAU Scalasca VTune Amplifierscheduling algorithm- level TBB MKL ATLASprogram- level function- level Codelet loop-level hardware counters IPA polyhedral transformations LTO threadsprocess pass reordering run-time adaptation per phase reconfiguration cache size frequency bandwidth HDD size TLB ISA memory size cores processors threads power consumptionexecution time reliability Clean up this mess! Simplify analysis, tuning and modelling of computer systems for non-computer engineers Bring together researchers from interdisciplinary communities
  • 5. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 5 / 66 Treat computer system as a black box Task Result Understanding computer systems’ behavior: a physicist’s approach
  • 6. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 6 / 66 Treat computer system as a black box Task Result Understanding computer systems’ behavior: a physicist’s approach Expose object information flow
  • 7. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 7 / 66 Treat computer system as a black box Task Result Expose object information flow Object wrapper and repository: observe behavior and keep history information flow expose characteristics Observe system
  • 8. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 8 / 66 Treat computer system as a black box Task Result Gradually expose properties, characteristics, choices Expose object expose and explore choices information flow Object wrapper and repository: observe behavior and keep history (choices, properties, characteristics, system state, data) information flow expose characteristics Universal Learning Module expose system state expose properties expose characteristics set requirements
  • 9. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 9 / 66 Treat computer system as a black box Task Result Classify, build models, predict behavior Expose object information flow expose system state Object wrapper and repository: observe behavior and keep history (choices, properties, characteristics, system state, data) history (experience) information flow expose properties expose characteristics expose characteristics continuously build, validate or refine classification and predictive model on the fly expose and explore choices set requirements Universal Learning Module Evolutionary approach, not revolutionary, i.e. do not rebuild existing SW/HW stack from scratch but wrap up existing tools and techniques, and gradually clean up the mess!
  • 10. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 10 / 66 Unified web interface Unified web interface Unified web interface Unified web interface Expose and exchange optimization knowledge and models in a unified way through http (WEB 2.0) Use distributed, noSQL repository to store highly heterogeneous “big” data Transparently crowdsource learning of a behavior of any existing mobile, cluster, cloud computer system Extrapolate collective knowledge to build faster and more power efficient computer systems Build self-tuning machines using agent-based models
  • 11. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 11 / 66 Treat computer system as a black box Task Result Gradual decomposition, parameterization, observation and exploration of a system Application Compilers and auxiliary tools Binary and libraries Architecture Run-time environment State of the system Data set Algorithm
  • 12. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 12 / 66 Treat computer system as a black box Task Result Gradual decomposition, parameterization, observation and exploration of a system Application Compilers and auxiliary tools Binary and libraries Architecture Run-time environment State of the system Data set Algorithm Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models cTuning3 framework aka Collective Mind
  • 13. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 13 / 66 Treat computer system as a black box Task Result Gradual top-down decomposition, parameterization, observation and exploration of a system Application Compilers and auxiliary tools Binary and libraries Architecture Run-time environment State of the system Data set Algorithm Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Light-weightinterfacetoconnectmodules,dataandmodels cTuning3 framework aka Collective Mind
  • 14. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 14 / 66 Example of characterizing/explaining behavior of computer systems Gradually expose some characteristics Gradually expose some properties/choices Compile Program time … compiler flags; pragmas … Run code Run-time environment time; CPI, power consumption … pinning/scheduling … System cost; architecture; frequency; cache size… Data set size; values; description … precision … Analyze profile time; size … instrumentation; profiling … Start coarse-grain decomposition of a system (detect coarse-grain effects first). Add universal learning modules. Combine expert knowledge with automatic detection! Start from coarse-grain and gradually move to fine-grain level!
  • 15. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 15 / 66 0 1 2 3 4 5 6 Program/architecturebehavior:CPI How we can explain the following observations for some piece of code (“codelet object”)? (LU-decomposition codelet, Intel Nehalem) Example of characterizing/explaining behavior of computer systems
  • 16. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 16 / 66 Add 1 property: matrix size 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Program/architecturebehavior:CPI Dataset property: matrix size Example of characterizing/explaining behavior of computer systems
  • 17. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 17 / 66 Try to build a model to correlate objectives (CPI) and features (matrix size). Start from simple models: linear regression (detect coarse grain effects) 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Program/architecturebehavior:CPI Dataset property: matrix size Example of characterizing/explaining behavior of computer systems
  • 18. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 18 / 66 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Program/architecturebehavior:CPI Dataset properties: matrix size If more observations, validate model and detect discrepancies! Continuously retrain models to fit new data! Use model to “focus” exploration on “unusual” behavior! Example of characterizing/explaining behavior of computer systems
  • 19. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 19 / 66 Gradually increase model complexity if needed (hierarchical modeling). For example, detect fine-grain effects (singularities) and characterize them. 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Program/architecturebehavior:CPI Dataset properties: matrix size Example of characterizing/explaining behavior of computer systems
  • 20. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 20 / 66 Start adding more properties (one more architecture with twice bigger cache)! Use automatic approach to correlate all objectives and features. 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Program/architecturebehavior:CPI Dataset properties: matrix size L3 = 4Mb L3 = 8Mb Example of characterizing/explaining behavior of computer systems
  • 21. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 21 / 66 Continuously build and refine classification (decision trees for example) and predictive models on all collected data to improve predictions. Continue exploring design and optimization spaces (evaluate different architectures, optimizations, compilers, etc.) Focus exploration on unexplored areas, areas with high variability or with high mispredict rate of models β εcM predictive model module CPI = ε + 1000 × β × data size Example of characterizing/explaining behavior of computer systems
  • 22. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 22 / 66 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 0 1 2 3 4 5 6 Dataset features: matrix size Code/architecturebehavior:CPI Size < 1012 1012 < Size < 2042 Size > 2042 & GCC Size > 2042 & ICC & O2 Size > 2042 & ICC & O3 Optimize decision tree (many different algorithms) Balance precision vs cost of modeling = ROI (coarse-grain vs fine-grain effects) Compact data on-line before sharing with other users! Model optimization and data compaction
  • 23. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 23 / 66 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Code/architecturebehavior:CPI Dataset features: matrix size Collaboratively and continuously add expert advices or automatic optimizations. Extensible and collaborative advice system
  • 24. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 24 / 66 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Code/architecturebehavior:CPI Dataset features: matrix size Collaboratively and continuously add expert advices or automatic optimizations. Automatically characterize problem (extract all possible features: hardware counters, semantic features, static features, state of the system, etc) Add manual analysis if needed Extensible and collaborative advice system
  • 25. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 25 / 66 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Code/architecturebehavior:CPI Dataset features: matrix size cM advice system: Possible problem: Cache conflict misses degrade performance Collaboratively and continuously add expert advices or automatic optimizations. Extensible and collaborative advice system
  • 26. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 26 / 66 0 1 2 3 4 5 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Code/architecturebehavior:CPI Dataset features: matrix size cM advice system: Possible problem: Cache conflict misses degrade performance Possible solution: Array padding (A[N,N] -> A[N,N+1]) Effect: ~30% execution time improvement Collaboratively and continuously add expert advices or automatic optimizations. Extensible and collaborative expert system
  • 27. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 27 / 66 0 50 100 150 200 250 300 350 400 450 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Executiontime,sec Dataset features: matrix size Add dynamic memory characterization through semantically non-equivalent modifications. For example, convert all array accesses to scalars to detect balance between CPU/memory accesses. Intentionally change/break semantics to observe reaction in terms of performance/power etc! for for for addr = a[0,0] load … [addr+index1]… mulss … [addr+index2]… subss … [addr+index3]… store … [addr+index4]… addr + = 4 System reaction to code changes: physicist’s view Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004
  • 28. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 28 / 66 0 50 100 150 200 250 300 350 400 450 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Executiontime,sec Dataset features: matrix size for for for addr = a[0,0] load … [addr+index1]… mulss … [addr+index2]… subss … [addr+index3]… store … [addr+index4]… addr + = 0 Add dynamic memory characterization through semantically non-equivalent modifications. For example, convert all array accesses to scalars to detect balance between CPU/memory accesses. Intentionally change/break semantics to observe reaction in terms of performance/power etc! Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004 System reaction to code changes: physicist’s view
  • 29. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 29 / 66 0 50 100 150 200 250 300 350 400 450 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Executiontime,sec Dataset features: matrix size Expert or automatic advices based on additional information in the repository! Focus optimizations to speed up search: which/where? Advice: Small gap (arithmetic dominates): • Focus on ILP optimizations • Run on complex out-of-order core • Increase processor frequency to speed up application Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004 System reaction to code changes: physicist’s view
  • 30. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 30 / 66 0 50 100 150 200 250 300 350 400 450 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Executiontime,sec Dataset features: matrix size Well-knowngapbetweenarithmetic anddataaccesses Advice: Small gap (arithmetic dominates): • Focus on ILP optimizations • Run on complex out-of-order core • Increase processor frequency to speed up application Expert or automatic advices based on additional information in the repository! Focus optimizations to speed up search: which/where? Advice: Big gap (data accesses dominate): • Focus on memory optimizations • Run on simple core • Decrease processor frequency to save power Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004 System reaction to code changes: physicist’s view
  • 31. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 31 / 66 0 50 100 150 200 250 300 350 400 450 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Executiontime,sec Dataset features: matrix size Well-knowngapbetweenarithmetic anddataaccesses Advice: Small gap (arithmetic dominates): • Focus on ILP optimizations • Run on complex out-of-order core • Increase processor frequency to speed up application Expert or automatic advices based on additional information in the repository! Focus optimizations to speed up search: which/where? Advice: Big gap (data accesses dominate): • Focus on memory optimizations • Run on simple core • Decrease processor frequency to save power Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004 System reaction to code changes: physicist’s view Can add/remove instructions, Can add/remove threads, etc …
  • 32. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 32 / 66 cd [application_directory] make CC=icc CC_OPTS=-fast or icc -fast *.c time ./a.out < [my_dataset] > [output] record “-fast”, execution time Implementation in open-source cTuning1 framework
  • 33. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 33 / 66 Implementation in open-source cTuning1 framework cd [application_directory] make CC=icc CC_OPTS=-fast or icc -fast *.c time ./a.out < [my_dataset] > [output] record “-fast”, execution time ccc-comp build=“make” compiler=icc opts=“-fast” ccc-comp compiler=icc opts=“-fast” ccc-run prog=./a.out cmd=“< [my dataset]” ccc-time <cmd> ccc-record-stats <file_with_stats> • Low level platform-dependent plugins in C • Communication through text file or directly through MySQL database • High level platform-independent exploration or analysis plugins in PHP • Web services at cTuning.org as plugins in PHP
  • 34. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 34 / 66 Compilation abstraction supports multiple compilers and languages ccc-comp Compiler(s) Multiple user architectures Execution abstraction supports multiple architectures, remote SSH execution, profiling ccc-run Low-level execution abstraction timing, profiling, hardware counters ccc-time Application Interactive Compilation Interface fine-grain analysis and optimization Multiple datasets (from cBench) Implementation in open-source cTuning1 framework
  • 35. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 35 / 66 High-level optimization tools compiler and platform independent - perform iterative optimization: compiler flags or ICI passes/parameters [random, one off, one by one, focused] - user optimization plugins ccc-run-* Compilation abstraction supports multiple compilers and languages ccc-comp Compiler(s) Multiple user architectures High-level machine learning tools - build ML models - extract program static/dynamic features - predict “good” program optimizations ccc-ml-* Execution abstraction supports multiple architectures, remote SSH execution, profiling ccc-run Low-level execution abstraction timing, profiling, hardware counters ccc-time Application Interactive Compilation Interface fine-grain analysis and optimization Multiple datasets (from cBench) Implementation in open-source cTuning1 framework
  • 36. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 36 / 66 High-level optimization tools compiler and platform independent - perform iterative optimization: compiler flags or ICI passes/parameters [random, one off, one by one, focused] - user optimization plugins ccc-run-* Compilation abstraction supports multiple compilers and languages ccc-comp Communication with COD ccc-db-send-stats-* Compiler(s) Multiple user architectures Collective Optimization Database (COD) MySQL Common database: keep info about all architectures, environments, programs, compilers, etc Optimization database: keep info about interesting optimization cases from the community and expert advices/knowledge High-level machine learning tools - build ML models - extract program static/dynamic features - predict “good” program optimizations ccc-ml-* COD tools - administration db/admin - optimization analysis db/analysis Execution abstraction supports multiple architectures, remote SSH execution, profiling ccc-run Communication with COD ccc-db-send-stats* Low-level execution abstraction timing, profiling, hardware counters ccc-time Application Interactive Compilation Interface fine-grain analysis and optimization Multiple datasets (from cBench) CCC configuration configure all tools, COD, architecture, compiler, benchmarks, etc ccc-configure-* Implementation in open-source cTuning1 framework
  • 37. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 37 / 66 High-level optimization tools compiler and platform independent - perform iterative optimization: compiler flags or ICI passes/parameters [random, one off, one by one, focused] - user optimization plugins ccc-run-* Compilation abstraction supports multiple compilers and languages ccc-comp Communication with COD ccc-db-send-stats-* Compiler(s) Multiple user architectures Collective Optimization Database (COD) MySQL Common database: keep info about all architectures, environments, programs, compilers, etc Optimization database: keep info about interesting optimization cases from the community and expert advices/knowledge High-level machine learning tools - build ML models - extract program static/dynamic features - predict “good” program optimizations ccc-ml-* COD tools - administration db/admin - optimization analysis db/analysis Execution abstraction supports multiple architectures, remote SSH execution, profiling ccc-run Communication with COD ccc-db-send-stats* Low-level execution abstraction timing, profiling, hardware counters ccc-time Application Interactive Compilation Interface fine-grain analysis and optimization Multiple datasets (from cBench) Web access to COD http://ctuning.org CCC configuration configure all tools, COD, architecture, compiler, benchmarks, etc ccc-configure-* Implementation in open-source cTuning1 framework
  • 38. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 38 / 66 MySQL-based Collective Optimization Database Platforms unique PLATFORM_ID Compilers unique COMPILER_ID Runtime environments unique RE_ID Programs unique PROGRAM_ID Datasets unique DATASET_ID Platform features unique PLATFORM_FEATURE_ID Global platform optimization flags unique OPT_PLATFORM_ID Global optimization flags unique OPT_ID Optimization passes unique OPT_PASSES_ID Compilation info unique COMPILE_ID Execution info unique RUN_ID unique RUN_ID_ASSOCIATE Program passes associated COMPILE_ID Program features associated COMPILE_ID Common Optimization Database (shared among all users) Local or shared databases with optimization cases
  • 39. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 39 / 66 Problems with cTuning1 • Difficult to extend (C, various hardwired components, need to change schema and types in MySQL) • No convenient way of sharing modules, benchmarks, data sets, models (manual, csv files, emails, etc) • Problems with repository scalability • Complex, hardwired interfaces
  • 40. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 40 / 66 cd [application_directory] make CC=icc CC_OPTS=-fast or icc -fast *.c time ./a.out < [my_dataset] > [output] record “-fast”, execution time cTuning3 aka Collective Mind framework basics
  • 41. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 41 / 66 cd [application_directory] make CC=icc CC_OPTS=-fast or icc -fast *.c time ./a.out < [my_dataset] > [output] record “-fast”, execution time cTuning3 aka Collective Mind framework basics cM kernel code.source build Universal cM FE cM plugins (modules) code run compiler build … E nd-users or cM developers CMD python python or any other language
  • 42. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 42 / 66 cd [application_directory] make CC=icc CC_OPTS=-fast or icc -fast *.c time ./a.out < [my_dataset] > [output] record “-fast”, execution time cTuning3 aka Collective Mind framework basics cM kernel code.source build Universal cM FE cM plugins (modules) code run compiler build … E nd-users or cM developers CMD python python or any other language cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line) cm code.source build ct_compiler=icc13 ct_optimizations=-fast cm compiler build -- icc -fast *.c cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm Should be able to run on any OS (Windows, Linux, Android, MacOS, etc)!
  • 43. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 43 / 66 cTuning3 aka Collective Mind framework basics Simple and minimalistic high-level cM interface - one function (!) should be easy to connect to any language if needed schema and type-free (only strings) - easily extended when needed for research (agile methodology)! (python dictionary) output = cm_kernel.access ( (python dictionary) input ) Input: { cm_run_module_uoa - cM plugin name (or some UID) cm_action - cM plugin action (function) parameters - (module and action dependent) } Output: { cm_return - if 0, success if >0, error if <0, warning cm_error - if cm_return>0, error message parameters - (module and action dependent) }
  • 44. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 44 / 66 Application Compilers and auxiliary tools Binary and libraries Architecture Run-time environment State of the system Data set Algorithm Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Collective Mind Repository basics
  • 45. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 45 / 66 Application Compilers and auxiliary tools Binary and libraries Architecture Run-time environment State of the system Data set Algorithm Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models Repo/models .cmr / module UID or alias (cM UOA) / data UID or alias (cm UOA) Repository root First level directory Second level directory Very flexible and portable Easy to access, edit and move data Can be per application, experiment, architecture, etc Can be easily shared (through web, SVN, GIT, FTP) Collective Mind Repository basics
  • 46. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 46 / 66 Schema-free extensible data meta-representation cM uses JSON as internal data representation format: JSON or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange (from Wikipedia) • very intuitive to use and modify • nearly native for python and php; simple libraries for Java, C, C++, … • easy to index with powerful indexing services (cM uses ElasticSearch) cM records input and output of the module for reproducibility! Data is referenced by CID: (Repository UID:) Module UID: Data UID
  • 47. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 47 / 66 Example of JSON entry ctuning.compiler:icc-12.x-linux { "all_compiler_flags_desc": { "##base_flag": { "type": "text“ "desc_text": "compiler flag: -O3", "field_size": "7", "has_choice": "yes", "choice": [ "-O0", "-O1", "-O2", "-Os", "-O3", "-fast" ], "default_value": "-O3", "explorable": "yes", "explore_level": "1", "explore_type": "fixed", "forbid_disable_at_random": "yes" }, … } … } Schema-free extensible data representation
  • 48. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 48 / 66 Treat computer system as a black box Task Result Execute code Input dictionary that can be represented as JSON {“run_cmd”:’image’, “binary_name”:”a.out”, “run_set_env”:[], “run_output_files”:[], “work_dir”:””, “run_time_out”:””, “code_deps”:[], “run_os_uoa”:””, …} Function (for example “run”) invoke object, observe behavior and keep history python module that has a UID and alias (for example, “code” / 688dee2e7014f1fb) Universal modules/functions Output dictionary that can be represented as JSON {“cm_return”:’’, “failed_run”:””, “run_time_by_module”:””, …} gradually describe input and output (choices, characteristics, etc) Save input/output and related files Index input and output for fast search
  • 49. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 49 / 66 cM repository (.cmr) cM data Web Command line and scripts cM kernel Programs Tools Systems OpenME cM interface php shell (cm) C C++ Fortran Java End user research and development scenarios End user instrumented and adaptive programs, tools and systems cM kernel in Python* •native support for other languages maybe added later access load data load module create entry delete entry list entries auth user … kernel core repo user module web code code.source os package dataset … … … … … … … … … … … … Collective Mind overall structure • Gradually add more modules, interfaces and data depending on user/project/company needs • Gradually add more parameters • Gradually expose choices, properties, characteristics Community or workgroup cmx - user- friendly CMD extension
  • 50. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 50 / 66 Academia: public, open-source modules and data Collaborative, reproducible experiments: research LEGO code.source code system os dataset package ctuning.processor ctuning.compiler choice os.script Industry: proprietary modules and data code.source code system os dataset package ctuning.processor ctuning.compiler choice os.script •Continuously adding “basic blocks” (modules) •Adding tools, applications, datasets •Gradually stabilize interfaces Users can start connecting modules and data together to prepare experimental pipelines with various observation, characterization, auto-tuning and predictive scenarios!
  • 51. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 51 / 66 Experimental pipelines for auto-tuning and modeling •Init pipeline •Detected system information •Initialize parameters •Prepare dataset •Clean program •Prepare compiler flags •Use compiler profiling •Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning •Use universal Alchemist plugin (with any OpenME-compatible compiler or tool) •Use Alchemist plugin (currently for GCC) •Build program •Get objdump and md5sum (if supported) •Use OpenME for fine-grain program analysis and online tuning (build & run) •Use 'Intel VTune Amplifier' to collect hardware counters •Use 'perf' to collect hardware counters •Set frequency (in Unix, if supported) •Get system state before execution •Run program •Check output for correctness (use dataset UID to save different outputs) •Finish OpenME •Misc info •Observed characteristics •Observed statistical characteristics •Finalize pipeline
  • 52. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 52 / 66 Currently prepared experiments •Polybench - numerical kernels with exposed parameters of all matrices in cM • CPU: 28 prepared benchmarks • CUDA: 15 prepared benchmarks • OpenCL: 15 prepared benchmarks • cBench - 23 benchmarks with 20 and 1000 datasets per benchmark • Codelets - 44 codelets from embedded domain (provided by CAPS Entreprise) • SPEC 2000/2006 • Description of 32-bit and 64-bit OS: Windows, Linux, Android • Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x • Support for collection of hardware counters: perf, Intel vTune • Support for frequency modification • Validated on laptops, mobiles, tables, GRID/cloud - can work even from the USB key
  • 53. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 53 / 66 Visualize and analyze optimization spaces Program: cBench: susan corners Processor: ARM v6, 830MHz Compiler: Sourcery GCC for ARM v4.6.1 OS: Android OS v2.3.5 System: Samsung Galaxy Y Data set: MiDataSet #1, image, 600x450x8b PGM, 263KB
  • 54. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 54 / 66 Gradually increase granularity and complexity Gradually expose some characteristics Gradually expose some choices Algorithm selection (time) productivity, variable- accuracy, complexity … Language, MPI, OpenMP, TBB, MapReduce … Compile Program time … compiler flags; pragmas … Code analysis & Transformations time; memory usage; code size … transformation ordering; polyhedral transformations; transformation parameters; instruction ordering … Process Thread Function Codelet Loop Instruction Run code Run-time environment time; power consumption … pinning/scheduling … System cost; size … CPU/GPU; frequency; memory hierarchy … Data set size; values; description … precision … Run-time analysis time; precision … hardware counters; power meters … Run-time state processor state; cache state … helper threads; hardware counters … Analyze profile time; size … instrumentation; profiling … Coarse-grain vs. fine-grain effects: depends on user requirements and expected ROI
  • 55. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 55 / 66 Interactive compilers, tools and applications Application Source-to-source transformation tools binary execution Binary transformation tools Production Compilers Traditional compilation, analysis and optimization Often internal compiler decisions are not known or there is no precise control even through pragmas. Interference with internal compiler optimizations complicates program analysis and characterization. Current pragma based auto- tuning frameworks are very complex. What about finer- grain level?
  • 56. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 56 / 66 Detect optimization flags Optimization manager Pass 1 GCC Data Layer AST, CFG, CF, etc Compiler ... Pass N OpenME - interactive plugin and event-based interface to “open up” applications and tools
  • 57. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 57 / 66 Detect optimization flags Optimization manager Event Pass N Event Pass 1 GCC Data Layer AST, CFG, CF, etc Feature Event Open ME Compiler with OpenME ... OpenME - interactive plugin and event-based interface to “open up” applications and tools
  • 58. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 58 / 66 Detect optimization flags Optimization manager Event Pass N Event Pass 1 GCC Data Layer AST, CFG, CF, etc Feature Event Open ME ... OpenME Plugins cM modules Selecting pass sequences Extracting static program features <Dynamically linked shared libraries> ... OpenME - interactive plugin and event-based interface to “open up” applications and tools Compiler with OpenME
  • 59. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 59 / 66 Detect optimization flags Optimization manager Event Pass N Event Pass 1 GCC Data Layer AST, CFG, CF, etc Feature Event Open ME ... OpenME Plugins cM modules Selecting pass sequences Extracting static program features <Dynamically linked shared libraries> ... We collaborated with Google and Mozilla to move similar framework to mainline GCC so that everyone can use it for research. Now available in GCC >=4.6 OpenME - interactive plugin and event-based interface to “open up” applications and tools Compiler with OpenME
  • 60. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 60 / 66 Add cM wrapper (compiler) Application Source-to-source transformation tools Production Compiler with OpenME binary execution Binary transformation tools Very simple plugin framework for any compiler or tool Full control over optimization decisions! Remove interference between different tools cM framework OpenME - interactive plugin and event-based interface to “open up” applications and tools
  • 61. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 61 / 66 Example of OpenME for LLVM 3.2 tools/clang/tools/driver/cc1_main.cpp #include "openme.h“ … int cc1_main(const char **ArgBegin, const char **ArgEnd, const char *Argv0, void *MainAddr) { openme_init("UNI_ALCHEMIST_USE", "UNI_ALCHEMIST_PLUGINS", NULL, 0); … // Execute the frontend actions. Success = ExecuteCompilerInvocation(Clang.get()); openme_callback("ALC_FINISH", NULL); … } OpenME: 3 functions only! • openme_init(…) - initialize/load plugin • openme_callback(char* event_name, void* params) - call event • openme_finish(…) - finalize (if needed)
  • 62. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 62 / 66 Example of OpenME for LLVM 3.2 lib/Transforms/Scalar/LoopUnrollPass.cpp #include <cJSON.h> #include "openme.h“ … bool LoopUnroll::runOnLoop(Loop *L, LPPassManager &LPM) { struct alc_unroll { const char *func_name; const char *loop_name; cJSON *json; int factor; } alc_unroll; … alc_unroll.func_name=(Header->getParent()->getName()).data(); alc_unroll.loop_name=(Header->getName()).data(); openme_callback("ALC_TRANSFORM_UNROLL_INIT", &alc_unroll); … // Unroll the loop. alc_unroll.factor=Count; openme_callback("ALC_TRANSFORM_UNROLL", &alc_unroll); Count=alc_unroll.factor; if (!UnrollLoop(L, Count, TripCount, UnrollRuntime, TripMultiple, LI, &LPM)) return false; … }
  • 63. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 63 / 66 Example of OpenME for LLVM 3.2 Alchemist plugin (.so/dll object) - in development for online/interactive analysis, tuning and adaptation #include <cJSON.h> #include <openme.h> int openme_plugin_init(struct openme_info *ome_info) { … openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_INIT", alc_transform_unroll_init); openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL", alc_transform_unroll); openme_register_callback(ome_info, "ALC_TRANSFORM_UNROLL_FEATURES", alc_transform_unroll_features); openme_register_callback(ome_info, "ALC_FINISH", alc_finish); … } extern void alc_transform_unroll_init(struct alc_unroll *alc_unroll){ … } extern void alc_transform_unroll(struct alc_unroll *alc_unroll) { … } …
  • 64. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 64 / 66 Example of OpenME for OpenCL/CUDA C application • 2mm.c / 2mm.cu … #ifdef OPENME #include <openme.h> #endif … int main(void) { … #ifdef OPENME openme_init(NULL,NULL,NULL,0); openme_callback("PROGRAM_START", NULL); #endif … #ifdef OPENME openme_callback("ACC_KERNEL_START", NULL); #endif cl_launch_kernel(); or mm2Cuda(A, B, C, D, E, E_outputFromGpu); #ifdef OPENME openme_callback("ACC_KERNEL_END", NULL); #endif … … #ifdef OPENME openme_callback("KERNEL_START", NULL); #endif mm2_cpu(A, B, C, D, E); #ifdef OPENME openme_callback("KERNEL_END", NULL); #endif #ifdef OPENME openme_callback("PROGRAM_END", NULL); #endif … }
  • 65. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 65 / 66 Example of OpenME for Fortran application • matmul.F PROGRAM MATMULPROG … INTEGER*8 OBJ, OPENME_CREATE_OBJ_F CALL OPENME_INIT_F(""//CHAR(0), ""//CHAR(0), ""//CHAR(0), 0) CALL OPENME_CALLBACK_F("PROGRAM_START"//CHAR(0)) … CALL OPENME_CALLBACK_F("KERNEL_START"//CHAR(0)); DO I=1, I_REPEAT CALL MATMUL END DO CALL OPENME_CALLBACK_F("KERNEL_END"//CHAR(0)); … CALL OPENME_CALLBACK_F("PROGRAM_END"//CHAR(0)) END
  • 66. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 66 / 66 1) Prepare pre-release around May/June 2013 (BSD-style license) - ASK for preview! 2) Reproduce my past published research within new framework: • Add “classical” classification and predictive models • Add various exploration strategies (random, focused) • Add run-time adaptation scenarios (CUDA/OpenCL scheduling, pinning, etc) • Add co-design scenarios 3) Use framework for analysis and auto-tuning of industrial applications 4) Help to customize framework for industrial usages (consulting) 5) Applying for new funding (academic and industrial) 6) Continue virtual collaborative cTuning Lab to build community: • Public repository to share applications, datasets, models at cTuning.org: • New publication model for reproducible research • Community R&D discussion http://groups.google.com/group/collective-mind • Collect data from Android mobiles Next steps
  • 67. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 67 / 66 Acknowledgements • PhD students and postdocs (my Intel Exascale team) Abdul Wahid Memon, Pablo Oliveira, Yuriy Kashnikov • Colleague from NCAR, USA Davide Del Vento and his colleagues/interns • Colleagues from IBM, CAPS, ARC (Synopsys), Intel, Google, ARM, ST • Colleagues from Intel (USA) David Kuck and David Wong • cTuning community: • EU FP6, FP7 program and HiPEAC network of excellence http://www.hipeac.net
  • 68. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 68 / 66 Main references • Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. Proceedings of the GCC Summit’09, Montreal, Canada, June 2009 • Grigori Fursin and Olivier Temam. Collective Optimization: A Practical Collaborative Approach. ACM Transactions on Architecture and Code Optimization (TACO), December 2010, Volume 7, Number 4, pages 20-49 • Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Chris Williams, Michael O'Boyle. MILEPOST GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming (IJPP), June 2011, Volume 39, Issue 3, pages 296-327 • Yang Chen, Shuangde Fang, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Olivier Temam and Chengyong Wu. Deconstructing iterative optimization. ACM Transactions on Architecture and Code Optimization (TACO), October 2012, Volume 9, Number 3 • Yang Chen, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Liang Peng, Olivier Temam, Chengyong Wu. Evaluating Iterative Optimization across 1000 Data Sets. PLDI'10 • Victor Jimenez, Isaac Gelado, Lluis Vilanova, Marisa Gil, Grigori Fursin and Nacho Navarro. Predictive runtime code scheduling for heterogeneous architectures. HiPEAC’09
  • 69. Grigori Fursin “Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning” HPSC 2013, NTU, Taiwan March, 2013 69 / 66 Main references • Lianjie Luo, Yang Chen, Chengyong Wu, Shun Long and Grigori Fursin. Finding representative sets of optimizations for adaptive multiversioning applications. SMART'09 co-located with HiPEAC'09 • Grigori Fursin, John Cavazos, Michael O'Boyle and Olivier Temam. MiDataSets: Creating The Conditions For A More Realistic Evaluation of Iterative Optimization. HiPEAC’07 • F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J. Thomson, M. Toussaint and C.K.I. Williams. Using Machine Learning to Focus Iterative Optimization. CGO’06 •Grigori Fursin, Albert Cohen, Michael O'Boyle and Oliver Temam. A Practical Method For Quickly Evaluating Program Optimizations. HiPEAC’05 •Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004 • Grigori Fursin. Iterative Compilation and Performance Prediction for Numerical Applications. Ph.D. thesis, University of Edinburgh, Edinburgh, UK, January 2004