3. Tiramisu: A Code Optimization Framework for
High Performance Systems
https://www.csail.mit.edu/research/tiramisu-framework-code-optimization-and-code-generation
MIT CSAIL
5. Tiramisu uses Halide and ISL
・Halide
https://github.com/halide/Halide
・ISL (Integer Set Library)
http://isl.gforge.inria.fr/
Facebook Research : Tensor Comprehensions
https://github.com/facebookresearch/TensorComprehensions
Tensor Comprehensions (TC) is a fully-functional C++ library to automatically
synthesize high-performance machine learning kernels
using Halide, ISL and NVRTC or LLVM.
8. 4). representation
The challenge of representation is
addressed by using a unified framework based on
polyhedral sets to represent the four layers.
I do not really understand "polyhedral sets".
Could someone please explain them?
11. ・Layer I : Abstract Algorithm
・Layer II : Computation Management
・Layer III : Data Management
・Layer IV : Communication Management
・Code generation: Abstract Syntax Tree
https://arxiv.org/pdf/1804.10694.pdf
12. The first layer defines abstract computations,
which are not yet scheduled or mapped to memory.
Each computation represents an expression to compute.
https://arxiv.org/pdf/1804.10694.pdf
Layer I : Abstract Algorithm
13. {b1(i, j, c) : 0 ≤ i < N ∧ 0 ≤ j < M ∧ 0 ≤ c < 3}
The iteration domain is the set of tuples b1(i, j, c) such that
0 ≤ i < N ∧ 0 ≤ j < M ∧ 0 ≤ c < 3
https://arxiv.org/pdf/1804.10694.pdf
Iteration domain
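The set notation above can be read as a loop nest. The following is a minimal sketch in plain C++ (it does not use the Tiramisu API) that enumerates the points of the iteration domain of b1 for concrete values of N and M. The names Point and iteration_domain are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <vector>

// One point (i, j, c) of the iteration domain of b1.
using Point = std::array<int, 3>;

// Enumerate {b1(i, j, c) : 0 <= i < N and 0 <= j < M and 0 <= c < 3}
// in lexicographic order. N and M are symbolic on the slide; here they
// are ordinary run-time values (an assumption made for this sketch).
std::vector<Point> iteration_domain(int N, int M) {
    assert(N >= 0 && M >= 0);
    std::vector<Point> points;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < M; ++j)
            for (int c = 0; c < 3; ++c)
                points.push_back({i, j, c});
    return points;
}
```

For N = 2 and M = 4 the domain contains 2 * 4 * 3 = 24 points, from (0, 0, 0) to (1, 3, 2).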
15. Affine transformations including loop tiling, skewing, loop fusion, distribution,
splitting, reordering, and many others can be expressed as an affine map that maps
computations from Layer I into the time-space domain in Layer II.
We call this map a time-space map.
Converts the iteration domain of Layer I into the time-space domain
https://arxiv.org/pdf/1804.10694.pdf
Time-space Maps
16. Layer I: iteration domain
{C(i, j) : 0 ≤ i < N ∧ 0 ≤ j < N} : A(i, j) + B(i, j)
Time-space mapping (16 x 16 tiles):
{C(i, j) → C(i1, j1, i2, j2) : i1 = floor(i/16) ∧ i2 = i%16 ∧
j1 = floor(j/16) ∧ j2 = j%16 ∧ 0 ≤ i < N ∧ 0 ≤ j < N}
Layer II: time-space domain
{C(i1, j1, i2, j2) : i1 = floor(i/16) ∧ i2 = i%16 ∧ j1 = floor(j/16) ∧ j2 = j%16 ∧
0 ≤ i < N ∧ 0 ≤ j < N} :
A(i1 * 16 + i2, j1 * 16 + j2) + B(i1 * 16 + i2, j1 * 16 + j2)
https://arxiv.org/pdf/1804.10694.pdf
Example: Time-space Maps
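The 16 x 16 tiling map can be checked by hand with a few lines of plain C++. This is only a sketch of the arithmetic in the map, not Tiramisu or ISL code; to_time_space, to_iteration_domain and kTile are hypothetical names.

```cpp
#include <array>
#include <cassert>

constexpr int kTile = 16;  // tile size used on the slide

// Time-space map C(i, j) -> C(i1, j1, i2, j2) with
// i1 = floor(i/16), i2 = i%16, j1 = floor(j/16), j2 = j%16.
std::array<int, 4> to_time_space(int i, int j) {
    // For non-negative operands, integer division equals floor division.
    assert(i >= 0 && j >= 0);
    return {i / kTile, j / kTile, i % kTile, j % kTile};
}

// Recover (i, j) from a time-space point, as done in the Layer II
// expression A(i1 * 16 + i2, j1 * 16 + j2).
std::array<int, 2> to_iteration_domain(const std::array<int, 4>& p) {
    return {p[0] * kTile + p[2], p[1] * kTile + p[3]};
}
```

For example, C(37, 5) maps to C(2, 0, 5, 5), and 2 * 16 + 5 = 37 and 0 * 16 + 5 = 5 give back the original point.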
17. Time dimensions : => When
Specify the order of execution (relative to other computations)
Space dimensions : => Where
Specify the processor on which each computation is executed
Time-space domain (Time-space Maps)
https://arxiv.org/pdf/1804.10694.pdf
Layer II: Computation Management
19. Data Management specifies the memory locations where computed results are stored
allocation/deallocation statements
a set of access relations, which map a computation from Layer
II to array elements read or written by that computation.
https://arxiv.org/pdf/1804.10694.pdf
Layer III: Data Management
28. void generate_function_1(std::string name, int size, int val0, int val1 )
{
tiramisu::global::set_default_tiramisu_options();
tiramisu::function function0(name);
tiramisu::constant N("N", tiramisu::expr((int32_t) size), p_int32, true,
NULL, 0, &function0 );
Test code (tests/test_01.cpp)
https://github.com/rbaghdadi/tiramisu/blob/master/tests/test_01.cpp#L16
29. A class to represent functions in Tiramisu.
A function in Tiramisu is composed of a set of computations (tiramisu::computation).
Example:
std::string name("sample");
tiramisu::function function0(name);
function class
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L97
A set of computations!
30. void generate_function_1(std::string name, int size, int val0, int val1 )
{
tiramisu::global::set_default_tiramisu_options();
tiramisu::function function0(name);
tiramisu::constant N("N", tiramisu::expr((int32_t) size), p_int32, true,
NULL, 0, &function0 );
Test code (tests/test_01.cpp)
https://github.com/rbaghdadi/tiramisu/blob/master/tests/test_01.cpp#L16
31. A class that represents loop invariants.
An object of the invariant class can be an expression,
a symbolic constant
or a variable that is invariant to all the loops of the function.
Example:
tiramisu::constant N("N", tiramisu::expr((int32_t) size),
p_int32, true, NULL, 0, &function0);
constant class
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L3667
33. A class that represents constant variable references
Example:
tiramisu::var i("i"), j("j"), i0("i0"), j0("j0"), i1("i1"), j1("j1");
var class
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/expr.h#L1641
37. A class that represents computations.
A computation is an expression associated with an iteration domain.
A computation indicates what needs to be computed
(the expression that should be computed).
A computation has three representations:
Level I
Level II
Level III
(The latest paper calls these Layer I/II/III/IV.
Layer IV is Communication Management.)
computation class
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L1225
40. A class that represents buffers.
Buffers have two use cases:
- used to store the results of computations, and
- used to represent input arguments to functions.
Example: input buffer
tiramisu::buffer input_buffer("input_buffer", {size},
tiramisu::p_uint8, a_input, &function0);
Buffer for the result
tiramisu::buffer result_scalar("result_scalar", {1},
tiramisu::p_uint8, a_output, &function0);
buffer class
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L957
42. void set_access(std::string access_str);
void set_access(isl_map *access);
Set the access relation of the computation.
The access relation is a relation from computations to buffer
locations. access_str is a string that represents the relation.
It is encoded in the ISL format,
(http://isl.gforge.inria.fr/user.html#Sets-and-Relations)
Example:
S0.set_access("{S0[i,j]->buf0[i,j]}");
computation::set_access method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L3130
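To illustrate what an access relation decides, here is a small sketch in plain C++ (it does not call Tiramisu or ISL). The relation is modeled as a function from a computation instance S0[i,j] to a location in buf0. The names Access and run_with_access, the placeholder expression 10 * i + j, and the transposed relation used in the usage note are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// An access relation modeled as a function: S0[i,j] -> buf0[row, col].
using Access = std::function<std::pair<int, int>(int, int)>;

// Execute S0 over {S0[i,j] : 0 <= i < N and 0 <= j < N} and store each
// result at the buffer location selected by the access relation.
std::vector<std::vector<int>> run_with_access(int N, const Access& access) {
    assert(N >= 0);
    std::vector<std::vector<int>> buf0(N, std::vector<int>(N, 0));
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            std::pair<int, int> loc = access(i, j);
            buf0[loc.first][loc.second] = 10 * i + j;  // placeholder for S0
        }
    return buf0;
}
```

With the identity relation {S0[i,j]->buf0[i,j]}, the value of S0[1,2] ends up in buf0[1][2]. With a hypothetical transposed relation {S0[i,j]->buf0[j,i]}, buf0[1][2] instead holds the value of S0[2,1]. The computation itself is unchanged; only where its results are stored differs.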
44. void tile(tiramisu::var L0, tiramisu::var L1, int sizeX, int sizeY,
tiramisu::var L0_outer, tiramisu::var L1_outer,
tiramisu::var L0_inner, tiramisu::var L1_inner );
Tile the two loop levels L0 and L1 with rectangular tiling.
sizeX and sizeY represent the tile size.
L0 and L1 should be two consecutive loop levels.
L0_outer, L1_outer, L0_inner, L1_inner are the names
of the new dimensions created after tiling.
Example:
S0.tile(i, j, 2, 2, i0, j0, i1, j1);
computation::tile method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L3424
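The effect of rectangular tiling on the visiting order can be sketched as a hand-written loop nest. This is plain C++ written for illustration, not code generated by Tiramisu. It assumes an N x N domain, and tiled_order is a hypothetical name. The loop variables i0, j0, i1, j1 mirror the names of the new dimensions in the example above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Order in which the points (i, j) of an N x N domain are visited after
// tiling loops i and j with tiles of size sizeX x sizeY.
std::vector<std::pair<int, int>> tiled_order(int N, int sizeX, int sizeY) {
    assert(N >= 0 && sizeX > 0 && sizeY > 0);
    std::vector<std::pair<int, int>> order;
    for (int i0 = 0; i0 * sizeX < N; ++i0)          // outer loop over tile rows
        for (int j0 = 0; j0 * sizeY < N; ++j0)      // outer loop over tile columns
            for (int i1 = 0; i1 < sizeX; ++i1)      // inner loop inside a tile
                for (int j1 = 0; j1 < sizeY; ++j1) {
                    int i = i0 * sizeX + i1;
                    int j = j0 * sizeY + j1;
                    // Guard for partial tiles when N is not a multiple
                    // of the tile size.
                    if (i < N && j < N)
                        order.emplace_back(i, j);
                }
    return order;
}
```

With N = 3 and 2 x 2 tiles, the first tile is visited completely, (0,0), (0,1), (1,0), (1,1), before the loop moves on to (0,2). Every point is still visited exactly once.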
46. void tag_parallel_level(tiramisu::var L);
void tag_parallel_level(int L);
Tag the loop level L to be parallelized.
Example:
S0.tag_parallel_level(i0);
computation::tag_parallel_level method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L3424
48. void set_arguments(const std::vector<tiramisu::buffer *> &buffer_vec );
Set the arguments of the function.
The arguments of the function are provided as a vector of
pointers to buffers. Each buffer represents an argument
to the function.
During code generation, the arguments in the vector will
become the arguments of the generated function
(with the order of their appearance in the vector).
function::set_arguments method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L918
50. void gen_time_space_domain();
Generate the time-space domain of the function.
In this representation, the logical time of execution
and the processor where the computation
will be executed are both specified.
function::gen_time_space_domain method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L910
52. void gen_isl_ast();
Generate an isl AST that represents the function.
function::gen_isl_ast method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L905
54. void gen_halide_stmt();
Generate a Halide stmt that represents the function.
function::gen_halide_stmt method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L897
56. void gen_halide_obj(const std::string &obj_file_name,
Halide::Target::OS os,
Halide::Target::Arch arch, int bits ) const;
Generate an object file that contains the compiled function.
This function relies on Halide to generate the object file.
obj_file_name : the name of the generated file.
os : the target operating system (Halide::Target::OS).
arch : the architecture of the target (the instruction set).
bits : the bit-width of the target machine.
(must be 0 for unknown, or 32 or 64 )
function::gen_halide_obj method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L897
60. // C++ code with a Tiramisu expression.
#include "tiramisu.h"
void foo(int N, int array_a[N], int array_b[N], int array_c[N])
{
tiramisu::init();
// Declare an iterator and inputs
tiramisu::iter i, j;
tiramisu::in A(i,j), B(i,j);
Tiramisu expressions (README.md)
https://github.com/Tiramisu-Compiler/tiramisu/blob/master/README.md#example
61. // Declare the Tiramisu expression (algorithm)
tiramisu::comp C(i,j) = A(i,j) + B(i,j);
// Specify optimizations
C.parallelize(i).vectorize(j, 4);
// Realize, compile and run the expression
C.realize(tiramisu::int32_t, {N});
C.compile({(A, array_a), (B, array_b), (C, array_c)});
C.run();
}
Tiramisu expressions (README.md)
https://github.com/Tiramisu-Compiler/tiramisu/blob/master/README.md#example
67. virtual void add_definitions(std::string iteration_domain_str, tiramisu::expr e,
bool schedule_this_computation, tiramisu::primitive_t t,
tiramisu::function *fct );
Add definitions of computations that have the same name as this
computation.
The arguments of this function are identical to the arguments of
the computation constructor.
In general, this function is used to express reductions
and to express computation updates.
computation::add_definitions method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2541
69. tiramisu::computation& get_update(int index);
Returns the update with the given index that has been added to this computation:
- If index == 0, this computation is returned.
- If index > 0, the index-th computation added
through add_definitions is returned.
computation::get_update method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L3065
73. void after(computation &comp, tiramisu::var iterator);
Schedule this computation to run after the computation comp.
This computation is placed after comp at the loop level given by iterator,
which is a loop level of this computation.
computation::after method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2598
74. Example:
{S0[i,j]: 0<=i<N and 0<=j<N} and {S1[i,j]: 0<=i<N and 0<=j<N}
S1.after(S0, i)
for (i=0; i<N; i++)
{
    for (j=0; j<N; j++)
        S0;
    for (j=0; j<N; j++)
        S1;
}
computation::after method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2598
75. Example:
{S0[i,j]: 0<=i<N and 0<=j<N} and {S1[i,j]: 0<=i<N and 0<=j<N}
S1.after(S0, j)
for (i=0; i<N; i++)
    for (j=0; j<N; j++)
    {
        S0;
        S1;
    }
computation::after method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2598
76. Example:
{S0[i,j]: 0<=i<N and 0<=j<N} and {S1[i,j]: 0<=i<N and 0<=j<N}
S1.after(S0, computation::root)
for (i=0; i<N; i++)
    for (j=0; j<N; j++)
        S0;
for (i=0; i<N; i++)
    for (j=0; j<N; j++)
        S1;
computation::after method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2598
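The three loop nests shown for after() differ only in the order in which instances of S0 and S1 execute. The sketch below reproduces those three loop nests in plain C++ and records the order as a string. It does not call Tiramisu; Level, record and run_after are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the loop-level argument of after():
// Root ~ computation::root, I ~ loop i, J ~ loop j.
enum class Level { Root, I, J };

// Append one executed statement instance, e.g. "S0(1,0);", to the trace.
void record(std::string& trace, const char* name, int i, int j) {
    trace += std::string(name) + "(" + std::to_string(i) + "," +
             std::to_string(j) + ");";
}

// Execution order of S0 and S1 when S1 is placed after S0 at `level`.
std::string run_after(int N, Level level) {
    assert(N >= 0);
    std::string trace;
    if (level == Level::Root) {
        // Two separate loop nests.
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) record(trace, "S0", i, j);
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) record(trace, "S1", i, j);
    } else if (level == Level::I) {
        // Shared i loop, separate j loops.
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) record(trace, "S0", i, j);
            for (int j = 0; j < N; ++j) record(trace, "S1", i, j);
        }
    } else {
        // Shared i and j loops.
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                record(trace, "S0", i, j);
                record(trace, "S1", i, j);
            }
    }
    return trace;
}
```

For N = 2, the trace for Level::J starts with S0(0,0);S1(0,0);S0(0,1);S1(0,1); whereas the trace for Level::Root lists all four S0 instances before the first S1 instance.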
77. Example:
{S0[i,j]: 0<=i<N and 0<=j<N}, {S1[i,j]: 0<=i<N and 0<=j<N}
and {S2[i,j]: 0<=i<N and 0<=j<N}.
for (i=0; i<N; i++)
    for (j=0; j<N; j++)
        S0;
for (i=0; i<N; i++)
    for (j=0; j<N; j++)
        S1;
for (i=0; i<N; i++)
    for (j=0; j<N; j++)
        S2;
computation::fuse_after method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2939
79. Example:
S2.fuse_after(i, S1);
S1.fuse_after(i, S0);
for (i=0; i<N; i++)
{
    for (j=0; j<N; j++)
        S0;
    for (j=0; j<N; j++)
        S1;
    for (j=0; j<N; j++)
        S2;
}
computation::fuse_after method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2939
80. void before(computation &consumer, tiramisu::var L);
Schedule this computation to run
before the computation consumer at the loop level L
computation::before method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2598
81. void between(computation &before_comp, tiramisu::var before_l,
computation &after_comp, tiramisu::var after_l );
Schedule this computation to run
after before_comp at the loop level before_l,
and before after_comp at loop level after_l.
The outermost loop level is 0.
computation::between method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2598
82. void bind_to(buffer *buff);
Bind this computation to a buffer. i.e., create a one-to-one data
mapping between the computation and the buffer.
In Tiramisu, a tiramisu computation cannot directly consume
values from buffers.
Buffers should first be wrapped in computations.
computation::bind_to method
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h#L2840
84. void compute_at(computation &consumer, tiramisu::var L );
void compute_at(computation &consumer, int L );
void interchange(tiramisu::var L0, tiramisu::var L1 );
void set_inline(bool is_inline = true );
Various methods of the computation class
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h
85. void shift(tiramisu::var L0, int n );
void split(tiramisu::var L0, int sizeX );
void split(tiramisu::var L0, int sizeX, tiramisu::var L0_outer, tiramisu::var L0_inner );
void tile(int L0, int L1, int sizeX, int sizeY );
void tile(int L0, int L1, int L2, int sizeX, int sizeY, int sizeZ );
void unroll(tiramisu::var L, int fac );
void unroll(tiramisu::var L, int fac, tiramisu::var L_outer, tiramisu::var L_inner );
void vectorize(tiramisu::var L, int v );
void vectorize(tiramisu::var L, int v, tiramisu::var L_outer, tiramisu::var L_inner );
Various methods of the computation class
https://github.com/rbaghdadi/tiramisu/blob/master/include/tiramisu/core.h