PL/CUDA
~Fusion of HPC Grade Power with In-Database Analytics~
The PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>
about myself...
▌KaiGai Kohei
 tw: @kkaigai
 https://github.com/kaigai
▌PostgreSQL
 SELinux, FDW, CustomScan, ...
▌PG-Strom
 GPU acceleration for PostgreSQL
▌Works
 NEC OSS Promotion Center
 Development of the software and
its business opportunity
PG-Strom Overview (1/2) – Architecture
[Architecture diagram: Application → SQL Parser → Query Optimizer → Query Executor → Storage Manager → Storage, with the PG-Strom extension hooking into the optimizer and executor to offload work to the GPU – no storage changes, no query changes.]
 Features
• automatic GPU code generation from the supplied SQL
• asynchronous & massively parallel execution on GPUs
• WHERE-clause, JOIN, GROUP BY, and projection are supported
 Advantages
• transparent acceleration by the power of thousands of cores
• low-cost solution for analytic processing
Characteristics of GPU (Graphics Processing Unit)
                    GPU                      CPU
Model               NVIDIA Tesla P100        Intel Xeon E5-2699v4
Architecture        Pascal                   Broadwell
Launch              Q2-2016                  Q1-2016
# of transistors    15 billion               7.2 billion
# of cores          3584 (simple)            22 (functional)
Core clock          1.126GHz ~ 1.303GHz      2.20GHz ~ 3.60GHz
Peak FLOPS (FP32)   9.3 TFLOPS               1.2 TFLOPS (with AVX2)
DRAM Size           16GB (HBM2)              max 1.5TB (DDR4)
Memory Band         732GB/s                  76.8GB/s
Power Consumption   250W                     145W
PG-Strom Overview (2/2) – GPU binary generation on the fly
QUERY: SELECT cat, count(*), avg(x) FROM t0
WHERE x between y and y + 20.0 GROUP BY cat;
:
STATIC_FUNCTION(bool)
gpupreagg_qual_eval(kern_context *kcxt,
kern_data_store *kds,
size_t kds_index)
{
pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1);
pg_float8_t KVAR_3 = pg_float8_vref(kds,kcxt,2,kds_index);
pg_float8_t KVAR_4 = pg_float8_vref(kds,kcxt,3,kds_index);
return EVAL((pgfn_float8ge(kcxt, KVAR_3, KVAR_4) &&
pgfn_float8le(kcxt, KVAR_3,
pgfn_float8pl(kcxt, KVAR_4, KPARAM_1))));
} :
E.g.) Transformation of the arithmetic operations in the WHERE-clause into a CUDA program
(callouts: reference to input data / SQL expression in CUDA source code / run-time compiler (nvrtc) / just-in-time compile / parallel execution)
GPU accelerates SQL performance
▌Test Query:
SELECT cat, count(*), avg(x)
FROM t0 NATURAL JOIN t1 [NATURAL JOIN t2 ...]
GROUP BY cat;
 t0 contains 100M rows, t1...t8 each contain 100K rows (like a star schema)
[Chart: PG-Strom microbenchmark with JOIN/GROUP BY – query response time [sec] vs. number of joined tables (2–9).
 PostgreSQL v9.5: 40.44 / 62.79 / 79.82 / 100.50 / 119.51 / 144.55 / 201.12 / 248.95 sec
 PG-Strom v1.0: roughly constant at ~10 sec]
CPU: Xeon E5-2670v3, GPU: GTX1080, RAM: 384GB, OS: CentOS 7.2, DB: PostgreSQL 9.5 + PG-Strom v1.0
Feedback from users during the v1.0 development
[Same architecture diagram as the overview slide.]
Heavy computation-intensive workloads (in-database analytics; scientific R&D, marketing, ...) → addressed by PL/CUDA + Matrix-Array
Heavy I/O-intensive workloads (generic large OLAP; ETL, reporting, ...) → addressed by SSD-to-GPU P2P DMA
Introduction of PL/CUDA
Our Failure (1/3) – Write an algorithm logic in SQL
Apr-2016
Our Failure (2/3) – Performance benefit (?)
Apr-2016
Our Failure (3/3) – Problems
▌Problem.1 – Who writes algorithms in SQL?
 Most mathematical algorithms are developed in the manner of procedural programming languages.
 Even though users don’t need to write the algorithm logic in CUDA, they still have to puzzle the algorithm out in SQL.
▌Problem.2 – Performance benefit
 Yes, PG-Strom is much faster than PostgreSQL at executing the core logic of the min-max method. It is an excellent result, but nobody knows how many people actually run that algorithm on PostgreSQL.
 The performance of GpuProjection is almost equivalent to a CPU implementation designed for this particular problem. Why?
  Inefficient code due to SQL compatibility
  Inefficient data format due to PostgreSQL’s row format
Our Answer – PL/CUDA + Matrix-like Array
CREATE FUNCTION my_logic(matrix, matrix)
RETURNS vector
AS $$
$$ LANGUAGE 'plcuda';
User-defined CUDA code block (goes between the $$ quotes)
[Flow: load the function’s arguments from storage → run the user-defined CUDA code block as a GPU kernel → write back the result set → post-process with SQL (table JOINs, window functions, ORDER BY, GROUP BY, etc.)]
PL/CUDA: a method for manual optimization
ArrayType header | a1 a2 … aN | b1 b2 … bN | c1 c2 … cN | d1 d2 … dN
→ represents a matrix of 4 columns × N rows (row i = (a_i, b_i, c_i, d_i))
Matrix-like Array: a 2D array without NULLs, used to represent a matrix
Example of PL/CUDA function definition
CREATE OR REPLACE FUNCTION
knn_gpu_similarity(int, int[], int[])
RETURNS float4[]
AS $$
#plcuda_begin
cl_int k = arg1.value;
MatrixType *Q = (MatrixType *) arg2.value;
MatrixType *D = (MatrixType *) arg3.value;
MatrixType *R = (MatrixType *) results;
:
nloops = (ARRAY_MATRIX_HEIGHT(Q) + (part_sz - k - 1)) / (part_sz - k);
for (loop=0; loop < nloops; loop++) {
/* 1. calculation of the similarity */
for (i = get_local_id(); i < part_sz * part_nums; i += get_local_size()) {
j = i % part_sz; /* index within partition */
/* index of database matrix (D) */
dindex = part_nums * get_global_index() + (i / part_sz);
/* index of query matrix (Q) */
qindex = loop * (part_sz - k) + (j - k);
values[i] = knn_similarity_compute(D, dindex, Q, qindex);
}
}
:
#plcuda_end
$$ LANGUAGE 'plcuda';
CUDA code block
How GPU Kernels are built
CREATE OR REPLACE FUNCTION
my_cuda_func(float[])
RETURNS int[]
AS $$
#plcuda_sanity_check func_sc
#plcuda_begin
#plcuda_end
#plcuda_working_bufsz func_wb
$$ LANGUAGE 'plcuda';
User-defined CUDA code block
bool func_sc(float[]) – helper function for the sanity check
bigint func_wb(float[]) – helper function for working buffer size estimation
[Build flow: the user-defined CUDA code block is combined with the common library routines into the source program of the GPU code, compiled by the run-time compiler into a GPU binary, and launched as a GPU kernel with the loaded input arguments, a working buffer, and the input/output of SQL data.]
Why automatically generated code was not the best for performance
 NULL checks for each variable reference
 Overflow checks for each primitive operator
 Function calls instead of primitive operators (a hand-written counterpart is sketched below)
STATIC_FUNCTION(pg_float4_t)
pgfn_float4mul(kern_context *kcxt, pg_float4_t arg1, pg_float4_t arg2)
{
pg_float4_t result;
result.isnull = arg1.isnull | arg2.isnull;
if (!result.isnull)
{
result.value = arg1.value * arg2.value;
CHECKFLOATVAL(&kcxt->e, result,
isinf(arg1.value) || isinf(arg2.value),
arg1.value == 0.0 || arg2.value == 0.0);
}
return result;
}
select x*y from c_test;
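For contrast, a minimal sketch (not PG-Strom code; all names here are hypothetical) of what a hand-written kernel for the same x*y projection can look like once the NULL/overflow checks and per-operator function calls are dropped:

__global__ void
mul_columns(const float *x, const float *y, float *out, size_t nitems)
{
    /* one thread per element; no NULL bitmap, no overflow check */
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;

    if (i < nitems)
        out[i] = x[i] * y[i];
}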
Disadvantage of the data format (1/2)
▌Row-oriented data
× includes unreferenced values
× many steps per data reference
〇 same data format as PostgreSQL
▌Column-oriented data
〇 loads only the referenced variables
〇 data reference in a single step
× needs a data format conversion
[Figure: with the row format, each GPU core has to walk a whole row (a b c d e f) to pick out the referenced columns b and e; with the column format, the cores read the b and e columns directly.]
A usual SQL workload cannot justify the cost of the data transformation, but advanced algorithm processing benefits from the columnar format.
Disadvantage of the data format (2/2)
▌Case of random memory access
 More memory transactions, but a low utilization ratio of the data bus
▌Case of coalesced memory access
 Minimum number of memory transactions, and maximum utilization ratio of the data bus (both cases are sketched in code below)
[Figure: memory transaction width is 256bit. Random access – only 32bit × 1 of a 256bit transaction is valid data (bus usage ratio: 12.5%). Coalesced access – 32bit × 8 = 256bit of the transaction is valid data (bus usage ratio: 100.0%).]
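As a rough illustration only (not taken from PG-Strom; the kernels and names below are hypothetical), the two access patterns look like this in CUDA, where stride plays the role of the row width of a row-oriented layout:

__global__ void
gather_strided(const float *rows, float *out, size_t nitems, int stride)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nitems)
        out[i] = rows[i * stride];   /* neighbouring threads read far-apart addresses:
                                      * many 256bit transactions, low bus utilization */
}

__global__ void
gather_coalesced(const float *column, float *out, size_t nitems)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nitems)
        out[i] = column[i];          /* neighbouring threads read adjacent addresses:
                                      * each 256bit transaction is fully used */
}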
2D-Array as Matrix
▌datatype[] array_matrix(variadic datatype[])
 An aggregate function to construct a 2D-array from the input stream.
 datatype is one of int2, int4, int8, float4 or float8.
 The 2D-array does not contain NULLs.
▌SETOF record matrix_unnest(datatype[])
 Deforms a 2D-array into a stream of records.
▌Downsides
 Unable to handle variable-length data types.
 Subject to the 1GB limit of the varlena data type in PostgreSQL.
ArrayType header | a1 a2 … aN | b1 b2 … bN | c1 c2 … cN | d1 d2 … dN
→ represents a matrix of 4 columns × N rows (row i = (a_i, b_i, c_i, d_i))
Matrix-like Array: a 2D array without NULLs, used to represent a matrix
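As an illustration of this layout only (the real MatrixType accessors in PG-Strom may differ; the struct and function below are hypothetical), a value can be addressed in the flat, column-major array like this:

typedef struct
{
    int    height;      /* number of rows (N)          */
    int    width;       /* number of columns (e.g. 4)  */
    float  values[1];   /* a1..aN, b1..bN, c1..cN, ... */
} MatrixSketch;

__device__ static inline float
matrix_value(const MatrixSketch *m, int row, int col)
{
    /* each column is stored contiguously, so threads that vary 'row' while
     * sharing 'col' read consecutive addresses (coalesced access) */
    return m->values[(size_t)col * m->height + row];
}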
Example to call PL/CUDA function
SELECT row_number() OVER (),
float4_as_int4(R.key_id) key_id,
R.score
FROM matrix_unnest(
(SELECT my_plcuda_function(A.matrix,
B.matrix)
FROM (SELECT cbind(array_matrix(id),
array_matrix(x, y, z)) matrix
FROM normal_table
WHERE tag LIKE '%abc%') A,
(SELECT matrix
FROM matrix_table) B
)
) AS R(key_id real, score real)
ORDER BY score DESC
LIMIT 1000;
• Invocation of the PL/CUDA function with two matrix arguments
• Construct, or load a pre-built, array-matrix
• Deform the array-matrix into generic records
• Post-process by SQL (JOIN, window functions)
Case Study
similarity search on drug discovery
Background – relationship between diseases and chemical compounds
[Figure: a target disease has a relevant protein; chemical compounds (= drug candidates) reported in academic papers may be inactive, active, or active but toxic to that protein.]
Goal: discovery of chemical compounds which are “active” to the target protein.
k-NN Similarity Search on Chemical Compounds (1/2)
[Figure: the database compound set (D; ~10M records) is searched by similarity against the query compound set (Q; ~1000 records), which consists of compounds picked up from academic papers as active to the target protein. “Similar” compounds are expected to have a higher probability of being active.]
k-NN Similarity Search on Chemical Compounds (2/2)
Similarity is used as the definition of distance.
ID NAME Fingerprint (1024bit)
1 CHEMBL153534 000000000001000000100000000000000100000000000001000000...
2 CHEMBL405398 000000000000000100100000000000000000000000000000100000...
3 CHEMBL503634 000001000000000000000000001000000100000000000000000000...
: : :
Data structure of the chemical compounds
Similarity by Jaccard index:
Similarity(A, B) = |A_FP ∩ B_FP| / |A_FP ∪ B_FP|
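A minimal sketch of this formula on 1024bit fingerprints stored as 32 × 32bit words; the actual knn_similarity_compute() in the PL/CUDA function works on the array-matrix, so treat this only as an illustration of the Jaccard index, with hypothetical names:

__device__ static float
jaccard_index(const unsigned int *fp_a, const unsigned int *fp_b)
{
    int n_and = 0;      /* |A_FP ∩ B_FP| */
    int n_or  = 0;      /* |A_FP ∪ B_FP| */

    for (int i = 0; i < 1024 / 32; i++)
    {
        n_and += __popc(fp_a[i] & fp_b[i]);
        n_or  += __popc(fp_a[i] | fp_b[i]);
    }
    return (n_or > 0 ? (float)n_and / (float)n_or : 0.0f);
}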
Scale of the computing
[Figure: for each compound d_i in the database set D (~10M records), compute the distance between d_i and every compound in the query set Q, then take the average of the top-3 similarities.]
Order of calculation: O(Q × D) to compute the distances, plus O(D × Q log Q) for the per-compound sorting and averaging.
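For a sense of scale: with Q ≈ 1,000 query compounds and D = 10M database compounds, the distance step alone is Q × D = 1,000 × 10,000,000 = 10 billion similarity evaluations, which is the “10B combination search” referred to in the performance results below.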
Implementation of PL/CUDA function (1/3)
Step-1: Split all the logical combinations of Q and D into multiple partitions, then assign them to the SMMs, the execution units of the GPU.
Step-2: Each GPU core calculates a similarity score between a Q-compound and a D-compound, then stores the score in “shared memory”, which is fast and close to the SMM.
Implementation of PL/CUDA function (2/3)
Step-3: Bitonic-sort by the similarity score, then reorder the Q-compounds.
Step-4: Repeat from Step-2 if the number of Q-compounds is larger than the shared memory size; the top-k items are kept.
Step-5: Take the average of the top-k values, then store it in the result buffer.
Implementation of PL/CUDA function (3/3)
CREATE OR REPLACE FUNCTION
knn_gpu_similarity(int, -- k-value
int[], -- ID+bitmap of Q
int[]) -- ID+bitmap of D
RETURNS float4[] -- result: ID+similarity
AS $$
#plcuda_decl
:
#plcuda_begin
#plcuda_kernel_blocksz \
knn_gpu_similarity_main_block_size
#plcuda_num_threads \
knn_gpu_similarity_main_num_threads
#plcuda_shmem_blocksz 8192
cl_int k = arg1.value;
MatrixType *Q = (MatrixType *) arg2.value;
MatrixType *D = (MatrixType *) arg3.value;
MatrixType *R = (MatrixType *) results;
:
for (loop=0; loop < nloops; loop++)
{
/* 1. calculation of the similarity */
for (i = get_local_id();
i < part_sz * part_nums;
i += get_local_size()) {
j = i % part_sz; /* index within partition */
dindex = part_nums * get_global_index()
+ (i / part_sz);
qindex = loop * (part_sz - k) + (j - k);
if (dindex < ARRAY_MATRIX_HEIGHT(D) &&
qindex < ARRAY_MATRIX_HEIGHT(Q)) {
values[i] = knn_similarity_compute(D, dindex,
Q, qindex);
}
}
__syncthreads();
/* 2. sorting by the similarity for each partition */
knn_similarity_sorting(values, part_sz, part_nums);
__syncthreads();
:
}
#plcuda_end
#plcuda_sanity_check knn_gpu_similarity_sanity_check
#plcuda_working_bufsz 0
#plcuda_results_bufsz knn_gpu_similarity_results_bufsz
$$ LANGUAGE 'plcuda';
real[] -- ID+Similarity of D-compounds (2xN)
knn_gpu_similarity(int k, -- k-value
int[] Q, -- ID+Fingerprint of Q-compounds (33xM)
int[] D); -- ID+Fingerprint of D-compounds (33xN)
Invocation of PL/CUDA function
PREPARE knn_sim_rand_10m_gpu_v2(int) -- arg1:@k-value
AS
SELECT row_number() OVER (),
fp.name,
similarity
FROM (SELECT float4_as_int4(key_id) key_id, similarity
FROM matrix_unnest(
(SELECT rbind( knn_gpu_similarity($1,Q.matrix,
D.matrix))
FROM (SELECT cbind(array_matrix(id),
array_matrix(bitmap)) matrix
FROM finger_print_query) Q,
(SELECT matrix
FROM finger_print_10m_matrix) D
)
) AS sim(key_id real, similarity real)
ORDER BY similarity DESC) sim,
finger_print_10m fp
WHERE fp.id = sim.key_id
LIMIT 1000;
• Post-process by SQL: look up the compound name by compound id (table JOIN) and rank by similarity with a window function.
• Execution of the PL/CUDA function with the Q-/D-matrix as arguments.
• Transform the records read from the tables into the Array-Matrix type (pre-building is also possible).
• Transform the Array-Matrix returned by the PL/CUDA function into usual record data × N rows.
Performance results
 For comparison with the CPU case, we implemented an equivalent SQL function in C.
 # of D-compounds is 10M records; # of Q-compounds is 10, 50, 100, 500 and 1000.
 Up to a 10B-combination search; almost the same scale as real drug discovery research.
 HW) CPU: Xeon E5-2670v3, GPU: GTX980 / GTX1080, RAM: 384GB
 SW) CentOS 7, CUDA 8.0, PostgreSQL v9.5 + PG-Strom v1.0
[Chart: similarity search of chemical compounds by the k-NN method (k=3, D=10M) – query response time [sec] vs. number of query compounds Q = 10 / 50 / 100 / 500 / 1000.
 CPU (E5-2670v3): 30.25 / 145.29 / 295.95 / 1503.31 / 3034.94 sec
 GTX980: 12.97 / 13.46 / 13.90 / 18.16 / 24.65 sec
 GTX1080: 13.00 / 13.23 / 13.59 / 16.01 / 19.13 sec
 → up to ~150 times faster!]
Another Usage
k-means clustering in database system
Clustering Analysis
k-means clustering algorithm
1. Assign clusters randomly
2. Compute the centroid of each cluster
3. Choose the nearest centroid as each item’s new cluster
k-means clustering algorithm
4. Repeat until convergence
5. Compute the centroid of each cluster again
6. All the clusters become fully converged
(a sketch of the cluster-assignment step follows below)
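As a hedged illustration of step 3 (choosing the nearest cluster), a sketch of the assignment kernel written in the conventions seen in the gpu_kmeans code on the next slide; the data layout assumed here (column-major D-matrix with the ID in column 0, width × k_value centroid array) and all names are assumptions, not the actual implementation:

KERNEL_FUNCTION(void)
assign_nearest_cluster(const float *d_values,   /* data matrix, column-major     */
                       const float *centroids,  /* current centroids (width × k) */
                       int *assignment,         /* result: cluster id per item   */
                       int nitems, int width, int k_value)
{
    for (int did = get_global_id(); did < nitems; did += get_global_size())
    {
        int   best_cid  = 0;
        float best_dist = 3.402823466e+38f;     /* ~FLT_MAX */

        for (int cid = 0; cid < k_value; cid++)
        {
            float dist = 0.0f;
            for (int index = 1; index < width; index++)   /* column 0 holds the ID */
            {
                float diff = d_values[index * nitems + did]
                           - centroids[index * k_value + cid];
                dist += diff * diff;            /* squared Euclidean distance */
            }
            if (dist < best_dist)
            {
                best_dist = dist;
                best_cid  = cid;
            }
        }
        assignment[did] = best_cid;
    }
}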
k-means clustering on PL/CUDA
CREATE OR REPLACE FUNCTION
gpu_kmeans(real[], -- ID + Data Matrix
int, -- k-value (number of clusters)
int = 10, -- max number of iteration
int = 1) -- seed of initial randomness
RETURNS int[]
AS $$
#plcuda_decl
:
KERNEL_FUNCTION_MAXTHREADS(void)
update_centroid(MatrixType *D,
MatrixType *R,
MatrixType *C)
{
:
/* accumulate the local centroid */
for (did = get_global_id();
did < nitems;
did += get_global_size())
{
/* pick up the target cluster */
cid = r_values[nitems + did];
atomicAdd(&l_cent[cid], 1.0);
for (index=1; index < width; index++)
atomicAdd(&l_cent[index * k_value + cid],
d_values[index * nitems + did]);
}
__syncthreads();
/* write back to the global C-matrix */
for (index = get_local_id();
index < width * k_value;
index += get_local_size())
atomicAdd(&c_values[index], l_cent[index]);
}
:
#plcuda_begin
:
status = pgstromLaunchDynamicKernel4((void *)
setup_initial_cluster,
(kern_arg_t)(D),
(kern_arg_t)(R),
(kern_arg_t)(C),
(kern_arg_t)(r_seed),
nitems, 0, 0);
if (status != cudaSuccess)
PLCUDA_RUNTIME_ERROR_RETURN(status);
for (loop=0; loop < nloops; loop++)
{
:
status = pgstromLaunchDynamicKernelMaxThreads3(
(void *)kmeans_update_cluster,
(kern_arg_t)(D),
(kern_arg_t)(R),
(kern_arg_t)(C),
(kern_arg_t)k_value,
nitems, 0,
sizeof(cl_int) + sizeof(cl_float));
if (status != cudaSuccess)
PLCUDA_RUNTIME_ERROR_RETURN(status);
:
}
#plcuda_sanity_check gpu_kmeans_sanity_check
#plcuda_working_bufsz gpu_kmeans_working_bufsz
#plcuda_results_bufsz gpu_kmeans_results_bufsz
#plcuda_end
$$ LANGUAGE 'plcuda';
Test data for k-means clustering
▌Dataset overview
 A collection of datasets of vehicle traffic, observed between two points for a set duration of time over a period of 6 months
 449 observation points in Aarhus, Denmark
 13.5M records from February to June of 2014
▌Data contains
 average speed
 average measured time
 number of vehicles
 latitude/longitude of the observation points
 etc...
▌What we did
 Categorize road zones into 5 classes according to the characteristics of the vehicles’ driving patterns.
Invocation of GPU k-means function
SELECT report_id, k, c
FROM (SELECT report_id, k, c,
row_number() OVER (PARTITION BY report_id
ORDER BY c DESC) rank
FROM (SELECT report_id, k, count(*) c
FROM matrix_unnest(
(SELECT gpu_kmeans ( array_matrix(
int4_as_float4(report_id),
avg_measured_time,
avg_speed,
vehicle_count),
5)
FROM tr_rawdata)
) R(report_id int, k int)
GROUP BY report_id, k
) __summary_1
) __summary_2
WHERE rank = 1;
Make a matrix from the raw data
Run the k-means clustering logic
Pick up the most frequent cluster for each report_id
GPU k-means (1/3) – clustering by all the data
$ wget -O map.png "`psql traffic -At -f ~/traffic.sql`"
bypass highway?
Road towards
downtown?
Beltway?
GPU k-means (2/3) – Daytime and Nighttime
daytime (8-17) nighttime (18-7)
GPU k-means (3/3) – Weekdays and Weekend
weekdays weekend
Invocation of GPU k-means function
SELECT report_id, k, c
FROM (SELECT report_id, k, c,
row_number() OVER (PARTITION BY report_id
ORDER BY c DESC) rank
FROM (SELECT report_id, k, count(*) c
FROM matrix_unnest(
(SELECT gpu_kmeans ( array_matrix(
int4_as_float4(report_id),
avg_measured_time,
avg_speed,
vehicle_count),
5)
FROM tr_rawdata
WHERE extract('hour' from timestamp)
between 7 and 17
)
) R(report_id int, k int)
GROUP BY report_id, k
) __summary_1
) __summary_2
WHERE rank = 1;
Just add a line to select a different input data set – the flexibility of SQL!
Summary
▌What is PL/CUDA
 The original concept of PG-Strom is automatic optimization.
 PL/CUDA instead pulls out the full capability of the GPU through manual optimization.
 Likely, nobody has ever written an advanced algorithm in SQL :-)
▌Advantages
 TFLOPS-grade computing engine for in-database analytics
 No need to export the entire dataset for analytics by external applications
 Allows the flexibility of SQL to be utilized for pre-/post-processing of the core analytics algorithms
▌Future Challenges
 Data sizes larger than 1GB, because of the varlena restriction in PostgreSQL
 Asynchronous execution and CPU parallelism
 Time to construct the matrix; if the data is mostly static, the matrix can be constructed in advance
Resources
▌Repository
https://github.com/pg-strom/devel
▌Today’s Slides
http://www.slideshare.net/kaigai/pgconfsv2016-plcuda
▌Contact
 kaigai@ak.jp.nec.com
 Tw: @kkaigai
Question?

Mais conteúdo relacionado

Mais procurados

GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...Kohei KaiGai
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwJan Holčapek
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStromKohei KaiGai
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Ural-PDC
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDASavith Satheesh
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdwKohei KaiGai
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAprithan
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_ProcessingKohei KaiGai
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - EnglishKohei KaiGai
 
Parallel K means clustering using CUDA
Parallel K means clustering using CUDAParallel K means clustering using CUDA
Parallel K means clustering using CUDAprithan
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)Kohei KaiGai
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?Shinnosuke Furuya
 
20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdwKohei KaiGai
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU deviceKohei KaiGai
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_IndexKohei KaiGai
 
How to Burn Multi-GPUs using CUDA stress test memo
How to Burn Multi-GPUs using CUDA stress test memoHow to Burn Multi-GPUs using CUDA stress test memo
How to Burn Multi-GPUs using CUDA stress test memoNaoto MATSUMOTO
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multiKohei KaiGai
 

Mais procurados (20)

GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdw
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
Debugging CUDA applications
Debugging CUDA applicationsDebugging CUDA applications
Debugging CUDA applications
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
Parallel K means clustering using CUDA
Parallel K means clustering using CUDAParallel K means clustering using CUDA
Parallel K means clustering using CUDA
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?
 
20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU device
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
How to Burn Multi-GPUs using CUDA stress test memo
How to Burn Multi-GPUs using CUDA stress test memoHow to Burn Multi-GPUs using CUDA stress test memo
How to Burn Multi-GPUs using CUDA stress test memo
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi
 

Destaque

Schulung: Einführung in das GPU-Computing mit NVIDIA CUDA
Schulung: Einführung in das GPU-Computing mit NVIDIA CUDASchulung: Einführung in das GPU-Computing mit NVIDIA CUDA
Schulung: Einführung in das GPU-Computing mit NVIDIA CUDAJörn Dinkla
 
GPU, CUDA, OpenCL and OpenACC for Parallel Applications
GPU, CUDA, OpenCL and OpenACC for Parallel ApplicationsGPU, CUDA, OpenCL and OpenACC for Parallel Applications
GPU, CUDA, OpenCL and OpenACC for Parallel ApplicationsMarcos Gonzalez
 
PL/CUDA - GPU Accelerated In-Database Analytics
PL/CUDA - GPU Accelerated In-Database AnalyticsPL/CUDA - GPU Accelerated In-Database Analytics
PL/CUDA - GPU Accelerated In-Database AnalyticsKohei KaiGai
 
Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump ...
Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump ...Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump ...
Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump ...Igor Korkin
 
Implementazione di un vincolo table su un CSP solver GPU-based
Implementazione di un vincolo table su un CSP solver GPU-basedImplementazione di un vincolo table su un CSP solver GPU-based
Implementazione di un vincolo table su un CSP solver GPU-basedTommaso Campari
 
20170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#820170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#8Kohei KaiGai
 
pgconfasia2016 lt ssd2gpu
pgconfasia2016 lt ssd2gpupgconfasia2016 lt ssd2gpu
pgconfasia2016 lt ssd2gpuKohei KaiGai
 
20170310_InDatabaseAnalytics_#1
20170310_InDatabaseAnalytics_#120170310_InDatabaseAnalytics_#1
20170310_InDatabaseAnalytics_#1Kohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
ICISA 2010 Conference Presentation
ICISA 2010 Conference PresentationICISA 2010 Conference Presentation
ICISA 2010 Conference PresentationGonçalo Amador
 
SQL+GPU+SSD=∞ (Japanese)
SQL+GPU+SSD=∞ (Japanese)SQL+GPU+SSD=∞ (Japanese)
SQL+GPU+SSD=∞ (Japanese)Kohei KaiGai
 
An Intelligent Storage?
An Intelligent Storage?An Intelligent Storage?
An Intelligent Storage?Kohei KaiGai
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with DataSeth Familian
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheLeslie Samuel
 
Actor Model and C++: what, why and how?
Actor Model and C++: what, why and how?Actor Model and C++: what, why and how?
Actor Model and C++: what, why and how?Yauheni Akhotnikau
 
並列クエリを実行するPostgreSQLのアーキテクチャ
並列クエリを実行するPostgreSQLのアーキテクチャ並列クエリを実行するPostgreSQLのアーキテクチャ
並列クエリを実行するPostgreSQLのアーキテクチャKohei KaiGai
 

Destaque (18)

Schulung: Einführung in das GPU-Computing mit NVIDIA CUDA
Schulung: Einführung in das GPU-Computing mit NVIDIA CUDASchulung: Einführung in das GPU-Computing mit NVIDIA CUDA
Schulung: Einführung in das GPU-Computing mit NVIDIA CUDA
 
GPU, CUDA, OpenCL and OpenACC for Parallel Applications
GPU, CUDA, OpenCL and OpenACC for Parallel ApplicationsGPU, CUDA, OpenCL and OpenACC for Parallel Applications
GPU, CUDA, OpenCL and OpenACC for Parallel Applications
 
PL/CUDA - GPU Accelerated In-Database Analytics
PL/CUDA - GPU Accelerated In-Database AnalyticsPL/CUDA - GPU Accelerated In-Database Analytics
PL/CUDA - GPU Accelerated In-Database Analytics
 
CUDA-Aware MPI
CUDA-Aware MPICUDA-Aware MPI
CUDA-Aware MPI
 
Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump ...
Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump ...Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump ...
Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump ...
 
Implementazione di un vincolo table su un CSP solver GPU-based
Implementazione di un vincolo table su un CSP solver GPU-basedImplementazione di un vincolo table su un CSP solver GPU-based
Implementazione di un vincolo table su un CSP solver GPU-based
 
20170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#820170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#8
 
pgconfasia2016 lt ssd2gpu
pgconfasia2016 lt ssd2gpupgconfasia2016 lt ssd2gpu
pgconfasia2016 lt ssd2gpu
 
20170310_InDatabaseAnalytics_#1
20170310_InDatabaseAnalytics_#120170310_InDatabaseAnalytics_#1
20170310_InDatabaseAnalytics_#1
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
ICISA 2010 Conference Presentation
ICISA 2010 Conference PresentationICISA 2010 Conference Presentation
ICISA 2010 Conference Presentation
 
SQL+GPU+SSD=∞ (Japanese)
SQL+GPU+SSD=∞ (Japanese)SQL+GPU+SSD=∞ (Japanese)
SQL+GPU+SSD=∞ (Japanese)
 
An Intelligent Storage?
An Intelligent Storage?An Intelligent Storage?
An Intelligent Storage?
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 
Actor Model and C++: what, why and how?
Actor Model and C++: what, why and how?Actor Model and C++: what, why and how?
Actor Model and C++: what, why and how?
 
並列クエリを実行するPostgreSQLのアーキテクチャ
並列クエリを実行するPostgreSQLのアーキテクチャ並列クエリを実行するPostgreSQLのアーキテクチャ
並列クエリを実行するPostgreSQLのアーキテクチャ
 

Semelhante a PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics

20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_ENKohei KaiGai
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLinside-BigData.com
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfDLow6
 
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PROIDEA
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to AcceleratorsDilum Bandara
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsGanesan Narayanasamy
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCinside-BigData.com
 
BlazingSQL & Graphistry - Netflow Demo
BlazingSQL & Graphistry - Netflow DemoBlazingSQL & Graphistry - Netflow Demo
BlazingSQL & Graphistry - Netflow DemoRodrigo Aramburu
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...AMD Developer Central
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUsiguazio
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...HostedbyConfluent
 

Semelhante a PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics (20)

20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
Stress your DUT
Stress your DUTStress your DUT
Stress your DUT
 
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systems
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCC
 
BlazingSQL & Graphistry - Netflow Demo
BlazingSQL & Graphistry - Netflow DemoBlazingSQL & Graphistry - Netflow Demo
BlazingSQL & Graphistry - Netflow Demo
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
0507036
05070360507036
0507036
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 

Mais de Kohei KaiGai

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_HistoryKohei KaiGai
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_APIKohei KaiGai
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrowKohei KaiGai
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_countKohei KaiGai
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0Kohei KaiGai
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCacheKohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGISKohei KaiGai
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGISKohei KaiGai
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_OnlineKohei KaiGai
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdwKohei KaiGai
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_FdwKohei KaiGai
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_TokyoKohei KaiGai
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.JapanKohei KaiGai
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_BetaKohei KaiGai
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStromKohei KaiGai
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGaiKohei KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStromKohei KaiGai
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdwKohei KaiGai
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_FdwKohei KaiGai
 
20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LTKohei KaiGai
 

Mais de Kohei KaiGai (20)

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_Online
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStrom
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStrom
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw
 
20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT
 

Último

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics

  • 8. Introduction of PL/CUDA
  • 9. Our Failure (1/3) – Writing algorithm logic in SQL (Apr-2016)
  • 10. Our Failure (2/3) – Performance benefit (?) (Apr-2016)
  • 11. Our Failure (3/3) – Problems
    ▌Problem 1 – Who writes algorithms in SQL?
      Most mathematical algorithms are developed in the style of procedural programming languages.
      Even though users do not need to write the algorithm logic in CUDA, they still have to express it as an SQL puzzle.
    ▌Problem 2 – Performance benefit
      PG-Strom is much faster than plain PostgreSQL at executing the core logic of the min-max method. That is an excellent result, but few people actually run such algorithms on PostgreSQL.
      Performance of GpuProjection is almost equivalent to a CPU implementation hand-tuned for the particular problem. Why?
      • Inefficient code due to SQL compatibility
      • Inefficient data format due to PostgreSQL's row format
  • 12. Our Answer – PL/CUDA + Matrix-like Array
    PL/CUDA is a method for manual optimization: the user defines the function body as a CUDA code block.

      CREATE FUNCTION my_logic(matrix, matrix)
      RETURNS vector
      AS $$
        -- user-defined CUDA code block
      $$ LANGUAGE 'plcuda';

    The function's arguments are loaded from storage, the user-defined CUDA code block runs as a GPU kernel, and the result set is written back; post-processing (table JOIN, window functions, ORDER BY, GROUP BY, etc.) stays in SQL.
    Matrix-like Array: a 2D array without NULLs used to represent a matrix, e.g. an ArrayType header followed by the values of a 4-column x N-row matrix (a1..aN, b1..bN, c1..cN, d1..dN).
  • 13. Example of a PL/CUDA function definition

      CREATE OR REPLACE FUNCTION
      knn_gpu_similarity(int, int[], int[])
      RETURNS float4[]
      AS $$
      #plcuda_begin
      cl_int      k = arg1.value;
      MatrixType *Q = (MatrixType *) arg2.value;
      MatrixType *D = (MatrixType *) arg3.value;
      MatrixType *R = (MatrixType *) results;
          :
      nloops = (ARRAY_MATRIX_HEIGHT(Q) + (part_sz - k - 1)) / (part_sz - k);
      for (loop=0; loop < nloops; loop++)
      {
          /* 1. calculation of the similarity */
          for (i = get_local_id();
               i < part_sz * part_nums;
               i += get_local_size())
          {
              j = i % part_sz;    /* index within partition */
              /* index of database matrix (D) */
              dindex = part_nums * get_global_index() + (i / part_sz);
              /* index of query matrix (Q) */
              qindex = loop * (part_sz - k) + (j - k);
              values[i] = knn_similarity_compute(D, dindex, Q, qindex);
          }
      }
          :
      #plcuda_end
      $$ LANGUAGE 'plcuda';

    The body between #plcuda_begin and #plcuda_end is the CUDA code block.
  • 14. How GPU kernels are built

      CREATE OR REPLACE FUNCTION my_cuda_func(float[])
      RETURNS int[]
      AS $$
      #plcuda_sanity_check  func_sc   -- bool   func_sc(float[]): helper function for sanity checks
      #plcuda_begin
        ... user-defined CUDA code block ...
      #plcuda_end
      #plcuda_working_bufsz func_wb   -- bigint func_wb(float[]): helper function for buffer size estimation
      $$ LANGUAGE 'plcuda';

    The user-defined CUDA code block is combined with common library routines into the source program of the GPU code, compiled on the fly by the run-time compiler into a GPU binary, then launched as a GPU kernel together with the input arguments and a working buffer for the input/output of SQL data.
  • 15. Why the automatically generated code was not best for performance
    • NULL checks for each variable reference
    • Overflow checks for each primitive operator
    • Function calls instead of primitive operators

    E.g.) SELECT x*y FROM c_test; is translated into:

      STATIC_FUNCTION(pg_float4_t)
      pgfn_float4mul(kern_context *kcxt, pg_float4_t arg1, pg_float4_t arg2)
      {
          pg_float4_t result;

          result.isnull = arg1.isnull | arg2.isnull;
          if (!result.isnull)
          {
              result.value = arg1.value * arg2.value;
              CHECKFLOATVAL(&kcxt->e, result,
                            isinf(arg1.value) || isinf(arg2.value),
                            arg1.value == 0.0 || arg2.value == 0.0);
          }
          return result;
      }
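    For contrast, a minimal sketch (not PG-Strom source) of what a hand-written kernel can do when the data is known to be NULL-free and overflow checking is not required; the guarded helper call collapses to a plain element-wise multiply:

      // Hypothetical hand-written equivalent: no NULL flags, no overflow check,
      // no per-operator function call -- just a plain element-wise multiply.
      __global__ void mul_columns(const float *x, const float *y,
                                  float *out, int nitems)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < nitems)
              out[i] = x[i] * y[i];   // one instruction instead of a guarded helper call
      }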
  • 16. Disadvantage because of the data format (1/2)
    ▌Row-oriented data
      × includes unreferenced values
      × many steps for data references
      〇 common data format with PostgreSQL
    ▌Column-oriented data
      〇 loads only the referenced variables
      〇 data reference in one step
      × needs data format conversion
    A usual SQL workload cannot justify the cost of the data transformation; advanced algorithm processing, however, benefits from the columnar format.
  • 17. Disadvantage because of the data format (2/2)
    ▌Case of random memory access
      More memory transactions, but a low utilization rate of the data bus:
      only 32bit x 1 = 32bit of a 256bit-wide memory transaction is valid data (bus usage ratio: 12.5%).
    ▌Case of coalesced memory access
      The minimum number of memory transactions and the maximum utilization rate of the data bus:
      32bit x 8 = 256bit of a 256bit-wide memory transaction is valid data (bus usage ratio: 100.0%).
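    As a rough illustration (not PG-Strom code): with a column-major matrix, adjacent threads reading consecutive rows of the same column touch adjacent addresses, so a warp's loads coalesce; reading the same logical values from a row-major layout produces strided, uncoalesced loads. The matrix shape (N rows, W columns) and array names are assumptions for this sketch:

      // Illustrative only: coalesced vs. strided loads over an N x W matrix.
      // 'colmajor' stores column c contiguously (values[c * N + row]),
      // 'rowmajor' stores row r contiguously (values[row * W + c]).
      __global__ void sum_column(const float *colmajor, const float *rowmajor,
                                 float *out, int N, int W, int c)
      {
          int row = blockIdx.x * blockDim.x + threadIdx.x;
          if (row < N) {
              float a = colmajor[c * N + row];   // adjacent threads -> adjacent addresses (coalesced)
              float b = rowmajor[row * W + c];   // adjacent threads -> stride of W elements (uncoalesced)
              out[row] = a + b;
          }
      }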
  • 18. 2D-Array as Matrix
    ▌datatype[] array_matrix(variadic datatype[])
      An aggregate function to construct a 2D array from the input stream.
      datatype is one of int2, int4, int8, float4 or float8.
      The 2D array does not contain NULLs.
    ▌SETOF record matrix_unnest(datatype[])
      Deforms a 2D array into a stream of multiple records.
    ▌Downsides
      Unable to handle variable-length data types.
      1GB limit of the varlena data type in PostgreSQL.
    (Matrix-like Array: a 2D array without NULLs, representing e.g. a matrix of 4 columns x N rows behind a single ArrayType header.)
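    A minimal sketch of how a kernel body can index such a column-major matrix-like array. The column-major layout (values[col * nrows + row]) and the meaning of ARRAY_MATRIX_HEIGHT follow the code listings later in this deck; the helper name, the plain float type and the 'values' pointer (assumed to point at the first data element after the ArrayType header) are illustrative assumptions:

      /* Illustrative sketch: reading element (row, col) of a column-major
       * matrix-like array whose nrows values per column are stored contiguously.
       * 'nrows' corresponds to ARRAY_MATRIX_HEIGHT() of the array. */
      __device__ __forceinline__ float
      matrix_get(const float *values, int nrows, int row, int col)
      {
          return values[col * nrows + row];   /* one step, no NULL bitmap to consult */
      }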
  • 19. Example of calling a PL/CUDA function

      SELECT row_number() OVER (),
             float4_as_int4(R.key_id) key_id,
             R.score
        FROM matrix_unnest(
               (SELECT my_plcuda_function(A.matrix, B.matrix)
                  FROM (SELECT cbind(array_matrix(id),
                                     array_matrix(x, y, z)) matrix
                          FROM normal_table
                         WHERE tag LIKE '%abc%') A,
                       (SELECT matrix FROM matrix_table) B
               )
             ) AS R(key_id real, score real)
       ORDER BY score DESC
       LIMIT 1000;

    • Construct, or load a pre-built, array-matrix
    • Invoke the PL/CUDA function with the two matrix arguments
    • Deform the array-matrix back into generic records
    • Post-process by SQL (JOIN, window functions)
  • 20. Case Study: similarity search in drug discovery
  • 21. Background – relationship between a disease and chemical compounds
    target disease → relevant protein → chemical compounds (= candidate drugs)
    The goal is the discovery of chemical compounds which are "active" against the target protein; other compounds are inactive, or active but toxic. Known actives are collected from academic papers.
  • 22. k-NN Similarity Search on Chemical Compounds (1/2)
    Pick up chemical compounds known to be active against the target protein from academic papers: the query compound set (Q; ~1,000 records scale).
    Search the database compound set (D; 10M records scale) by similarity: "similar compounds" have a higher probability of being active.
  • 23. k-NN Similarity Search on Chemical Compounds (2/2)
    Similarity is the definition of distance.
    Data structure of the chemical compounds:

      ID | NAME          | Fingerprint (1024bit)
       1 | CHEMBL153534  | 000000000001000000100000000000000100000000000001000000...
       2 | CHEMBL405398  | 000000000000000100100000000000000000000000000000100000...
       3 | CHEMBL503634  | 000001000000000000000000001000000100000000000000000000...

    Similarity by Jaccard index:
      Similarity(A, B) = |A_FP ∩ B_FP| / |A_FP ∪ B_FP|
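    For illustration, a minimal CUDA sketch of the Jaccard index on packed fingerprints. The packing (each 1024-bit fingerprint as 32 unsigned 32-bit words) and the function name are assumptions; the actual knn_similarity_compute() used in the PL/CUDA function plays this role, but its source is not shown in this deck:

      /* Illustrative Jaccard similarity between two 1024-bit fingerprints,
       * each packed into 32 unsigned 32-bit words. */
      __device__ float
      jaccard_similarity(const unsigned int *fp_a, const unsigned int *fp_b)
      {
          int n_and = 0;   /* |A ∩ B| : bits set in both fingerprints  */
          int n_or  = 0;   /* |A ∪ B| : bits set in either fingerprint */

          for (int w = 0; w < 32; w++) {
              n_and += __popc(fp_a[w] & fp_b[w]);
              n_or  += __popc(fp_a[w] | fp_b[w]);
          }
          return (n_or > 0) ? (float)n_and / (float)n_or : 0.0f;
      }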
  • 24. Scale of the computation
    For each compound d_i of the database set D (10M records scale), compute the distance between the query set Q and d_i as the average of the top-3 distances of d_i to the Q-compounds; repeat for every d_j of D.
    Order of calculation: O(Q x D) to make the distances + O(D x QlogQ) for sorting and averaging.
  • 25. Implementation of the PL/CUDA function (1/3)
    Step-1: Split all the logical combinations of Q and D into multiple partitions, then assign them to SMMs, the execution units of the GPU.
    Step-2: Each GPU core calculates a similarity score between a Q-compound and a D-compound, then stores the score in shared memory, which is fast and close to the SMM.
  • 26. Implementation of the PL/CUDA function (2/3)
    Step-3: Bitonic sorting by the similarity score, then reorder the Q-compounds.
    Step-4: Repeat from Step-2 if the number of Q-compounds is larger than the shared memory size; the top-k items are kept.
    Step-5: Take the average of the top-k values, then store it in the result buffer.
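    A simplified sketch of the block-level bitonic sort used in Step-3; it is not the actual knn_similarity_sorting() source (which sorts per partition) and assumes n is a power of two, the scores already sit in shared memory, and all threads of the block call the function:

      /* Illustrative block-level bitonic sort (descending) of n scores in
       * shared memory; simplified from the per-partition sorting of Step-3. */
      __device__ void
      bitonic_sort_desc(float *scores, int n)
      {
          for (int k = 2; k <= n; k <<= 1) {
              for (int j = k >> 1; j > 0; j >>= 1) {
                  for (int i = threadIdx.x; i < n; i += blockDim.x) {
                      int partner = i ^ j;
                      if (partner > i) {
                          bool ascending = ((i & k) != 0);  /* direction of this bitonic subsequence */
                          float a = scores[i], b = scores[partner];
                          if ((a < b) != ascending) {       /* keep larger values first overall */
                              scores[i] = b;
                              scores[partner] = a;
                          }
                      }
                  }
                  __syncthreads();   /* all compare-exchange pairs of this phase are done */
              }
          }
      }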
  • 27. Implementation of the PL/CUDA function (3/3)

      CREATE OR REPLACE FUNCTION
      knn_gpu_similarity(int,     -- k-value
                         int[],   -- ID+bitmap of Q
                         int[])   -- ID+bitmap of D
      RETURNS float4[]            -- result: ID+similarity
      AS $$
      #plcuda_decl
          :
      #plcuda_begin
      #plcuda_kernel_blocksz ¥
          knn_gpu_similarity_main_block_size
      #plcuda_num_threads ¥
          knn_gpu_similarity_main_num_threads
      #plcuda_shmem_blocksz 8192

      cl_int      k = arg1.value;
      MatrixType *Q = (MatrixType *) arg2.value;
      MatrixType *D = (MatrixType *) arg3.value;
      MatrixType *R = (MatrixType *) results;
          :
      for (loop=0; loop < nloops; loop++)
      {
          /* 1. calculation of the similarity */
          for (i = get_local_id();
               i < part_sz * part_nums;
               i += get_local_size())
          {
              j = i % part_sz;    /* index within partition */
              dindex = part_nums * get_global_index() + (i / part_sz);
              qindex = loop * (part_sz - k) + (j - k);
              if (dindex < ARRAY_MATRIX_HEIGHT(D) &&
                  qindex < ARRAY_MATRIX_HEIGHT(Q))
              {
                  values[i] = knn_similarity_compute(D, dindex, Q, qindex);
              }
          }
          __syncthreads();
          /* 2. sorting by the similarity for each partition */
          knn_similarity_sorting(values, part_sz, part_nums);
          __syncthreads();
          :
      }
      #plcuda_end
      #plcuda_sanity_check   knn_gpu_similarity_sanity_check
      #plcuda_working_bufsz  0
      #plcuda_results_bufsz  knn_gpu_similarity_results_bufsz
      $$ LANGUAGE 'plcuda';

    Resulting SQL signature:
      real[]                        -- ID+Similarity of D-compounds (2xN)
      knn_gpu_similarity(int   k,   -- k-value
                         int[] Q,   -- ID+Fingerprint of Q-compounds (33xM)
                         int[] D);  -- ID+Fingerprint of D-compounds (33xN)
  • 28. Invocation of the PL/CUDA function

      PREPARE knn_sim_rand_10m_gpu_v2(int)    -- arg1: k-value
      AS
      SELECT row_number() OVER (),
             fp.name,
             similarity
        FROM (SELECT float4_as_int4(key_id) key_id, similarity
                FROM matrix_unnest(
                       (SELECT rbind(
                                 knn_gpu_similarity($1, Q.matrix, D.matrix))
                          FROM (SELECT cbind(array_matrix(id),
                                             array_matrix(bitmap)) matrix
                                  FROM finger_print_query) Q,
                               (SELECT matrix
                                  FROM finger_print_10m_matrix) D
                     )
                   ) AS sim(key_id real, similarity real)
               ORDER BY similarity DESC) sim,
             finger_print_10m fp
       WHERE fp.id = sim.key_id
       LIMIT 1000;

    • Transform the records read from the tables into the Array-Matrix type (pre-building is also possible)
    • Execute the PL/CUDA function with the Q-/D-matrix as arguments
    • Transform the Array-Matrix (3xN) returned by the PL/CUDA function into usual record data x N rows
    • Post-process by SQL, e.g. look up compound names by compound id (table JOIN), rank by similarity with a window function
  • 29. Performance results
    • For comparison against the CPU case, we implemented an equivalent SQL function in C.
    • The number of D-compounds is 10M records; the number of Q-compounds is 10, 50, 100, 500 and 1000.
    • Up to 10B combinations searched: almost the same scale as real drug-discovery research.
    • HW) CPU: Xeon E5-2670v3, GPU: GTX980 / GTX1080, RAM: 384GB
    • SW) CentOS 7, CUDA 8.0, PostgreSQL v9.5 + PG-Strom v1.0

    Similarity search of chemical compounds by the k-NN method (k=3, D=10M);
    query response time [sec] by number of query compounds (Q):

      Q                   10      50     100      500     1000
      CPU (E5-2670v3)  30.25  145.29  295.95  1503.31  3034.94
      GTX980           12.97   13.46   13.90    18.16    24.65
      GTX1080          13.00   13.23   13.59    16.01    19.13

    Up to x150 times faster than the CPU implementation!
  • 30. Another Usage: k-means clustering inside the database system
  • 31. Clustering Analysis
  • 32. k-means clustering algorithm
    1. Assign clusters randomly
    2. Compute the centroid of each cluster
    3. Choose the nearest cluster from the centroids
  • 33. k-means clustering algorithm
    4. Repeat until convergence
    5. Compute the centroid of each cluster again
    6. All the clusters eventually become fully converged
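    A minimal sketch, not taken from the gpu_kmeans source on the next slide, of the cluster-assignment step (step 3): each thread scans the k centroids and records the nearest one for its data item. The column-major indexing matches the matrix-like array convention used above; treating column 0 as the ID column and using squared Euclidean distance are assumptions of this sketch:

      #include <float.h>

      /* Illustrative assignment step of k-means: each thread picks the nearest
       * centroid for one data item. Data and centroids are column-major:
       * d_values[col * nitems + row], c_values[col * k_value + cid]. */
      __global__ void
      assign_nearest_cluster(const float *d_values, const float *c_values,
                             int *assign, int nitems, int width, int k_value)
      {
          int did = blockIdx.x * blockDim.x + threadIdx.x;
          if (did >= nitems)
              return;

          int   best_cid  = 0;
          float best_dist = FLT_MAX;

          for (int cid = 0; cid < k_value; cid++) {
              float dist = 0.0f;
              for (int col = 1; col < width; col++) {   /* column 0 is assumed to be the ID column */
                  float diff = d_values[col * nitems + did] -
                               c_values[col * k_value + cid];
                  dist += diff * diff;
              }
              if (dist < best_dist) {
                  best_dist = dist;
                  best_cid  = cid;
              }
          }
          assign[did] = best_cid;   /* cluster id chosen for this item */
      }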
  • 34. k-means clustering on PL/CUDA

      CREATE OR REPLACE FUNCTION
      gpu_kmeans(real[],    -- ID + Data Matrix
                 int,       -- k-value (number of clusters)
                 int = 10,  -- max number of iterations
                 int = 1)   -- seed of initial randomness
      RETURNS int[]
      AS $$
      #plcuda_decl
          :
      KERNEL_FUNCTION_MAXTHREADS(void)
      update_centroid(MatrixType *D, MatrixType *R, MatrixType *C)
      {
          :
          /* accumulate the local centroid */
          for (did = get_global_id();
               did < nitems;
               did += get_global_size())
          {
              /* pick up the target cluster */
              cid = r_values[nitems + did];
              atomicAdd(&l_cent[cid], 1.0);
              for (index=1; index < width; index++)
                  atomicAdd(&l_cent[index * k_value + cid],
                            d_values[index * nitems + did]);
          }
          __syncthreads();
          /* write back to the global C-matrix */
          for (index = get_local_id();
               index < width * k_value;
               index += get_local_size())
              atomicAdd(&c_values[index], l_cent[index]);
      }
          :
      #plcuda_begin
          :
      status = pgstromLaunchDynamicKernel4((void *) setup_initial_cluster,
                                           (kern_arg_t)(D),
                                           (kern_arg_t)(R),
                                           (kern_arg_t)(C),
                                           (kern_arg_t)(r_seed),
                                           nitems, 0, 0);
      if (status != cudaSuccess)
          PLCUDA_RUNTIME_ERROR_RETURN(status);

      for (loop=0; loop < nloops; loop++)
      {
          :
          status = pgstromLaunchDynamicKernelMaxThreads3(
                                  (void *)kmeans_update_cluster,
                                  (kern_arg_t)(D),
                                  (kern_arg_t)(R),
                                  (kern_arg_t)(C),
                                  (kern_arg_t)k_value,
                                  nitems, 0,
                                  sizeof(cl_int) + sizeof(cl_float));
          if (status != cudaSuccess)
              PLCUDA_RUNTIME_ERROR_RETURN(status);
          :
      }
      #plcuda_sanity_check   gpu_kmeans_sanity_check
      #plcuda_working_bufsz  gpu_kmeans_working_bufsz
      #plcuda_results_bufsz  gpu_kmeans_results_bufsz
      #plcuda_end
      $$ LANGUAGE 'plcuda';
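    The excerpt above accumulates per-cluster counts and per-column sums into the C-matrix with atomicAdd. A minimal sketch of the finalization step that is not shown in the excerpt: divide the sums by the counts to obtain the new centroid coordinates. The layout assumption (column 0 of C holds the counts, as accumulated by atomicAdd(&l_cent[cid], 1.0)) follows from the update_centroid code; the kernel name and plain float types are illustrative:

      /* Illustrative centroid finalization: c_values is column-major,
       * c_values[col * k_value + cid]; column 0 accumulated the item counts.
       * Launch with at least width * k_value threads. */
      __global__ void
      finalize_centroid(float *c_values, int width, int k_value)
      {
          int idx = blockIdx.x * blockDim.x + threadIdx.x;
          int cid = idx % k_value;   /* cluster id    */
          int col = idx / k_value;   /* matrix column */

          if (col >= 1 && col < width) {
              float count = c_values[cid];   /* per-cluster item count */
              if (count > 0.0f)
                  c_values[col * k_value + cid] /= count;
          }
      }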
  • 35. Test data for k-means clustering
    ▌Dataset overview
      A collection of vehicle-traffic datasets, observed between two points for a set duration of time over a period of 6 months.
      449 observation points at Aarhus, Denmark.
      13.5M records from February to June 2014.
    ▌Data contains
      average speed
      average measured time
      number of vehicles
      latitude/longitude of the observation points
      etc...
    ▌What we did
      Categorize the road zones into 5 classes according to the characteristics of the vehicles' driving style.
  • 36. Invocation of the GPU k-means function

      SELECT report_id, k, c
        FROM (SELECT report_id, k, c,
                     row_number() OVER (PARTITION BY report_id
                                        ORDER BY c DESC) rank
                FROM (SELECT report_id, k, count(*) c
                        FROM matrix_unnest(
                               (SELECT gpu_kmeans(
                                         array_matrix(
                                           int4_as_float4(report_id),
                                           avg_measured_time,
                                           avg_speed,
                                           vehicle_count),
                                         5)
                                  FROM tr_rawdata)
                             ) R(report_id int, k int)
                       GROUP BY report_id, k
                     ) __summary_1
             ) __summary_2
       WHERE rank = 1;

    • Make a matrix from the raw data
    • Run the k-means clustering logic
    • Pick up the most frequent cluster per observation point
  • 37. GPU k-means (1/3) – clustering by all the data

      $ wget -O map.png "`psql traffic -At -f ~/traffic.sql`"

    (Map visualization of the clusters; they suggest e.g. a bypass highway, roads towards downtown, and a beltway.)
  • 38. GPU k-means (2/3) – Daytime and Nighttime
    (Map visualizations for daytime (8-17) and nighttime (18-7).)
  • 39. GPU k-means (3/3) – Weekdays and Weekend
    (Map visualizations for weekdays and the weekend.)
  • 40. Invocation of the GPU k-means function

      SELECT report_id, k, c
        FROM (SELECT report_id, k, c,
                     row_number() OVER (PARTITION BY report_id
                                        ORDER BY c DESC) rank
                FROM (SELECT report_id, k, count(*) c
                        FROM matrix_unnest(
                               (SELECT gpu_kmeans(
                                         array_matrix(
                                           int4_as_float4(report_id),
                                           avg_measured_time,
                                           avg_speed,
                                           vehicle_count),
                                         5)
                                  FROM tr_rawdata
                                 WHERE extract('hour' from timestamp)
                                       between 7 and 17)
                             ) R(report_id int, k int)
                       GROUP BY report_id, k
                     ) __summary_1
             ) __summary_2
       WHERE rank = 1;

    Just add one line (the WHERE clause) to select a different input data set. That is the flexibility of SQL!
  • 41. Summary
  • 42. Summary
    ▌What is PL/CUDA?
      The original concept of PG-Strom is automatic optimization.
      PL/CUDA instead pulls out the full capability of the GPU through manual optimization; realistically, nobody has written advanced algorithms in SQL :-)
    ▌Advantages
      A TFLOPS-grade computing engine for in-database analytics.
      No need to export the entire dataset for analytics in external applications.
      Allows SQL's flexibility to be used for pre-/post-processing of the core analytics algorithms.
    ▌Future challenges
      Data sizes larger than 1GB, because of the varlena restriction in PostgreSQL.
      Asynchronous execution and CPU parallelism.
      Time to construct the matrix; if the data is mostly static, it can be built beforehand.
  • 43. Resources
    ▌Repository
      https://github.com/pg-strom/devel
    ▌Today's Slides
      http://www.slideshare.net/kaigai/pgconfsv2016-plcuda
    ▌Contact
      kaigai@ak.jp.nec.com
      Tw: @kkaigai