What is PG-Strom?
PostgreSQL Unconference 2021-May - PG-Strom New Features / GpuCache
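【Features】
Transparent GPU acceleration of aggregation/analytics workloads
PCIe-bus-level optimization via SSD-to-GPU Direct SQL
Apache Arrow support reduces data exchange and import time to nearly zero
Data kept resident in GPU memory, coexisting with OLTP workloads
Support for (some) PostGIS functions to accelerate geolocation analytics
PG-Strom: a PostgreSQL extension module that draws out the full capability of GPUs and NVME/PMEM
to process terabyte-scale data at high speed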
[Diagram: App → GPU off-loading, for IoT/Big-Data and for ML/Analytics]
➢ SSD-to-GPU Direct SQL
➢ Columnar Store (Arrow_Fdw)
➢ GPU Memory Store (Gstore_Fdw)
➢ Asymmetric Partition-wise JOIN/GROUP BY
➢ BRIN-Index Support
➢ NVME-over-Fabric Support
➢ Inter-process Data Frame for Python scripts
➢ Procedural Language for GPU native code (w/ cuPy)
➢ PostGIS Support
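GpuCache setup (1/2)
▌Conditions under which GpuCache is used
The table has an AFTER INSERT OR UPDATE OR DELETE FOR EACH ROW trigger
that executes pgstrom.gpucache_sync_trigger().
The table has an AFTER TRUNCATE FOR EACH STATEMENT trigger
that executes pgstrom.gpucache_sync_trigger().
(Additionally, for logical replication)
The triggers above are also configured to fire on the replication target (subscriber).
▌Example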
=# create trigger row_sync after insert or update or delete
on mytest for each row
execute function pgstrom.gpucache_sync_trigger();
=# create trigger stmt_sync after truncate
on mytest for each statement
execute function pgstrom.gpucache_sync_trigger();
=# alter table mytest enable always trigger row_sync;
=# alter table mytest enable always trigger stmt_sync;
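The four statements follow a fixed pattern, so when several tables or partitions need them it may help to generate them. A minimal Python sketch, assuming trusted table names; the helper name `gpucache_trigger_ddl` is hypothetical, and the trigger names default to the `row_sync`/`stmt_sync` used above. Options are attached only to the row trigger, mirroring the customization example on the next slide.

```python
from typing import List, Optional


def gpucache_trigger_ddl(table: str, options: Optional[str] = None,
                         row_trigger: str = "row_sync",
                         stmt_trigger: str = "stmt_sync") -> List[str]:
    """Render the GpuCache trigger setup for one table (hypothetical helper).

    `table` and the trigger names are pasted in unquoted, so they must be
    trusted, already-valid SQL identifiers.
    """
    # Optional '<key>=<value>[,...]' argument; single quotes doubled for SQL.
    arg = "" if options is None else "'" + options.replace("'", "''") + "'"
    return [
        # Row trigger: forwards INSERT/UPDATE/DELETE to the GPU cache.
        f"create trigger {row_trigger} after insert or update or delete "
        f"on {table} for each row "
        f"execute function pgstrom.gpucache_sync_trigger({arg});",
        # Statement trigger: handles TRUNCATE.
        f"create trigger {stmt_trigger} after truncate "
        f"on {table} for each statement "
        "execute function pgstrom.gpucache_sync_trigger();",
        # ENABLE ALWAYS: logical replication apply workers run with
        # session_replication_role = replica, where ordinary triggers are skipped.
        f"alter table {table} enable always trigger {row_trigger};",
        f"alter table {table} enable always trigger {stmt_trigger};",
    ]


if __name__ == "__main__":
    for stmt in gpucache_trigger_ddl("mytest"):
        print(stmt)
```

ENABLE ALWAYS triggers also fire during ordinary sessions, so emitting those two statements unconditionally should be harmless when no replication is involved.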
GpuCache setup (2/2)
▌Customizing GpuCache settings
=# create trigger row_sync after insert or update or delete
on mytest for each row execute function
pgstrom.gpucache_sync_trigger('max_num_rows=25000000');
↑ Settings can be written here in <key>=<value>[,...] format
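To make the option format concrete, here is a minimal Python sketch of a parser for the `<key>=<value>[,...]` string that fills in the documented defaults. It is not PG-Strom's own parser: the function names are hypothetical, and treating 'k'/'m'/'g' as case-insensitive powers of 1024, reading "10M rows" as 10,485,760 (the max_num_rows value EXPLAIN shows for triggers created without options later in this deck), and rejecting unknown keys are all assumptions.

```python
from typing import Dict, Optional

# Binary (1024-based) units are an assumption of this sketch.
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30
UNITS = {"k": KB, "m": MB, "g": GB}
SIZE_KEYS = {"redo_buffer_size", "gpu_sync_threshold"}
DEFAULTS: Dict[str, Optional[int]] = {
    "max_num_rows": 10 * (1 << 20),   # "10M rows"
    "redo_buffer_size": 160 * MB,     # "160MB"
    "gpu_sync_interval": 5,           # seconds
    "gpu_sync_threshold": None,       # derived below: 25% of redo_buffer_size
}


def parse_size(text: str) -> int:
    """Parse sizes like '512m'; a bare number is taken as bytes (assumption)."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def parse_gpucache_options(arg: str) -> Dict[str, Optional[int]]:
    """Parse a hypothetical '<key>=<value>[,...]' trigger argument."""
    opts = dict(DEFAULTS)
    for item in filter(None, (s.strip() for s in arg.split(","))):
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or key not in DEFAULTS:
            raise ValueError(f"invalid GpuCache option: {item!r}")
        opts[key] = parse_size(value) if key in SIZE_KEYS else int(value)
    if opts["gpu_sync_threshold"] is None:
        # Default threshold follows whatever REDO buffer size was chosen.
        opts["gpu_sync_threshold"] = opts["redo_buffer_size"] // 4
    return opts


if __name__ == "__main__":
    print(parse_gpucache_options("max_num_rows=25000000"))
```

Deriving the threshold after all overrides are applied keeps the "25% of redo_buffer_size" rule valid when only the buffer size is changed.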
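▌Available parameters
✓ max_num_rows=NROWS (default: 10M rows)
Number of rows that can be allocated in the GPU cache. Because the row count can grow
temporarily (e.g., rows updated by transactions that have not yet committed), leave some headroom.
✓ redo_buffer_size=SIZE (default: 160MB)
Size of the REDO buffer. The units 'k', 'm', and 'g' can be used.
✓ gpu_sync_interval=SECONDS (default: 5sec)
When SECONDS seconds have passed since the REDO buffer was last written,
pending updates are applied on the GPU side even if only a few rows were updated.
✓ gpu_sync_threshold=SIZE (default: 25% of redo_buffer_size)
When the not-yet-applied portion of the REDO buffer reaches SIZE bytes,
the log is applied on the GPU side. The units 'k', 'm', and 'g' can be used.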
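Read together, gpu_sync_interval and gpu_sync_threshold describe two ways pending REDO log gets applied on the GPU: a size trigger and an idle timeout. A minimal Python model of that decision, assuming the two conditions are simply OR-ed and that "unapplied bytes" and "seconds since last write" are available as inputs; the class and method names are hypothetical, the 40MB default assumes 160MB means 160 × 2^20 bytes, and PG-Strom's actual background logic may differ.

```python
from dataclasses import dataclass


@dataclass
class SyncPolicy:
    """Hypothetical model of when pending REDO log is applied on the GPU."""
    gpu_sync_interval: float = 5.0            # seconds (slide default)
    gpu_sync_threshold: int = 40 * (1 << 20)  # 25% of a 160MB REDO buffer

    def should_sync(self, unapplied_bytes: int,
                    secs_since_last_write: float) -> bool:
        if unapplied_bytes <= 0:
            return False  # nothing pending
        # Size trigger: enough unapplied log has accumulated.
        if unapplied_bytes >= self.gpu_sync_threshold:
            return True
        # Time trigger: writes have gone quiet, so flush even a small backlog.
        return secs_since_last_write >= self.gpu_sync_interval


if __name__ == "__main__":
    policy = SyncPolicy()
    print(policy.should_sync(1024, 6.0))
```

The interval rule bounds how stale the GPU copy can get under a light write load, while the threshold rule bounds how much log a single apply step has to process under a heavy one.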
GpuCache performance evaluation (2/3): update performance
▌Test script
mytest.sql
\set __dev_id (int(random(1, 8000000)))
\set __new_x (double(random(1385650, 1410000)) / 10000.0)
\set __new_y (double(random( 350000, 361550)) / 10000.0)
UPDATE dpoints
SET ts = now(), x = :__new_x, y = :__new_y
WHERE dev_id = :__dev_id;
$ pgbench -n -f mytest.sql postgres -c 32 -j 32 -T 10 -M prepared
:
number of transactions actually processed: 1258486
latency average = 0.255 ms
tps = 125661.080344 (excluding connections establishing)
➔ Runs the script above continuously for 10 seconds with 32 connections and 32 threads.
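As a sanity check on the pgbench output above: with a fixed duration, throughput is roughly transactions divided by seconds, and the average latency pgbench prints (when no per-transaction latencies are collected) is roughly clients × duration ÷ transactions. A small Python sketch, assuming the nominal 10-second run; the measured interval is slightly longer, which is why the reported figures differ a little.

```python
def approx_tps(transactions: int, seconds: float) -> float:
    """Throughput if the run lasted exactly `seconds`."""
    return transactions / seconds


def approx_latency_ms(transactions: int, seconds: float, clients: int) -> float:
    """Average latency when each client runs transactions back to back."""
    return 1000.0 * seconds * clients / transactions


if __name__ == "__main__":
    print(f"{approx_tps(1258486, 10):.0f} tps (pgbench reported 125661)")
    print(f"{approx_latency_ms(1258486, 10, 32):.3f} ms (pgbench reported 0.255)")
```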
▌Results
With GpuCache    TPS=125,661
Without GpuCache TPS=133,938
▌Reference results (dpoints defined as an unlogged table)
With GpuCache    TPS=144,096
Without GpuCache TPS=163,724
Also writing updates to GpuCache costs roughly
6% (logged) to 12% (unlogged) of TPS.
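The overhead figure is the relative TPS drop when GpuCache is maintained. A short Python calculation using only the results above:

```python
def overhead(with_cache: float, without_cache: float) -> float:
    """Fractional TPS loss caused by maintaining GpuCache."""
    return 1.0 - with_cache / without_cache


results = {
    "logged": (125_661, 133_938),
    "unlogged": (144_096, 163_724),
}

for kind, (with_cache, without_cache) in results.items():
    print(f"{kind}: {overhead(with_cache, without_cache):.1%}")
# logged: 6.2%
# unlogged: 12.0%
```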
GpuCache performance evaluation (3/3): analytics performance
▌Test script
SELECT n03_001 pref, n03_004 city, count(*)
FROM giscity g, dpoints d
WHERE n03_001 = '東京都'
AND st_contains(g.geom,st_makepoint(d.x, d.y))
GROUP BY pref, city
ORDER BY pref, city;
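▌Results
PG-Strom + GpuCache: 857.203ms
PG-Strom (without GpuCache): 1368.695ms
PostgreSQL: 55749.379ms
➔ The gap likely depends heavily on the data layout.
This dataset has only (device ID, timestamp, latitude/longitude), whereas real workloads
usually also have to load metadata-like fields.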
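Expressed as ratios against the GpuCache run, the same three timings read as follows (a short Python calculation on the figures above):

```python
timings_ms = {
    "PG-Strom + GpuCache": 857.203,
    "PG-Strom (without GpuCache)": 1368.695,
    "PostgreSQL": 55749.379,
}

baseline = timings_ms["PG-Strom + GpuCache"]
for name, ms in timings_ms.items():
    print(f"{name}: {ms / baseline:.2f}x the GpuCache run time")
# PG-Strom + GpuCache: 1.00x the GpuCache run time
# PG-Strom (without GpuCache): 1.60x the GpuCache run time
# PostgreSQL: 65.04x the GpuCache run time
```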
Supplement: execution plan of the analytics query (PG-Strom = On)
GroupAggregate (actual time=848.333..848.357 rows=54 loops=1)
Group Key: g.n03_001, g.n03_004
-> Sort (actual time=848.322..848.328 rows=54 loops=1)
Sort Key: g.n03_004
Sort Method: quicksort Memory: 29kB
-> Custom Scan (GpuPreAgg) (actual time=847.395..847.407 rows=54 loops=1)
Reduction: Local
Combined GpuJoin: enabled
GPU Preference: GPU0 (NVIDIA Tesla V100-PCIE-16GB)
-> Custom Scan (GpuJoin) on dpoints d (never executed)
Outer Scan: dpoints d (never executed)
Depth 1: GpuGiSTJoin
HeapSize: 7841.91KB (estimated: 3251.36KB), IndexSize: 13.25MB
IndexFilter: (g.geom ~ st_makepoint(d.x, d.y)) on giscity_geom
Rows Fetched by Index: 804104
JoinQuals: st_contains(g.geom, st_makepoint(d.x, d.y))
GPU Preference: GPU0 (NVIDIA Tesla V100-PCIE-16GB)
GPU Cache: NVIDIA Tesla V100-PCIE-16GB [max_num_rows: 12000000]
GPU Cache Size: main: 772.51M, extra: 0
-> Seq Scan on giscity g (actual time=0.173..19.394 rows=6173 loops=1)
Filter: ((n03_001)::text = '東京都'::text)
Rows Removed by Filter: 112726
Planning Time: 0.351 ms
Execution Time: 857.203 ms
(24 rows)
Supplement: execution plan of the analytics query (PG-Strom = Off)
Finalize GroupAggregate (actual time=56053.249..56084.157 rows=54 loops=1)
Group Key: g.n03_001, g.n03_004
-> Gather Merge (actual time=56052.975..56084.093 rows=270 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial GroupAggregate (actual time=56033.094..56053.135 rows=54 loops=5)
Group Key: g.n03_001, g.n03_004
-> Sort (actual time=56032.296..56040.879 rows=81620 loops=5)
Sort Key: g.n03_004
Sort Method: quicksort Memory: 10188kB
Worker 0: Sort Method: quicksort Memory: 10090kB
Worker 1: Sort Method: quicksort Memory: 10147kB
Worker 2: Sort Method: quicksort Memory: 9950kB
Worker 3: Sort Method: quicksort Memory: 10060kB
-> Nested Loop (actual time=0.841..55383.095 rows=81620 loops=5)
-> Parallel Seq Scan on dpoints d (actual time=0.017..161.622 rows=1600000 loops=5)
-> Index Scan using giscity_geom on giscity g (actual time=0.033..0.034 rows=0 loops=8000000)
Index Cond: (geom ~ st_makepoint(d.x, d.y))
Filter: (((n03_001)::text = '東京都'::text) AND st_contains(geom, st_makepoint(d.x, d.y)))
Rows Removed by Filter: 1
Planning Time: 0.162 ms
Execution Time: 56087.117 ms
(22 rows)
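Most of the CPU plan's time is explained by the Nested Loop: every dpoints row triggers one GiST index probe on giscity. A back-of-the-envelope Python check using only figures from the plan (loops=8000000 on the Index Scan, spread over 5 processes, and an average of about 0.034 ms per loop) lands close to the Nested Loop's roughly 55.4 s per process:

```python
probes_total = 8_000_000  # loops of the Index Scan node
processes = 5             # leader + 4 workers (loops=5 on the Nested Loop)
ms_per_probe = 0.034      # average total time per loop, from "actual time"

probes_per_process = probes_total // processes
estimated_ms = probes_per_process * ms_per_probe
print(f"{probes_per_process} probes x {ms_per_probe} ms "
      f"= {estimated_ms / 1000:.1f} s per process")
```

The Parallel Seq Scan itself takes only about 0.16 s per process, so the per-row index probing, rather than reading dpoints, is what the GPU plan replaces.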
GpuCache and logical replication (3/6)
(Setup on instance ①)
=# create table dpoints_odd
(dev_id int primary key, ts timestamp, x float, y float, check(dev_id % 2 = 1));
CREATE TABLE
=# grant SELECT ON dpoints_odd to slave;
GRANT
=# insert into dpoints_odd
(select x, now(), random(), random() from generate_series(1,8000000, 2) x);
INSERT 0 4000000
=# CREATE PUBLICATION repl_odd FOR TABLE dpoints_odd ;
CREATE PUBLICATION
(Setup on instance ②)
=# create table dpoints_even
(dev_id int primary key, ts timestamp, x float, y float, check (dev_id % 2 = 0));
CREATE TABLE
=# grant SELECT on dpoints_even to slave;
GRANT
=# insert into dpoints_even
(select x, now(), random(), random() from generate_series(2,8000000, 2) x);
INSERT 0 4000000
=# CREATE PUBLICATION repl_even FOR TABLE dpoints_even ;
CREATE PUBLICATION
Transaction-side setup (on each of the two machines)
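The two publishers split device IDs by parity: instance ① loads the odd IDs and instance ② the even IDs, each guarded by a CHECK constraint. A quick Python check that the two generate_series calls produce 4,000,000 rows each and together cover 1..8,000,000 without overlap; the `generate_series` function here is a hypothetical Python stand-in that, like the SQL function, includes its upper bound.

```python
def generate_series(start: int, stop: int, step: int = 1) -> range:
    """Integer generate_series stand-in: unlike range(), `stop` is inclusive."""
    return range(start, stop + 1, step)


odd_ids = generate_series(1, 8_000_000, 2)   # instance ①: dpoints_odd
even_ids = generate_series(2, 8_000_000, 2)  # instance ②: dpoints_even

# A step-2 range keeps the parity of its start, so every value satisfies
# check (dev_id % 2 = 1) or check (dev_id % 2 = 0) respectively.
assert odd_ids.step == 2 and odd_ids.start % 2 == 1
assert even_ids.step == 2 and even_ids.start % 2 == 0

# Parity makes the two sets disjoint; both lie within 1..8,000,000 and hold
# 8,000,000 values in total, so together they cover that range exactly.
print(len(odd_ids), len(even_ids))
print(odd_ids[0], odd_ids[-1], even_ids[0], even_ids[-1])
print(len(odd_ids) + len(even_ids) == 8_000_000)
```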
GpuCache and logical replication (4/6)
=# create table dpoints
(dev_id int not null, ts timestamp, x float, y float) partition by range ((dev_id % 2));
=# create table dpoints_even partition of dpoints for values from ( 0 ) to ( 1 );
=# create unique index dpoints_even__dev_id on dpoints_even(dev_id);
=# alter table dpoints_even add primary key using index dpoints_even__dev_id;
=# create trigger row_sync after insert or update or delete on dpoints_even for row
execute function pgstrom.gpucache_sync_trigger();
=# create trigger stmt_sync after truncate on dpoints_even for statement
execute function pgstrom.gpucache_sync_trigger();
=# alter table dpoints_even enable always trigger row_sync;
=# alter table dpoints_even enable always trigger stmt_sync;
// Repeat the same operations for dpoints_odd
=# create table dpoints_odd partition of dpoints for values from ( 1 ) to ( 2 );
=# create unique index dpoints_odd__dev_id on dpoints_odd(dev_id);
=# alter table dpoints_odd add primary key using index dpoints_odd__dev_id;
=# create trigger row_sync after insert or update or delete on dpoints_odd for row
execute function pgstrom.gpucache_sync_trigger();
=# create trigger stmt_sync after truncate on dpoints_odd for statement
execute function pgstrom.gpucache_sync_trigger();
=# alter table dpoints_odd enable always trigger row_sync;
=# alter table dpoints_odd enable always trigger stmt_sync;
// Start logical replication
=# CREATE SUBSCRIPTION repl_odd
CONNECTION 'host=172.31.19.228 dbname=postgres user=slave' PUBLICATION repl_odd;
=# CREATE SUBSCRIPTION repl_even
CONNECTION 'host=172.31.20.230 dbname=postgres user=slave' PUBLICATION repl_even;
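Analytics-side setup (GPU instance)
The "enable always trigger" statements are required so that the triggers
also fire on the subscriber (slave) side of logical replication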
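On the GPU instance, dpoints is range-partitioned on the expression (dev_id % 2). Range bounds are inclusive below and exclusive above, so FROM (0) TO (1) holds remainder 0 and FROM (1) TO (2) holds remainder 1. A Python sketch of that routing, written to mimic PostgreSQL semantics; the `PARTITIONS` table and function names are hypothetical. PostgreSQL's integer % keeps the sign of the dividend, so a negative odd dev_id would give -1 and match no partition; PostgreSQL rejects such a row with an error.

```python
from typing import Optional

# (lower inclusive, upper exclusive, partition), mirroring the DDL above.
PARTITIONS = [
    (0, 1, "dpoints_even"),
    (1, 2, "dpoints_odd"),
]


def pg_mod(a: int, b: int) -> int:
    """Integer % as in PostgreSQL: the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return r if a >= 0 else -r


def route(dev_id: int) -> Optional[str]:
    """Return the partition a row with this dev_id lands in, or None."""
    key = pg_mod(dev_id, 2)
    for lower, upper, name in PARTITIONS:
        if lower <= key < upper:
            return name
    return None  # PostgreSQL would refuse to insert this row


if __name__ == "__main__":
    for dev_id in (932294, 1396325, -3):
        print(dev_id, route(dev_id))
```

Because the subscriptions write directly into dpoints_even and dpoints_odd, this routing only matters for rows inserted through the parent dpoints; the CHECK constraints on the publishers keep the two streams consistent with it.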
GpuCache and logical replication (5/6)
$ cat mytest_even.sql
\set __dev_id (int(random(1, 4000000) * 2))
\set __new_x (double(random(1,1000000)) / 10000.0)
\set __new_y (double(random(1,1000000)) / 10000.0)
UPDATE dpoints_even
SET ts = now(), x = :__new_x, y = :__new_y
WHERE dev_id = :__dev_id;
$ /usr/pgsql-12/bin/pgbench -h 172.31.19.228 -n -f mytest_odd.sql postgres \
-c 96 -j 96 -T 10 -M prepared &
/usr/pgsql-12/bin/pgbench -h 172.31.20.230 -n -f mytest_even.sql postgres \
-c 96 -j 96 -T 10 -M prepared
:
number of transactions actually processed: 242223
latency average = 3.988 ms
tps = 24073.372665 (including connections establishing)
tps = 31549.989987 (excluding connections establishing)
:
number of transactions actually processed: 210376
latency average = 4.600 ms
tps = 20870.313617 (including connections establishing)
tps = 26237.257896 (excluding connections establishing)
Even on 8-vCPU + EBS (gp3) instances, the two publishers together reach roughly 57k TPS (31.5k + 26.2k, excluding connection establishment)
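The two pgbench runs hit different publishers at the same time, so the aggregate update rate generated on the publishers (which the GPU-side subscriber eventually has to apply) is the sum of the two. A small Python calculation from the output above:

```python
# (tps including connections establishing, tps excluding connections establishing)
runs = [
    (24073.372665, 31549.989987),
    (20870.313617, 26237.257896),
]

incl = sum(r[0] for r in runs)
excl = sum(r[1] for r in runs)
print(f"combined: {incl:,.0f} TPS incl. connections, {excl:,.0f} TPS excl.")
```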
GpuCache and logical replication (6/6)
postgres=# explain select * from dpoints where x < 0.001 and y < 0.001 order by dev_id;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Sort (cost=10598.31..10598.31 rows=3 width=28)
Sort Key: dpoints_even.dev_id
-> Append (cost=4285.90..10598.28 rows=3 width=28)
-> Custom Scan (GpuScan) on dpoints_even (cost=4285.90..5301.80 rows=2 width=28)
GPU Filter: ((x < '0.001'::double precision) AND (y < '0.001'::double precision))
GPU Cache: NVIDIA Tesla V100-SXM2-16GB [max_num_rows: 10485760]
GPU Cache Size: main: 675.03M, extra: 0
-> Custom Scan (GpuScan) on dpoints_odd (cost=4285.90..5296.47 rows=1 width=28)
GPU Filter: ((x < '0.001'::double precision) AND (y < '0.001'::double precision))
GPU Cache: NVIDIA Tesla V100-SXM2-16GB [max_num_rows: 10485760]
GPU Cache Size: main: 675.03M, extra: 0
(11 rows)
postgres=# select * from dpoints where x < 0.001 and y < 0.001 order by dev_id;
dev_id | ts | x | y
---------+----------------------------+------------------------+-----------------------
932294 | 2021-05-10 04:08:16.916965 | 6.436344180471565e-05 | 0.0008733261591729047
1141774 | 2021-05-10 04:08:16.916965 | 0.0007466930297361785 | 0.0002913637187091922
1396325 | 2021-05-10 04:08:59.035472 | 0.00012029229877441594 | 0.0006813698993006767
2281051 | 2021-05-10 04:08:59.035472 | 0.0006355283323991046 | 0.0008848143919486517
5580669 | 2021-05-10 04:08:59.035472 | 1.9338331895824012e-05 | 0.0009735620154209812
7717535 | 2021-05-10 04:08:59.035472 | 0.0001952751708920175 | 0.0009738177919409452
(6 rows)
Checking the execution plan and the data on the GPU server
Rows from dpoints_even;
contents also match
Rows from dpoints_odd;
contents also match
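Because the partition key is dev_id % 2, each row's origin can be read off its dev_id. A short Python check over the six result rows: the even IDs come from dpoints_even and the odd IDs from dpoints_odd, and each group shares a single ts value, consistent with each group having been loaded by one INSERT (now() returns the transaction start time).

```python
from collections import defaultdict

rows = [  # (dev_id, ts) copied from the query result above
    (932294, "2021-05-10 04:08:16.916965"),
    (1141774, "2021-05-10 04:08:16.916965"),
    (1396325, "2021-05-10 04:08:59.035472"),
    (2281051, "2021-05-10 04:08:59.035472"),
    (5580669, "2021-05-10 04:08:59.035472"),
    (7717535, "2021-05-10 04:08:59.035472"),
]

by_partition = defaultdict(list)
for dev_id, ts in rows:
    # All dev_ids here are positive, so Python's % agrees with PostgreSQL's.
    partition = "dpoints_even" if dev_id % 2 == 0 else "dpoints_odd"
    by_partition[partition].append((dev_id, ts))

for partition, members in sorted(by_partition.items()):
    timestamps = {ts for _, ts in members}
    print(partition, [d for d, _ in members], timestamps)
```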