NVME and GPU accelerates PostgreSQL beyond the limitation
〜Our challenge to the 10GB/s for big-data query execution〜
HeteroDB,Inc
Chief Architect & CEO
KaiGai Kohei <kaigai@heterodb.com>
Today’s Topic: How were these benchmark results achieved?
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-2
Benchmark conditions:
▌ By PostgreSQL v11beta3 + PG-Strom v2.1devel on a single-node server system
▌ 13 queries of the Star Schema Benchmark onto the 1055GB data set
[Chart] Star Schema Benchmark for PostgreSQL 11beta3 + PG-Strom v2.1devel: Query Execution Throughput [MB/s] for Q1_1〜Q4_3 (PG-Strom v2.1devel series)
max 13.5GB/s in query execution throughput on single-node PostgreSQL
about HeteroDB
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-3
Corporate overview
▌ Name: HeteroDB,Inc
▌ Established: 4th-Jul-2017
▌ Headcount: 2 (KaiGai and Kashiwagi)
▌ Location: Shinagawa, Tokyo, Japan
▌ Businesses: Sales of the accelerated database product; technical consulting in the GPU & DB area
With heterogeneous-computing technology applied to the database area, we provide a useful, fast and cost-effective data analytics platform for all the people who need the power of analytics.
CEO Profile
▌ KaiGai Kohei – He has contributed to PostgreSQL and Linux kernel development in the OSS community for more than ten years, especially to the security and database federation features of PostgreSQL.
▌ Awarded as a “Genius Programmer” by the IPA MITOH program (2007)
▌ Top-5 poster finalist at GPU Technology Conference 2017.
Features of RDBMS
✓ High-availability / Clustering
✓ DB administration and backup
✓ Transaction control
✓ BI and visualization
➔ We can use the products that
support PostgreSQL as-is.
Core technology – PG-Strom
PG-Strom: An extension module for PostgreSQL that accelerates SQL workloads with the thousands of cores and wide-band memory of GPUs.
GPU
Big-data Analytics
PG-Strom
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-4
Rapid mass-data loading from the storage device
Machine-learning & Statistics
GPU’s characteristics - mostly as a computing accelerator
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-5
Over 10 years of history in HPC, followed by massive popularization in machine learning
NVIDIA Tesla V100
Super Computer
(TITEC; TSUBAME3.0) Computer Graphics Machine-Learning
Today’s Topic
How are I/O workloads accelerated by the GPU, which is a computing accelerator?
Simulation
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-6
How PostgreSQL utilizes GPU?
〜Architecture of PG-Strom〜
Interactions between PostgreSQL and PG-Strom
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-7
SQL Parser
Query
Optimizer
Query
Executor
Storage Manager
Database Files
GPU Path
constructor
GPU Code
generator
GPU JIT
compiler
(NVRTC)
GpuScan
GpuJoin
GpuPreAgg
PL/CUDA
nvme-strom Gstore_fdw
CustomScan Interface
(PgSQL v9.5~)
PG-Strom
Construction of query plan with CustomScan
Scan
t0 Scan
t1
Join
t0,t1
Statistics)
nrows: 1.2M
width: 80
Index: none
candidate
HashJoin
cost=4000
candidate
MergeJoin
cost=12000
candidate
NestLoop
cost=99999
candidate
Parallel
Hash Join
cost=3000
candidate
GpuJoin
cost=2500
WINNER!
Built-in execution paths of PostgreSQL
Proposition by extensions
since PostgreSQL v9.5
since PostgreSQL v9.6
GpuJoin
t0,t1
Statistics)
nrows: 4000
width: 120
Index: t1.id
Multiple algorithms compete, and the one with the smallest “cost” is chosen.
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-8
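For readers who want to see what such a proposition looks like in code: the sketch below (not PG-Strom's actual source) shows how an extension can offer a CustomPath for a join through PostgreSQL's set_join_pathlist_hook; the planner then keeps or discards it purely by cost, as in the diagram above. The cost figures and the gpujoin_* names are placeholders for illustration.

#include "postgres.h"
#include "fmgr.h"
#include "nodes/extensible.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

/* Placeholder methods table; a real provider must supply PlanCustomPath */
static CustomPathMethods gpujoin_path_methods = {
    .CustomName     = "GpuJoin",
    .PlanCustomPath = NULL,
};

static set_join_pathlist_hook_type prev_join_pathlist_hook = NULL;

/* Called by the planner for every join relation it considers */
static void
gpujoin_add_join_path(PlannerInfo *root,
                      RelOptInfo *joinrel,
                      RelOptInfo *outerrel,
                      RelOptInfo *innerrel,
                      JoinType jointype,
                      JoinPathExtraData *extra)
{
    CustomPath *cpath;

    if (prev_join_pathlist_hook)
        prev_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                jointype, extra);

    cpath = makeNode(CustomPath);
    cpath->path.pathtype     = T_CustomScan;
    cpath->path.parent       = joinrel;
    cpath->path.pathtarget   = joinrel->reltarget;
    cpath->path.rows         = joinrel->rows;
    cpath->path.startup_cost = 100.0;   /* placeholders: real code estimates these */
    cpath->path.total_cost   = 2500.0;
    cpath->methods           = &gpujoin_path_methods;

    /* kept only if its cost beats HashJoin / MergeJoin / NestLoop / ... */
    add_path(joinrel, &cpath->path);
}

void
_PG_init(void)
{
    prev_join_pathlist_hook = set_join_pathlist_hook;
    set_join_pathlist_hook  = gpujoin_add_join_path;
}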
How CustomScan performs
As long as it returns consistent results, the implementation is flexible.
CustomScan (GpuJoin)
(*BeginCustomScan)(...)
(*ExecCustomScan)(...)
(*EndCustomScan)(...)
:
SeqScan
on t0
SeqScan
on t1
GroupAgg
key: cat
ExecInitGpuJoin(...)
✓ Initialize the GPU context
✓ Kick asynchronous JIT compilation of the auto-generated GPU program
ExecGpuJoin(...)
✓ Read records from t0 and t1, and copy them to the DMA buffer
✓ Kick asynchronous GPU tasks
✓ Fetch results from the completed GPU tasks, then pass them to the next step (GroupAgg)
ExecEndGpuJoin(...)
✓ Wait for completion of the asynchronous tasks (if any)
✓ Release GPU resources
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-9
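The executor-side half of the interface is a CustomExecMethods table whose callbacks correspond to the Begin / Exec / End steps listed above. A minimal sketch follows; it is not PG-Strom's implementation, and the gpujoin_* functions are empty placeholders for illustration.

#include "postgres.h"
#include "nodes/execnodes.h"
#include "nodes/extensible.h"

/* Per-node execution state; real code also carries the GPU context, DMA buffers, ... */
typedef struct GpuJoinState
{
    CustomScanState css;        /* must be the first field */
} GpuJoinState;

static void
gpujoin_begin(CustomScanState *node, EState *estate, int eflags)
{
    /* initialize the GPU context, kick asynchronous JIT build of the GPU program */
}

static TupleTableSlot *
gpujoin_exec(CustomScanState *node)
{
    /* fill DMA buffers from the underlying scans, launch GPU tasks, and return
     * one result tuple per call to the upper node (e.g. GroupAgg) */
    return NULL;                /* NULL means end of scan */
}

static void
gpujoin_end(CustomScanState *node)
{
    /* wait for asynchronous tasks (if any), release GPU resources */
}

static void
gpujoin_rescan(CustomScanState *node)
{
    /* reset the state for a re-scan of this plan node */
}

static const CustomExecMethods gpujoin_exec_methods = {
    .CustomName       = "GpuJoin",
    .BeginCustomScan  = gpujoin_begin,
    .ExecCustomScan   = gpujoin_exec,
    .EndCustomScan    = gpujoin_end,
    .ReScanCustomScan = gpujoin_rescan,
};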
GPU code generation from SQL - Example of WHERE-clause
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-10
QUERY: SELECT cat, count(*), avg(x) FROM t0
WHERE x between y and y + 20.0 GROUP BY cat;
:
STATIC_FUNCTION(bool)
gpupreagg_qual_eval(kern_context *kcxt,
kern_data_store *kds,
size_t kds_index)
{
pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1);
pg_float8_t KVAR_3 = pg_float8_vref(kds,kcxt,2,kds_index);
pg_float8_t KVAR_4 = pg_float8_vref(kds,kcxt,3,kds_index);
return EVAL((pgfn_float8ge(kcxt, KVAR_3, KVAR_4) &&
pgfn_float8le(kcxt, KVAR_3,
pgfn_float8pl(kcxt, KVAR_4, KPARAM_1))));
} :
Example: the numeric formula in the WHERE-clause is transformed into CUDA C code on demand
Reference to input data
SQL expression in CUDA source code
Run-time
compiler
Parallel
Execution
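The “Run-time compiler” box corresponds to NVRTC, which appeared in the architecture slide as the GPU JIT compiler. The stand-alone sketch below shows that JIT flow with the CUDA driver API; it is not PG-Strom's code, and the tiny kernel string merely stands in for the auto-generated gpupreagg_qual_eval() logic (error checking omitted).

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <nvrtc.h>

/* Stand-in for the CUDA C source generated from the SQL WHERE-clause */
static const char *kernel_src =
    "extern \"C\" __global__ void qual_eval(const double *x, const double *y,\n"
    "                                       int *match, int n)\n"
    "{\n"
    "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
    "    if (i < n)\n"
    "        match[i] = (x[i] >= y[i] && x[i] <= y[i] + 20.0);\n"
    "}\n";

int main(void)
{
    nvrtcProgram prog;
    size_t       ptx_size;
    char        *ptx;
    CUdevice     dev;
    CUcontext    ctx;
    CUmodule     mod;
    CUfunction   func;

    /* 1. JIT-compile the generated source to PTX */
    nvrtcCreateProgram(&prog, kernel_src, "gpu_qual.cu", 0, NULL, NULL);
    nvrtcCompileProgram(prog, 0, NULL);
    nvrtcGetPTXSize(prog, &ptx_size);
    ptx = malloc(ptx_size);
    nvrtcGetPTX(prog, ptx);
    nvrtcDestroyProgram(&prog);

    /* 2. Load the PTX and look up the kernel for parallel execution */
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuModuleLoadData(&mod, ptx);
    cuModuleGetFunction(&func, mod, "qual_eval");

    printf("kernel compiled and loaded; ready for cuLaunchKernel()\n");
    return 0;
}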
EXPLAIN shows query execution plan
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-11
postgres=# EXPLAIN ANALYZE
SELECT cat,count(*),sum(ax) FROM tbl NATURAL JOIN t1 WHERE cid % 100 < 50 GROUP BY cat;
QUERY PLAN
---------------------------------------------------------------------------------------------------
GroupAggregate (cost=203498.81..203501.80 rows=26 width=20)
(actual time=1511.622..1511.632 rows=26 loops=1)
Group Key: tbl.cat
-> Sort (cost=203498.81..203499.26 rows=182 width=20)
(actual time=1511.612..1511.613 rows=26 loops=1)
Sort Key: tbl.cat
Sort Method: quicksort Memory: 27kB
-> Custom Scan (GpuPreAgg) (cost=203489.25..203491.98 rows=182 width=20)
(actual time=1511.554..1511.562 rows=26 loops=1)
Reduction: Local
Combined GpuJoin: enabled
-> Custom Scan (GpuJoin) on tbl (cost=13455.86..220069.26 rows=1797115 width=12)
(never executed)
Outer Scan: tbl (cost=12729.55..264113.41 rows=6665208 width=8)
(actual time=50.726..1101.414 rows=19995540 loops=1)
Outer Scan Filter: ((cid % 100) < 50)
Rows Removed by Outer Scan Filter: 10047462
Depth 1: GpuHashJoin (plan nrows: 6665208...1797115,
actual nrows: 9948078...2473997)
HashKeys: tbl.aid
JoinQuals: (tbl.aid = t1.aid)
KDS-Hash (size plan: 11.54MB, exec: 7125.12KB)
-> Seq Scan on t1 (cost=0.00..2031.00 rows=100000 width=12)
(actual time=0.016..15.407 rows=100000 loops=1)
Planning Time: 0.721 ms
Execution Time: 1595.815 ms
(19 rows)
What is happening here?
GpuScan + GpuJoin + GpuPreAgg Combined Kernel (1/3)
Aggregation
GROUP BY
JOIN
SCAN
SELECT cat, count(*), avg(x)
FROM t0 JOIN t1 ON t0.id = t1.id
WHERE y like ‘%abc%’
GROUP BY cat;
count(*), avg(x)
GROUP BY cat
t0 JOIN t1
ON t0.id = t1.id
WHERE y like ‘%abc%’
results
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-12
GpuScan
GpuJoin
Agg
+
GpuPreAgg
SeqScan
HashJoin
Agg
GpuScan + GpuJoin + GpuPreAgg Combined Kernel (2/3)
GpuScan
kernel
GpuJoin
kernel
GpuPreAgg
kernel
DMA
Buffer
GPU
CPU
Storage
Simply replacing each logic step one-by-one makes the data transfer ping-pong between CPU and GPU
DMA
Buffer
DMA
Buffer
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-13
DMA
Buffer
Agg
(PostgreSQL)
results
GpuScan + GpuJoin + GpuPreAgg Combined Kernel (3/3)
GpuScan
kernel
GpuJoin
kernel
GpuPreAgg
kernel
DMA
Buffer
GPU
CPU
Storage
Data transfer is saved by exchanging data on the GPU device memory
GPU
Buffer
GPU
Buffer results
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-14
DMA
Buffer
Agg
(PostgreSQL)
Combined GPU kernel for SCAN + JOIN + GROUP BY
data size
= Large
data size
= Small
Usually, the amount of data written back from the GPU is much smaller than the data sent to the GPU.
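The same idea in plain CUDA terms, as a structural sketch only (the kernel bodies are empty placeholders, not PG-Strom's kernels): the three stages hand intermediate results to each other through GPU device memory, and only the small partial aggregate crosses the PCIe bus back to the host.

#include <stdio.h>
#include <cuda_runtime.h>

/* Placeholder kernels standing in for the GpuScan / GpuJoin / GpuPreAgg stages */
__global__ void scan_kernel(const int *blocks, int *filtered, int n)        { /* WHERE    */ }
__global__ void join_kernel(const int *filtered, int *joined, int n)        { /* JOIN     */ }
__global__ void preagg_kernel(const int *joined, long long *partial, int n) { /* GROUP BY */ }

int main(void)
{
    const int n = 1 << 20;
    int *d_blocks, *d_filtered, *d_joined;
    long long *d_partial, h_partial = 0;

    cudaMalloc(&d_blocks,   n * sizeof(int));
    cudaMalloc(&d_filtered, n * sizeof(int));
    cudaMalloc(&d_joined,   n * sizeof(int));
    cudaMalloc(&d_partial,  sizeof(long long));

    /* ... data blocks arrive in d_blocks (from host RAM, or by P2P DMA from NVME-SSD) ... */

    /* SCAN -> JOIN -> GROUP BY exchange data purely on GPU device memory */
    scan_kernel<<<4096, 256>>>(d_blocks, d_filtered, n);
    join_kernel<<<4096, 256>>>(d_filtered, d_joined, n);
    preagg_kernel<<<4096, 256>>>(d_joined, d_partial, n);

    /* only the small partial aggregate is written back for the final Agg on PostgreSQL */
    cudaMemcpy(&h_partial, d_partial, sizeof(h_partial), cudaMemcpyDeviceToHost);
    printf("partial aggregate: %lld\n", h_partial);
    return 0;
}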
EXPLAIN shows query execution plan (again)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-15
postgres=# EXPLAIN ANALYZE
SELECT cat,count(*),sum(ax) FROM tbl NATURAL JOIN t1 WHERE cid % 100 < 50 GROUP BY cat;
QUERY PLAN
---------------------------------------------------------------------------------------------------
GroupAggregate (cost=203498.81..203501.80 rows=26 width=20)
(actual time=1511.622..1511.632 rows=26 loops=1)
Group Key: tbl.cat
-> Sort (cost=203498.81..203499.26 rows=182 width=20)
(actual time=1511.612..1511.613 rows=26 loops=1)
Sort Key: tbl.cat
Sort Method: quicksort Memory: 27kB
-> Custom Scan (GpuPreAgg) (cost=203489.25..203491.98 rows=182 width=20)
(actual time=1511.554..1511.562 rows=26 loops=1)
Reduction: Local
Combined GpuJoin: enabled
-> Custom Scan (GpuJoin) on tbl (cost=13455.86..220069.26 rows=1797115 width=12)
(never executed)
Outer Scan: tbl (cost=12729.55..264113.41 rows=6665208 width=8)
(actual time=50.726..1101.414 rows=19995540 loops=1)
Outer Scan Filter: ((cid % 100) < 50)
Rows Removed by Outer Scan Filter: 10047462
Depth 1: GpuHashJoin (plan nrows: 6665208...1797115,
actual nrows: 9948078...2473997)
HashKeys: tbl.aid
JoinQuals: (tbl.aid = t1.aid)
KDS-Hash (size plan: 11.54MB, exec: 7125.12KB)
-> Seq Scan on t1 (cost=0.00..2031.00 rows=100000 width=12)
(actual time=0.016..15.407 rows=100000 loops=1)
Planning Time: 0.721 ms
Execution Time: 1595.815 ms
(19 rows)
Combined GPU kernel processes this portion
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-16
Re-definition of the GPU’s role
〜How GPU accelerates I/O workloads〜
A usual composition of x86_64 server
GPU SSD
CPU
RAM
HDD
N/W
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-17
Data flow to process a massive amount of data
CPU RAM
SSD GPU
PCIe
PostgreSQL
Data Blocks
Normal Data Flow
All the records, including junk, must be loaded onto RAM once, because software cannot check whether a row is needed prior to loading it. So the amount of I/O traffic over the PCIe bus tends to be large.
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-18
Unless records are loaded to CPU/RAM once over the PCIe bus, software cannot check whether they are needed, even if they are “junk”.
Core Feature: SSD-to-GPU Direct SQL
CPU RAM
SSD GPU
PCIe
PostgreSQL
Data Blocks
NVIDIA GPUDirect RDMA
It allows loading the data blocks on NVME-SSD to the GPU using peer-to-peer DMA over the PCIe bus, bypassing CPU/RAM.
WHERE-clause
JOIN
GROUP BY
Run SQL by GPU
to reduce the data size
Data Size: Small
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-19
v2.0
Element technology - GPUDirect RDMA (1/2)
▌P2P data transfer technology between the GPU and other PCIe devices, bypassing the CPU
✓ Originally designed for multi-node MPI over InfiniBand
✓ Infrastructure of the Linux kernel driver for other PCIe devices, including NVME-SSDs.
Copyright (c) NVIDIA corporation, 2015
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-20
Element technology - GPUDirect RDMA (2/2)
Physical
address space
PCIe BAR1 Area
GPU
device
memory
RAM
NVMe-SSD Infiniband
HBA
PCIe device
GPUDirect RDMA
It enables mapping GPU device memory into the physical address space of the host system.
Once the “physical address of GPU device memory” appears, we can use it as the source or destination address of DMA with PCIe devices.
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-21
0xf0000000
0xe0000000
DMA Request
SRC: 1200th sector
LEN: 40 sectors
DST: 0xe0200000
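To make the figure concrete: a Linux kernel driver such as the nvme_strom module obtains those physical addresses through the GPUDirect RDMA kernel API exported by the NVIDIA driver. The fragment below is only a hedged sketch of that lookup, assuming the nv-p2p.h interface; error handling, alignment checks and the surrounding driver code are omitted, and the callback is a placeholder.

#include <linux/kernel.h>
#include <nv-p2p.h>     /* GPUDirect RDMA API, shipped with the NVIDIA driver */

static void gpu_mem_free_callback(void *data)
{
    /* invoked if the GPU memory is released while the mapping is still held */
}

/* Translate a pinned GPU virtual address range (allocated with cuMemAlloc and
 * 64KB-aligned) into physical BAR1 addresses usable as NVME DMA destinations. */
static int map_gpu_buffer(u64 gpu_vaddr, u64 length)
{
    struct nvidia_p2p_page_table *page_table = NULL;
    int rc, i;

    rc = nvidia_p2p_get_pages(0, 0, gpu_vaddr, length,
                              &page_table, gpu_mem_free_callback, NULL);
    if (rc)
        return rc;

    for (i = 0; i < page_table->entries; i++)
    {
        u64 phys = page_table->pages[i]->physical_address;
        /* 'phys' can now be set as the DMA destination of an NVME read request,
         * like DST: 0xe0200000 in the figure above */
        pr_info("GPU page %d -> physical address 0x%llx\n",
                i, (unsigned long long)phys);
    }
    /* when done: nvidia_p2p_put_pages(0, 0, gpu_vaddr, page_table); */
    return 0;
}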
Benchmark Results – single-node version
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-22
[Chart] Star Schema Benchmark on NVMe-SSD + md-raid0: Query Processing Throughput [MB/sec]

Query   PgSQL9.6(SSDx3)   PGStrom2.0(SSDx3)
Q1_1    2172.3            6149.4
Q1_2    2159.6            6279.3
Q1_3    2158.9            6282.5
Q2_1    2086.0            5985.6
Q2_2    2127.2            6055.3
Q2_3    2104.3            6152.5
Q3_1    1920.3            5479.3
Q3_2    2023.4            6051.2
Q3_3    2101.1            6061.5
Q3_4    2126.9            6074.2
Q4_1    1900.0            5813.7
Q4_2    1960.3            5871.8
Q4_3    2072.1            5800.1
(The H/W spec of the 3x SSDs is drawn as a reference line.)

SSD-to-GPU Direct SQL pulls out an awesome performance close to the H/W spec
▌ Measured with the Star Schema Benchmark, a set of typical batch / reporting workloads.
▌ CPU: Intel Xeon E5-2650v4, RAM: 128GB, GPU: NVIDIA Tesla P40, SSD: Intel 750 (400GB; SeqRead 2.2GB/s) x3
▌ Size of the dataset is 353GB (sf: 401), to ensure an I/O-bound workload
SSD-to-GPU Direct SQL - Software Stack
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-23
Filesystem
(ext4, xfs)
nvme_strom
kernel module
NVMe SSD
NVIDIA Tesla GPU
PostgreSQL
pg_strom
extension
read(2) ioctl(2)
Hardware
Layer
Operating
System
Software
Layer
Database
Software
Layer
blk-mq
nvme pcie nvme rdma
Network HBA
File-offset to NVME-SSD sector
number translation
NVMEoF Target
(JBOF)
NVMe
Request
■ Other software component
■ Our developed software
■ Hardware
NVME over Fabric
nvme_strom v2.0 supports
NVME-over-Fabric (RDMA).
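The “file-offset to NVME-SSD sector number translation” box is implemented inside the nvme_strom kernel module, whose internals are not shown here. As a rough userspace illustration of the same translation, the standard Linux FIEMAP ioctl maps file offsets to physical byte offsets on the block device (divide by 512 for the sector number used in an NVME request); this is only an analogy, not how nvme_strom is implemented.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
    struct fiemap *fm;
    int fd;

    if (argc < 2)
        return 1;
    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        return 1;

    /* room for up to 32 extents of the file */
    fm = calloc(1, sizeof(*fm) + 32 * sizeof(struct fiemap_extent));
    fm->fm_start        = 0;
    fm->fm_length       = FIEMAP_MAX_OFFSET;
    fm->fm_flags        = FIEMAP_FLAG_SYNC;
    fm->fm_extent_count = 32;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
        return 1;

    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
    {
        struct fiemap_extent *fe = &fm->fm_extents[i];
        printf("file offset %llu -> device offset %llu (%llu bytes, sector %llu)\n",
               (unsigned long long)fe->fe_logical,
               (unsigned long long)fe->fe_physical,
               (unsigned long long)fe->fe_length,
               (unsigned long long)(fe->fe_physical / 512));
    }
    close(fd);
    return 0;
}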
uninitialized
File
BLK-100: unknown
BLK-101: unknown
BLK-102: unknown
BLK-103: unknown
BLK-104: unknown
BLK-105: unknown
BLK-106: unknown
BLK-107: unknown
BLK-108: unknown
BLK-109: unknown
Consistency with disk buffers of PostgreSQL / Linux kernel
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-24
BLK-100: uncached
BLK-101: cached by PG
BLK-102: uncached
BLK-103: uncached
BLK-104: cached by OS
BLK-105: cached by PG
BLK-106: uncached
BLK-107: uncached
BLK-108: cached by OS
BLK-109: uncached
BLCKSZ
(=8KB)
Transfer Size
Per Request
BLCKSZ *
NChunks
BLK-108: cached by OS
BLK-104: cached by OS
BLK-105: cached by PG
BLK-101: cached by PG
BLK-100: uncached
BLK-102: uncached
BLK-103: uncached
BLK-106: uncached
BLK-107: uncached
BLK-109: uncached
unused
SSD-to-GPU
P2P DMA
Userspace DMA
Buffer (RAM)
Device Memory
(GPU)
CUDA API
(userspace)
cuMemcpyHtoDAsync
① PG-Strom checks PostgreSQL’s shared buffers for the blocks to read
✓ whether the block is already cached on the shared buffers of PostgreSQL
✓ whether the block may not be all-visible, according to the visibility map
② NVME-Strom checks the OS page cache for the blocks to read
✓ whether the block is already cached on the page cache of the Linux kernel
✓ but in fast-SSD mode, the page cache is not copied unless it is dirty
⑱ Then it kicks SSD-to-GPU P2P DMA for the uncached blocks
BLK-108: cached by OS
BLK-104: cached by OS
BLK-105: cached by PG
BLK-101: cached by PG
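Putting steps ①〜⑱ together, the per-block decision looks roughly like the sketch below. It is a simplification for illustration, not PG-Strom's code: VM_ALL_VISIBLE() is the real PostgreSQL visibility-map macro, while the two *_cached_* / *_dirty_* helpers are hypothetical stand-ins for the shared-buffer lookup and for the page-cache check done by the nvme_strom module.

#include "postgres.h"
#include "access/visibilitymap.h"
#include "storage/buf.h"
#include "utils/rel.h"

typedef enum
{
    SOURCE_VIA_SHARED_BUFFER,    /* copy from PostgreSQL shared buffers via the DMA buffer */
    SOURCE_VIA_PAGE_CACHE,       /* copy from the kernel page cache via the DMA buffer */
    SOURCE_SSD_TO_GPU_P2P_DMA    /* read directly from NVME-SSD into GPU device memory */
} BlockSource;

/* hypothetical helpers, for illustration only */
static bool block_cached_in_pg_shared_buffers(Relation rel, BlockNumber blkno) { return false; }
static bool block_dirty_in_os_page_cache(Relation rel, BlockNumber blkno)      { return false; }

static BlockSource
choose_block_source(Relation rel, BlockNumber blkno, Buffer *vmbuf)
{
    /* (1) blocks cached by PostgreSQL, or possibly not all-visible, must go
     *     through the CPU so that tuple visibility can be checked */
    if (block_cached_in_pg_shared_buffers(rel, blkno) ||
        !VM_ALL_VISIBLE(rel, blkno, vmbuf))
        return SOURCE_VIA_SHARED_BUFFER;

    /* (2) dirty pages in the OS page cache are also taken from RAM
     *     (clean pages are not copied in fast-SSD mode) */
    if (block_dirty_in_os_page_cache(rel, blkno))
        return SOURCE_VIA_PAGE_CACHE;

    /* (3) everything else is loaded by SSD-to-GPU P2P DMA */
    return SOURCE_SSD_TO_GPU_P2P_DMA;
}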
40min queries in the former version, now 30sec
Case Study: Log management and analytics solution
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-25
Database server with
GPU+NVME-SSD
Search by SQL
Security incident
‱ Fast trial & error
‱ Familiar interface
‱ Search on row-data
Aug 12 18:01:01 saba systemd: Started
Session 21060 of user root.
Understanding the
situation of damage
Identifying the impact
Reporting to the manager
Run faster, beyond the limitation
Approach① – Faster NVME-SSD (1/2)
Intel DC P4600 (2.0TB, HHHL)
SeqRead: 3200MB/s, SeqWrite: 1575MB/s
RandRead: 610k IOPS, RandWrite: 196k IOPS
Interface: PCIe 3.0 (x4)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-27
Can we pull out their maximum performance?
Approach① – Faster NVME-SSD (2/2)
Broadwell-EP is capable of up to 7.1GB/s of P2P DMA routing performance.
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-28
Approach② – The latest CPU generation
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-29
Supermicro 1019GP-TT
CPU: Xeon Gold 6126T (2.6GHz, 12C)
RAM: 192GB (32GB DDR4-2666 x6)
GPU: NVIDIA Tesla P40 (3840C, 24GB) x1
SSD: Intel SSD DC P4600 (2.0TB, HHHL) x3
HDD: 2.0TB (SATA, 72krpm) x6
N/W: 10Gb ethernet x2ports
Approach② – The latest CPU generation
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-30
Skylake-SP improved the P2P DMA routing performance to 8.5GB/s.
GPU
SSD-1
SSD-2
SSD-3
md-raid0
Xeon
Gold
6126T
routing
by CPU
Consideration for the hardware configuration (1/3)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-31
① SSD and GPU are connected
to the same PCIe-switch
OK
② CPU controls PCIe-bus, and
SSD and GPU are directly
connected to the same CPU
Workable
⑱ SSD and GPU are connected
to the different CPUs
Not Supported
CPU CPU
PLX
SSD GPU
PCIe-switch
CPU CPU
SSD GPU
CPU CPU
SSD GPU
QPI
A pair of SSD and GPU must be under a particular CPU or PLX (PCIe switch).
PLX is preferable to CPU.
https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#supported-systems
Consideration for the hardware configuration (2/3)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-32
A PCIe switch takes the P2P DMA routing load off the CPU.
CPU CPU
PLX
SSD GPU
PLX
SSD GPU
PLX
SSD GPU
PLX
SSD GPU
SCAN SCAN SCAN SCAN
JOIN JOIN JOIN JOIN
GROUP BY GROUP BY GROUP BY GROUP BY
Very small amount of data
GATHER GATHER
Hardware with PCIe switch (1/2) - HPC Server
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-33
▌ Pros
✓ Legacy but stable hardware
▌ Cons
✓ Less flexibility for re-configuration, or hardware upgrade
Supermicro SYS-4029TRT2
x96 lane
PCIe switch
x96 lane
PCIe switch
CPU2 CPU1
QPI
Gen3
x16
Gen3 x16
for each slot
Gen3 x16
for each slot
Gen3
x16
Hardware with PCIe switch (2/2) - I/O expansion box
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-34
▌ Pros
✓ Flexibility of re-configuration and hardware upgrade
▌ Cons
✓ Special hardware is required (but no extra drivers)
NEC ExpEther 40G (4slots)
4 slots of
PCIe Gen3 x8
PCIe Switch
40Gb
Ethernet Network
switch
CPU
NIC
extra I/O boxes
Consideration for the hardware configuration (3/3)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-35
Most of the SQL workloads are executed inside of the I/O box
CPU CPU
PLX
SSD GPU
PLX
SSD GPU
PLX
SSD GPU
PLX
SSD GPU
SCAN SCAN SCAN SCAN
JOIN JOIN JOIN JOIN
GROUP BY GROUP BY GROUP BY GROUP BY
GATHER GATHER
I/O Box (x4)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-36
Optimization of the Storage Path
with I/O Expansion Box
System configuration with I/O expansion boxes
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-37
PCIe I/O Expansion Box
Host System
(x86_64 server)
NVMe SSD
PostgreSQL Tables
PostgreSQL
Data Blocks
Internal
PCIe Switch
SSD-to-GPU P2P DMA
(Large data size)
GPU
WHERE-clause
JOIN
GROUP BY
PCIe over
Ethernet
Pre-processed
small data
A few GB/s
SQL execution
performance
per box
A few GB/s
SQL execution
performance
per box
A few GB/s
SQL execution
performance
per box
NIC / HBA
Simplified DB operations and APP development
by the simple single-node PostgreSQL configuration
Enhancement of capacity & performance
Visible as leaves of partitioned child-tables on PostgreSQL
v2.1
Hardware optimized table partition layout
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-38
lineorder
lineorder_p0
lineorder_p1
lineorder_p2
remainder=0
remainder=1
remainder=2
customer date
supplier parts
tablespace: nvme0
tablespace: nvme1
tablespace: nvme2
key
INSERT Hashed key
hash = f(key)
hash % 3 = 2
hash % 3 = 0
Raw data
1053GB
Partial data
351GB
Partial data
351GB
Partial data
351GB
v2.1
A partition leaf for each I/O expansion box, by means of tablespaces
Partition-wise GpuJoin/GpuPreAgg (1/3)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-39
lineorder
lineorder_p0
lineorder_p1
lineorder_p2
remainder=0
remainder=1
remainder=2
customer date
supplier parts
tablespace: nvme0
tablespace: nvme1
tablespace: nvme2
New in PostgreSQL v11: parallel scan of the partition leaves
Scan
Scan
Scan
Gather
Join
Agg
Query
Results
Scan
A massive number of records makes the Gather step heavy
v2.1
Partition-wise GpuJoin/GpuPreAgg (2/3)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-40
lineorder
lineorder_p0
lineorder_p1
lineorder_p2
remainder=0
remainder=1
remainder=2
customer date
supplier parts
tablespace: nvme0
tablespace: nvme1
tablespace: nvme2
Preferable: gathering the partition leaves after JOIN / GROUP BY
Join
Gather
Agg
Query
Results
Scan
Scan
PreAgg
Join
Scan
PreAgg
Join
Scan
PreAgg
v2.1
Partition-wise GpuJoin/GpuPreAgg (3/3)
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-41
ssbm =# EXPLAIN SELECT sum(lo_extendedprice*lo_discount) as revenue
FROM lineorder,date1
WHERE lo_orderdate = d_datekey
AND d_year = 1993
AND lo_discount between 1 and 3
AND lo_quantity < 25;
QUERY PLAN
------------------------------------------------------------------------------
Aggregate
-> Gather
Workers Planned: 9
-> Parallel Append
-> Parallel Custom Scan (GpuPreAgg)
Reduction: NoGroup
Combined GpuJoin: enabled
GPU Preference: GPU2 (Tesla P40)
-> Parallel Custom Scan (GpuJoin) on lineorder_p2
Outer Scan: lineorder_p2
Outer Scan Filter: ((lo_discount >= '1'::numeric) AND (lo_discount <= '3'::numeric)
AND (lo_quantity < '25'::numeric))
Depth 1: GpuHashJoin (nrows 102760469...45490403)
HashKeys: lineorder_p2.lo_orderdate
JoinQuals: (lineorder_p2.lo_orderdate = date1.d_datekey)
KDS-Hash (size: 66.03KB)
GPU Preference: GPU2 (Tesla P40)
NVMe-Strom: enabled
-> Seq Scan on date1
Filter: (d_year = 1993)
-> Parallel Custom Scan (GpuPreAgg)
Reduction: NoGroup
Combined GpuJoin: enabled
GPU Preference: GPU1 (Tesla P40)
:
Portion to be executed
on the 3rd I/O expansion box.
v2.1
Benchmark (1/3) - System configuration
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-42
x 1
x 3
x 6
x 3
NEC Express5800/R120h-2m
CPU: Intel Xeon E5-2603 v4 (6C, 1.7GHz)
RAM: 64GB
OS: Red Hat Enterprise Linux 7
(kernel: 3.10.0-862.9.1.el7.x86_64)
CUDA-9.2.148 + driver 396.44
DB: PostgreSQL 11beta3 + PG-Strom v2.1devel
NEC ExpEther 40G (4slots)
I/F: PCIe 3.0 x8 (x16 physical) ... 4slots
with internal PCIe switch
N/W: 40Gb-ethernet
Intel DC P4600 (2.0TB; HHHL)
SeqRead: 3200MB/s, SeqWrite: 1575MB/s
RandRead: 610k IOPS, RandWrite: 196k IOPS
I/F: PCIe 3.0 x4
NVIDIA Tesla P40
# of cores: 3840 (1.3GHz)
Device RAM: 24GB (347GB/s, GDDR5)
CC: 6.1 (Pascal, GP104)
I/F: PCIe 3.0 x16
SPECIAL THANKS FOR
v2.1
Benchmark (2/3) - Result of query execution performance
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-43
▌ 13 SSBM queries against a 1055GB database in total (i.e., 351GB per I/O expansion box)
▌ Raw I/O data transfer without SQL execution was up to 9GB/s.
In other words, SQL execution was faster than a simple storage read with raw I/O.
[Chart] Star Schema Benchmark for PgSQL v11beta3 / PG-Strom v2.1devel on NEC ExpEther x3: Query Execution Throughput [MB/s]

Query   PostgreSQL v11beta3   PG-Strom v2.1devel
Q1_1    2,388                 13,401
Q1_2    2,477                 13,534
Q1_3    2,493                 13,536
Q2_1    2,502                 13,330
Q2_2    2,739                 12,696
Q2_3    2,831                 12,965
Q3_1    1,865                 12,533
Q3_2    2,268                 11,498
Q3_3    2,442                 12,312
Q3_4    2,418                 12,419
Q4_1    1,789                 12,414
Q4_2    1,848                 12,622
Q4_3    2,202                 12,594
(The raw I/O limitation is drawn as a reference line.)
max 13.5GB/s for query execution performance with 3x I/O expansion boxes!!
v2.1
Benchmark (3/3) - Density of I/O per expansion box
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-44
[Chart] Raw I/O throughput [MB/s] per NVME-SSD device (nvme0n1〜nvme5n1) during SQL execution
I/O workload is balanced over the I/O expansion boxes; more scaling is expected
▌ During SQL execution, raw-I/O performance was 5000〜5100MB/s per expansion box, and 2600MB/s per NVME-SSD.
▌ Overall performance was well balanced, so we can expect performance scaling with more expansion boxes.
➔ 4.5GB/s x 8 = 36GB/s is expected with an 8-expansion-box configuration; close to commercial DWH solutions.
v2.1
PostgreSQL is a successor of XXXX?
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-45
13.5GB/s with
3x I/O boxes
30GB/s with
8x I/O boxes
is EXPECTED
Single-node PostgreSQL now offers DWH-appliance grade performance
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-46
Conclusion
Expected usage – Log data processing and analysis
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-47
As a data management, analytics and machine-learning platform for log data that grows daily
Manufacturing Logistics Mobile Home electronics
GPU + NVME-SSD
Why PG-Strom?
▌ It covers around 30-50TB of data with a single node by adding I/O expansion boxes.
▌ It allows processing the raw log data as-is, beyond the raw-I/O read performance of the hardware.
▌ Users can continue to use familiar SQL statements and applications.
Summary
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-48
▌ PG-Strom
An extension module for PostgreSQL that accelerates SQL execution by GPU. It pulls out the maximum potential of the hardware to summarize and analyze terabyte-class data.
▌ Core feature: SSD-to-GPU Direct SQL
It directly transfers data blocks on NVME-SSD to the GPU by P2P DMA, and runs SQL workloads on the GPU prior to data loading onto the host system. By reducing the amount of data to be processed, it improves the performance of I/O-bound jobs.
▌ Multiple GPU/SSD configuration with I/O expansion boxes
To avoid saturating the CPU, which acts as the PCIe root complex, P2P DMA packets are exchanged close to the storage devices inside I/O expansion boxes that mount a PCIe switch. This not only mitigates the CPU load, but also allows database capacity and performance to be enhanced on demand. We measured 13.5GB/s in SQL execution with 3x expansion boxes; more performance is expected according to the investment in hardware.
▌ Expected use scenario
Database systems which store massive logs, including M2M. Simplicity of operations with single-node PostgreSQL, and continuity of the skill-set with familiar SQL statements and applications. Adoption targets: small-to-middle Hadoop clusters, or entry-class DWH solutions.
Enhancement (1/2) - NVME-over-Fabric and GPU RDMA support ready
Host System
SSD SSD
SSD SSD
SSD SSD
HBA
SSD SSD
SSD SSD
SSD SSD
HBA
SSD SSD
SSD SSD
SSD SSD
HBA
CPU
HBA
InfiniBand or
100Gb-Ethernet
JBOF JBOF JBOF
More NVME-SSDs than we can install on a single node with GPUs
GPU-node
Storage-node
SSD-to-GPU Direct SQL
over 100Gb Ethernet
with RDMA protocol
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-49
Enhancement (2/2) – FDW for Parquet with SSD-to-GPU Direct WIP
Special optimization for Parquet files, as a columnar format, on SSD-to-GPU Direct.
It allows data exchange with IoT-grade systems with no data import.
Foreign Scan
GpuJoin
GpuPreAgg
Agg
Parquet File
Storage I/O
SSD→CPU
CPU → GPU
GPU → GPU
GPU → CPU
Data Format
Conversion
(CPU)
Most of the workloads here
GpuJoin on
Parquet Files
GpuPreAgg
Agg
Storage I/O
SSD→GPU
Point-1
Smaller data
loading from
the storage
Point-2
Optimal structure
for GPU’s memory
subsystem
Current behavior Support of Parquet File
Parquet File
Combined GPU kernel
for Parquet Format
GPU → GPU
GPU → CPU
Parquet data format
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-50
Resources
NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-51
▌Repository
https://github.com/heterodb/pg-strom
▌Documentation
http://heterodb.github.io/pg-strom/
▌Software Distribution
https://heterodb.github.io/swdc/
▌Contact
ML: pgstrom@heterodb.com
e-mail: kaigai@heterodb.com
twitter: @kkaigai
20181016_pgconfeu_ssd2gpu_multi

Mais conteĂșdo relacionado

Mais procurados

Dbts2013 ç‰čæżƒjpoug log_file_sync
Dbts2013 ç‰čæżƒjpoug log_file_syncDbts2013 ç‰čæżƒjpoug log_file_sync
Dbts2013 ç‰čæżƒjpoug log_file_sync
Koji Shinkubo
 
Sql server ぼバックケップべăƒȘă‚čトケぼćŸș瀎
Sql server ぼバックケップべăƒȘă‚čトケぼćŸș瀎Sql server ぼバックケップべăƒȘă‚čトケぼćŸș瀎
Sql server ぼバックケップべăƒȘă‚čトケぼćŸș瀎
Masayuki Ozawa
 

Mais procurados (20)

Dbts2013 ç‰čæżƒjpoug log_file_sync
Dbts2013 ç‰čæżƒjpoug log_file_syncDbts2013 ç‰čæżƒjpoug log_file_sync
Dbts2013 ç‰čæżƒjpoug log_file_sync
 
Sql server ぼバックケップべăƒȘă‚čトケぼćŸș瀎
Sql server ぼバックケップべăƒȘă‚čトケぼćŸș瀎Sql server ぼバックケップべăƒȘă‚čトケぼćŸș瀎
Sql server ぼバックケップべăƒȘă‚čトケぼćŸș瀎
 
Attunityăźă‚œăƒȘăƒ„ăƒŒă‚·ăƒ§ăƒłăšç•°çšźăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čăƒ»ă‚Żăƒ©ă‚Šăƒ‰ç§»èĄŒäș‹äŸ‹ăźă”çŽč介
Attunityăźă‚œăƒȘăƒ„ăƒŒă‚·ăƒ§ăƒłăšç•°çšźăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čăƒ»ă‚Żăƒ©ă‚Šăƒ‰ç§»èĄŒäș‹äŸ‹ăźă”çŽč介Attunityăźă‚œăƒȘăƒ„ăƒŒă‚·ăƒ§ăƒłăšç•°çšźăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čăƒ»ă‚Żăƒ©ă‚Šăƒ‰ç§»èĄŒäș‹äŸ‹ăźă”çŽč介
Attunityăźă‚œăƒȘăƒ„ăƒŒă‚·ăƒ§ăƒłăšç•°çšźăƒ‡ăƒŒă‚żăƒ™ăƒŒă‚čăƒ»ă‚Żăƒ©ă‚Šăƒ‰ç§»èĄŒäș‹äŸ‹ăźă”çŽč介
 
TPC-DSから歊ぶPostgreSQLăźćŒ±ç‚čăšä»ŠćŸŒăźć±•æœ›
TPC-DSから歊ぶPostgreSQLăźćŒ±ç‚čăšä»ŠćŸŒăźć±•æœ›TPC-DSから歊ぶPostgreSQLăźćŒ±ç‚čăšä»ŠćŸŒăźć±•æœ›
TPC-DSから歊ぶPostgreSQLăźćŒ±ç‚čăšä»ŠćŸŒăźć±•æœ›
 
Exadata X8M-2 KVMä»źæƒłćŒ–ăƒ™ă‚čăƒˆăƒ—ăƒ©ă‚Żăƒ†ă‚Łă‚č
Exadata X8M-2 KVMä»źæƒłćŒ–ăƒ™ă‚čăƒˆăƒ—ăƒ©ă‚Żăƒ†ă‚Łă‚čExadata X8M-2 KVMä»źæƒłćŒ–ăƒ™ă‚čăƒˆăƒ—ăƒ©ă‚Żăƒ†ă‚Łă‚č
Exadata X8M-2 KVMä»źæƒłćŒ–ăƒ™ă‚čăƒˆăƒ—ăƒ©ă‚Żăƒ†ă‚Łă‚č
 
Parallel Query in AWS Aurora MySQL
Parallel Query in AWS Aurora MySQLParallel Query in AWS Aurora MySQL
Parallel Query in AWS Aurora MySQL
 
Oracle Database 11g,12că‹ă‚‰ăźă‚ąăƒƒăƒ•ă‚šă‚Żă‚™ăƒŹăƒŒăƒˆă‚™ćŻŸç­–ăšă‚Żăƒ©ă‚Šăƒˆă‚™ç§»èĄŒ (Oracle Cloudă‚Šă‚§ăƒ“ăƒŠăƒŒă‚·ăƒȘăƒŒă‚ș: 2021ćčŽ7...
Oracle Database 11g,12că‹ă‚‰ăźă‚ąăƒƒăƒ•ă‚šă‚Żă‚™ăƒŹăƒŒăƒˆă‚™ćŻŸç­–ăšă‚Żăƒ©ă‚Šăƒˆă‚™ç§»èĄŒ (Oracle Cloudă‚Šă‚§ăƒ“ăƒŠăƒŒă‚·ăƒȘăƒŒă‚ș: 2021ćčŽ7...Oracle Database 11g,12că‹ă‚‰ăźă‚ąăƒƒăƒ•ă‚šă‚Żă‚™ăƒŹăƒŒăƒˆă‚™ćŻŸç­–ăšă‚Żăƒ©ă‚Šăƒˆă‚™ç§»èĄŒ (Oracle Cloudă‚Šă‚§ăƒ“ăƒŠăƒŒă‚·ăƒȘăƒŒă‚ș: 2021ćčŽ7...
Oracle Database 11g,12că‹ă‚‰ăźă‚ąăƒƒăƒ•ă‚šă‚Żă‚™ăƒŹăƒŒăƒˆă‚™ćŻŸç­–ăšă‚Żăƒ©ă‚Šăƒˆă‚™ç§»èĄŒ (Oracle Cloudă‚Šă‚§ăƒ“ăƒŠăƒŒă‚·ăƒȘăƒŒă‚ș: 2021ćčŽ7...
 
Ganglia たUIにGrafanaă‚’èżœćŠ ă™ă‚‹è©±
Ganglia たUIにGrafanaă‚’èżœćŠ ă™ă‚‹è©±Ganglia たUIにGrafanaă‚’èżœćŠ ă™ă‚‹è©±
Ganglia たUIにGrafanaă‚’èżœćŠ ă™ă‚‹è©±
 
PostgreSQL 12は ここがă‚čă‚Žă‚€ïŒ 性胜æ”č斄やpluggable storage engineăȘă©ăźæ–°æ©Ÿèƒœă‚’ćŸčćș•è§ŁèȘŹïœž NTTăƒ‡ăƒŒă‚ż テクノ...
PostgreSQL 12は ここがă‚čă‚Žă‚€ïŒ 性胜æ”č斄やpluggable storage engineăȘă©ăźæ–°æ©Ÿèƒœă‚’ćŸčćș•è§ŁèȘŹïœž NTTăƒ‡ăƒŒă‚ż テクノ...PostgreSQL 12は ここがă‚čă‚Žă‚€ïŒ 性胜æ”č斄やpluggable storage engineăȘă©ăźæ–°æ©Ÿèƒœă‚’ćŸčćș•è§ŁèȘŹïœž NTTăƒ‡ăƒŒă‚ż テクノ...
PostgreSQL 12は ここがă‚čă‚Žă‚€ïŒ 性胜æ”č斄やpluggable storage engineăȘă©ăźæ–°æ©Ÿèƒœă‚’ćŸčćș•è§ŁèȘŹïœž NTTăƒ‡ăƒŒă‚ż テクノ...
 
RL4J ă§ć§‹ă‚ă‚‹æ·±ć±€ćŒ·ćŒ–ć­Šçż’
RL4J ă§ć§‹ă‚ă‚‹æ·±ć±€ćŒ·ćŒ–ć­Šçż’RL4J ă§ć§‹ă‚ă‚‹æ·±ć±€ćŒ·ćŒ–ć­Šçż’
RL4J ă§ć§‹ă‚ă‚‹æ·±ć±€ćŒ·ćŒ–ć­Šçż’
 
Best Practices of running PostgreSQL in Virtual Environments
Best Practices of running PostgreSQL in Virtual EnvironmentsBest Practices of running PostgreSQL in Virtual Environments
Best Practices of running PostgreSQL in Virtual Environments
 
PostgreSQL運甚知理慄門
PostgreSQL運甚知理慄門PostgreSQL運甚知理慄門
PostgreSQL運甚知理慄門
 
SEILはIIJăźă‚łă‚łă«äœżă‚ă‚ŒăŠă„ă‚‹
SEILはIIJăźă‚łă‚łă«äœżă‚ă‚ŒăŠă„ă‚‹SEILはIIJăźă‚łă‚łă«äœżă‚ă‚ŒăŠă„ă‚‹
SEILはIIJăźă‚łă‚łă«äœżă‚ă‚ŒăŠă„ă‚‹
 
NVIDIA cuQuantum SDK ă«ă‚ˆă‚‹é‡ć­ć›žè·Żă‚·ăƒŸăƒ„ăƒŹăƒŒă‚żăƒŒăźé«˜é€ŸćŒ–
NVIDIA cuQuantum SDK ă«ă‚ˆă‚‹é‡ć­ć›žè·Żă‚·ăƒŸăƒ„ăƒŹăƒŒă‚żăƒŒăźé«˜é€ŸćŒ–NVIDIA cuQuantum SDK ă«ă‚ˆă‚‹é‡ć­ć›žè·Żă‚·ăƒŸăƒ„ăƒŹăƒŒă‚żăƒŒăźé«˜é€ŸćŒ–
NVIDIA cuQuantum SDK ă«ă‚ˆă‚‹é‡ć­ć›žè·Żă‚·ăƒŸăƒ„ăƒŹăƒŒă‚żăƒŒăźé«˜é€ŸćŒ–
 
PostgreSQLレプăƒȘă‚±ăƒŒă‚·ăƒ§ăƒł10摹ćčŽïŒćŸčćș•çŽč介PostgreSQL Conference Japan 2019èŹ›æŒ”èł‡æ–™ïŒ‰
PostgreSQLレプăƒȘă‚±ăƒŒă‚·ăƒ§ăƒł10摹ćčŽïŒćŸčćș•çŽč介PostgreSQL Conference Japan 2019èŹ›æŒ”èł‡æ–™ïŒ‰PostgreSQLレプăƒȘă‚±ăƒŒă‚·ăƒ§ăƒł10摹ćčŽïŒćŸčćș•çŽč介PostgreSQL Conference Japan 2019èŹ›æŒ”èł‡æ–™ïŒ‰
PostgreSQLレプăƒȘă‚±ăƒŒă‚·ăƒ§ăƒł10摹ćčŽïŒćŸčćș•çŽč介PostgreSQL Conference Japan 2019èŹ›æŒ”èł‡æ–™ïŒ‰
 
PostgreSQLたăƒȘă‚«ăƒăƒȘè¶…ć…„é–€(もしくはWAL、CHECKPOINT、ă‚Șăƒłăƒ©ă‚€ăƒłăƒăƒƒă‚Żă‚ąăƒƒăƒ—ăźä»•ç”„ăż)
PostgreSQLたăƒȘă‚«ăƒăƒȘè¶…ć…„é–€(もしくはWAL、CHECKPOINT、ă‚Șăƒłăƒ©ă‚€ăƒłăƒăƒƒă‚Żă‚ąăƒƒăƒ—ăźä»•ç”„ăż)PostgreSQLたăƒȘă‚«ăƒăƒȘè¶…ć…„é–€(もしくはWAL、CHECKPOINT、ă‚Șăƒłăƒ©ă‚€ăƒłăƒăƒƒă‚Żă‚ąăƒƒăƒ—ăźä»•ç”„ăż)
PostgreSQLたăƒȘă‚«ăƒăƒȘè¶…ć…„é–€(もしくはWAL、CHECKPOINT、ă‚Șăƒłăƒ©ă‚€ăƒłăƒăƒƒă‚Żă‚ąăƒƒăƒ—ăźä»•ç”„ăż)
 
ă—ă€ă“ăXenずzfsă§äœœă‚‹ćź¶ćș­ć†…vdiă‚”ăƒŒăƒ2015ćčŽç‰ˆ
ă—ă€ă“ăXenずzfsă§äœœă‚‹ćź¶ćș­ć†…vdiă‚”ăƒŒăƒ2015ćčŽç‰ˆă—ă€ă“ăXenずzfsă§äœœă‚‹ćź¶ćș­ć†…vdiă‚”ăƒŒăƒ2015ćčŽç‰ˆ
ă—ă€ă“ăXenずzfsă§äœœă‚‹ćź¶ćș­ć†…vdiă‚”ăƒŒăƒ2015ćčŽç‰ˆ
 
[Cloud OnAir] GCP でèȘ°ă§ă‚‚ć§‹ă‚ă‚‰ă‚Œă‚‹ HPC 2019ćčŽ5月9æ—„ 攟送
[Cloud OnAir] GCP でèȘ°ă§ă‚‚ć§‹ă‚ă‚‰ă‚Œă‚‹ HPC 2019ćčŽ5月9æ—„ 攟送[Cloud OnAir] GCP でèȘ°ă§ă‚‚ć§‹ă‚ă‚‰ă‚Œă‚‹ HPC 2019ćčŽ5月9æ—„ 攟送
[Cloud OnAir] GCP でèȘ°ă§ă‚‚ć§‹ă‚ă‚‰ă‚Œă‚‹ HPC 2019ćčŽ5月9æ—„ 攟送
 
Db2 & Db2 Warehouse v11.5.4 æœ€æ–°æƒ…ć ±ă‚ąăƒƒăƒ—ăƒ‡ăƒŒăƒˆ2020ćčŽ8月25æ—„
Db2 & Db2 Warehouse v11.5.4 æœ€æ–°æƒ…ć ±ă‚ąăƒƒăƒ—ăƒ‡ăƒŒăƒˆ2020ćčŽ8月25æ—„Db2 & Db2 Warehouse v11.5.4 æœ€æ–°æƒ…ć ±ă‚ąăƒƒăƒ—ăƒ‡ăƒŒăƒˆ2020ćčŽ8月25æ—„
Db2 & Db2 Warehouse v11.5.4 æœ€æ–°æƒ…ć ±ă‚ąăƒƒăƒ—ăƒ‡ăƒŒăƒˆ2020ćčŽ8月25æ—„
 
Java11ăžăźăƒžă‚€ă‚°ăƒŹăƒŒă‚·ăƒ§ăƒłă‚Źă‚€ăƒ‰ ~Apache Hadoopたäș‹äŸ‹~
Java11ăžăźăƒžă‚€ă‚°ăƒŹăƒŒă‚·ăƒ§ăƒłă‚Źă‚€ăƒ‰ ~Apache Hadoopたäș‹äŸ‹~Java11ăžăźăƒžă‚€ă‚°ăƒŹăƒŒă‚·ăƒ§ăƒłă‚Źă‚€ăƒ‰ ~Apache Hadoopたäș‹äŸ‹~
Java11ăžăźăƒžă‚€ă‚°ăƒŹăƒŒă‚·ăƒ§ăƒłă‚Źă‚€ăƒ‰ ~Apache Hadoopたäș‹äŸ‹~
 

Semelhante a 20181016_pgconfeu_ssd2gpu_multi

Semelhante a 20181016_pgconfeu_ssd2gpu_multi (20)

20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
Stress your DUT
Stress your DUTStress your DUT
Stress your DUT
 

Mais de Kohei KaiGai

Mais de Kohei KaiGai (20)

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_Online
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStrom
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStrom
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw
 

Último

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Último (20)

Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžcall girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžcall girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 

20181016_pgconfeu_ssd2gpu_multi

  • 1. NVMEandGPUacceleratesPostgreSQLbeyondthelimitation 〜Our challenge to the 10GB/s for big-data query execution〜 HeteroDB,Inc Chief Architect & CEO KaiGai Kohei <kaigai@heterodb.com>
  • 2. Today’s Topic: Unveil how this benchmark results were achieved? NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-2 Benchmark conditions:  By the PostgreSQL v11beta3 + PG-Strom v2.1devel on a single-node server system  13 queries of Star-schema benchmark onto the 1055GB data set 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 Q1_1 Q1_2 Q1_3 Q2_1 Q2_2 Q2_3 Q3_1 Q3_2 Q3_3 Q3_4 Q4_1 Q4_2 Q4_3 QueryExecutionThroughput[MB/s] Star Schema Benchmark for PostgreSQL 11beta3 + PG-Strom v2.1devel PG-Strom v2.1devel max 13.5GB/s in query execution throughput on single-node PostgreSQL
  • 3. about HeteroDB NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-3 Corporate overview  Name HeteroDB,Inc  Established 4th-Jul-2017  Headcount 2 (KaiGai and Kashiwagi)  Location Shinagawa, Tokyo, Japan  Businesses Sales of accelerated database product Technical consulting on GPU&DB region By the heterogeneous-computing technology on the database area, we provides a useful, fast and cost-effective data analytics platform for all the people who need the power of analytics. CEO Profile  KaiGai Kohei – He has contributed for PostgreSQL and Linux kernel development in the OSS community more than ten years, especially, for security and database federation features of PostgreSQL.  Award of “Genius Programmer” by IPA MITOH program (2007)  The top-5 posters finalist at GPU Technology Conference 2017.
  • 4. Features of RDBMS ✓ High-availability / Clustering ✓ DB administration and backup ✓ Transaction control ✓ BI and visualization ➔ We can use the products that support PostgreSQL as-is. Core technology – PG-Strom PG-Strom: An extension module for PostgreSQL, to accelerate SQL workloads by the thousands cores and wide-band memory of GPU. GPU Big-data Analytics PG-Strom NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-4 Mass data loading from the storage device rapidly Machine-learning & Statistics
  • 5. GPU’s characteristics - mostly as a computing accelerator NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-5 Over 10years history in HPC, then massive popularization in Machine-Learning NVIDIA Tesla V100 Super Computer (TITEC; TSUBAME3.0) Computer Graphics Machine-Learning Today’s Topic How I/O workloads are accelerated by GPU that is a computing accelerator? Simulation
  • 6. NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-6 How PostgreSQL utilizes GPU? 〜Architecture of PG-Strom〜
  • 7. Interactions on PostgreSQL ïƒł PG-Strom Interactions NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-7 SQL Parser Query Optimizer Query Executor Storage Manager Database Files GPU Path constructor GPU Code generator GPU JIT compiler (NVRTC) GpuScan GpuJoin GpuPreAgg PL/CUDA nvme-strom Gstore_fdw CustomScan Interface (PgSQL v9.5~) PG-Strom
  • 8. Construction of query plan with CustomScan Scan t0 Scan t1 Join t0,t1 Statistics) nrows: 1.2M width: 80 Index: none candidate HashJoin cost=4000 candidate MergeJoin cost=12000 candidate NestLoop cost=99999 candidate Parallel Hash Join cost=3000 candidate GpuJoin cost=2500 WINNER! Built-in execution path of PostgreSQLProposition by extensions since PostgreSQL v9.5 since PostgreSQL v9.6 GpuJoin t0,t1 Statistics) nrows: 4000 width: 120 Index: t1.id Competition of multiple algorithms, then chosen by the “cost”. NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-8
  • 9. How CustomScan performs As long as consistent results are made, implementations are flexible. CustomScan (GpuJoin) (*BeginCustomScan)(...) (*ExecCustomScan)(...) (*EndCustomScan)(...) : SeqScan on t0 SeqScan on t1 GroupAgg key: cat ExecInitGpuJoin(...)  Initialize GPU context  Kick asynchronous JIT compilation of the GPU program auto-generated ExecGpuJoin(...)  Read records from the t0 and t1, and copy to the DMA buffer  Kick asynchronous GPU tasks  Fetch results from the completed GPU tasks, then pass them to the next step (GroupAgg) ExecEndGpuJoin(...)  Wait for completion of the asynchronous tasks (if any)  Release of GPU resource NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-9
  • 10. GPU code generation from SQL - Example of WHERE-clause NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-10 QUERY: SELECT cat, count(*), avg(x) FROM t0 WHERE x between y and y + 20.0 GROUP BY cat; : STATIC_FUNCTION(bool) gpupreagg_qual_eval(kern_context *kcxt, kern_data_store *kds, size_t kds_index) { pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1); pg_float8_t KVAR_3 = pg_float8_vref(kds,kcxt,2,kds_index); pg_float8_t KVAR_4 = pg_float8_vref(kds,kcxt,3,kds_index); return EVAL((pgfn_float8ge(kcxt, KVAR_3, KVAR_4) && pgfn_float8le(kcxt, KVAR_3, pgfn_float8pl(kcxt, KVAR_4, KPARAM_1)))); } : E.g) Transformation of the numeric-formula in WHERE-clause to CUDA C code on demand Reference to input data SQL expression in CUDA source code Run-time compiler Parallel Execution
  • 11. EXPLAIN shows query execution plan NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-11 postgres=# EXPLAIN ANALYZE SELECT cat,count(*),sum(ax) FROM tbl NATURAL JOIN t1 WHERE cid % 100 < 50 GROUP BY cat; QUERY PLAN --------------------------------------------------------------------------------------------------- GroupAggregate (cost=203498.81..203501.80 rows=26 width=20) (actual time=1511.622..1511.632 rows=26 loops=1) Group Key: tbl.cat -> Sort (cost=203498.81..203499.26 rows=182 width=20) (actual time=1511.612..1511.613 rows=26 loops=1) Sort Key: tbl.cat Sort Method: quicksort Memory: 27kB -> Custom Scan (GpuPreAgg) (cost=203489.25..203491.98 rows=182 width=20) (actual time=1511.554..1511.562 rows=26 loops=1) Reduction: Local Combined GpuJoin: enabled -> Custom Scan (GpuJoin) on tbl (cost=13455.86..220069.26 rows=1797115 width=12) (never executed) Outer Scan: tbl (cost=12729.55..264113.41 rows=6665208 width=8) (actual time=50.726..1101.414 rows=19995540 loops=1) Outer Scan Filter: ((cid % 100) < 50) Rows Removed by Outer Scan Filter: 10047462 Depth 1: GpuHashJoin (plan nrows: 6665208...1797115, actual nrows: 9948078...2473997) HashKeys: tbl.aid JoinQuals: (tbl.aid = t1.aid) KDS-Hash (size plan: 11.54MB, exec: 7125.12KB) -> Seq Scan on t1 (cost=0.00..2031.00 rows=100000 width=12) (actual time=0.016..15.407 rows=100000 loops=1) Planning Time: 0.721 ms Execution Time: 1595.815 ms (19 rows) What’s happen?
  • 12. GpuScan + GpuJoin + GpuPreAgg Combined Kernel (1/3) Aggregation GROUP BY JOIN SCAN SELECT cat, count(*), avg(x) FROM t0 JOIN t1 ON t0.id = t1.id WHERE y like ‘%abc%’ GROUP BY cat; count(*), avg(x) GROUP BY cat t0 JOIN t1 ON t0.id = t1.id WHERE y like ‘%abc%’ results NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-12 GpuScan GpuJoin Agg + GpuPreAgg SeqScan HashJoin Agg
  • 13. GpuScan + GpuJoin + GpuPreAgg Combined Kernel (2/3) GpuScan kernel GpuJoin kernel GpuPreAgg kernel DMA Buffer GPU CPU Storage Simple replacement of the logics makes ping-pong of data-transfer between CPU and GPU DMA Buffer DMA Buffer NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-13 DMA Buffer Agg (PostgreSQL) results
  • 14. GpuScan + GpuJoin + GpuPreAgg Combined Kernel (3/3) GpuScan kernel GpuJoin kernel GpuPreAgg kernel DMA Buffer GPU CPU Storage Save the data-transfer by data exchange on the GPU device memory GPU Buffer GPU Buffer results NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-14 DMA Buffer Agg (PostgreSQL) Combined GPU kernel for SCAN + JOIN + GROUP BY data size = Large data size = Small Usually, amount of data size to be written back from GPU is much smaller than the data size sent to GPU
  • 15. EXPLAIN shows query execution plan (again) NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-15 postgres=# EXPLAIN ANALYZE SELECT cat,count(*),sum(ax) FROM tbl NATURAL JOIN t1 WHERE cid % 100 < 50 GROUP BY cat; QUERY PLAN --------------------------------------------------------------------------------------------------- GroupAggregate (cost=203498.81..203501.80 rows=26 width=20) (actual time=1511.622..1511.632 rows=26 loops=1) Group Key: tbl.cat -> Sort (cost=203498.81..203499.26 rows=182 width=20) (actual time=1511.612..1511.613 rows=26 loops=1) Sort Key: tbl.cat Sort Method: quicksort Memory: 27kB -> Custom Scan (GpuPreAgg) (cost=203489.25..203491.98 rows=182 width=20) (actual time=1511.554..1511.562 rows=26 loops=1) Reduction: Local Combined GpuJoin: enabled -> Custom Scan (GpuJoin) on tbl (cost=13455.86..220069.26 rows=1797115 width=12) (never executed) Outer Scan: tbl (cost=12729.55..264113.41 rows=6665208 width=8) (actual time=50.726..1101.414 rows=19995540 loops=1) Outer Scan Filter: ((cid % 100) < 50) Rows Removed by Outer Scan Filter: 10047462 Depth 1: GpuHashJoin (plan nrows: 6665208...1797115, actual nrows: 9948078...2473997) HashKeys: tbl.aid JoinQuals: (tbl.aid = t1.aid) KDS-Hash (size plan: 11.54MB, exec: 7125.12KB) -> Seq Scan on t1 (cost=0.00..2031.00 rows=100000 width=12) (actual time=0.016..15.407 rows=100000 loops=1) Planning Time: 0.721 ms Execution Time: 1595.815 ms (19 rows) Combined GPU kernel processes this portion
  • 16. NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-16 Re-definition of the GPU’s role 〜How GPU accelerates I/O workloads〜
  • 17. A usual composition of x86_64 server GPUSSD CPU RAM HDD N/W NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-17
  • 18. Data flow to process a massive amount of data CPU RAM SSD GPU PCIe PostgreSQL Data Blocks Normal Data Flow All the records, including junks, must be loaded onto RAM once, because software cannot check necessity of the rows prior to the data loading. So, amount of the I/O traffic over PCIe bus tends to be large. NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-18 Unless records are not loaded to CPU/RAM once, over the PCIe bus, software cannot check its necessity even if it is “junk”.
  • 19. Core Feature: SSD-to-GPU Direct SQL CPU RAM SSD GPU PCIe PostgreSQL Data Blocks NVIDIA GPUDirect RDMA It allows to load the data blocks on NVME-SSD to GPU using peer-to-peer DMA over PCIe-bus; bypassing CPU/RAM. WHERE-clause JOIN GROUP BY Run SQL by GPU to reduce the data size Data Size: Small NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-19 v2.0
  • 20. Element technology - GPUDirect RDMA (1/2) ▌P2P data transfer technology between GPU and other PCIe devices, bypassing the CPU  Originally designed for multi-node MPI over Infiniband  Provides the Linux kernel driver infrastructure for other PCIe devices, including NVME-SSDs. Copyright (c) NVIDIA corporation, 2015 NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-20
  • 21. Element technology - GPUDirect RDMA (2/2) Physical address space PCIe BAR1 Area GPU device memory RAM NVMe-SSD Infiniband HBA PCIe device GPUDirect RDMA It enables GPU device memory to be mapped into the physical address space of the host system Once the “physical address of GPU device memory” appears, we can use it as the source or destination address of DMA with PCIe devices. NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-21 0xf0000000 0xe0000000 DMA Request SRC: 1200th sector LEN: 40 sectors DST: 0xe0200000
  • 22. Benchmark Results – single-node version NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-22 (Chart: Star Schema Benchmark on NVMe-SSD + md-raid0; query processing throughput [MB/sec] for Q1_1〜Q4_3. PgSQL 9.6 with SSDx3 runs at roughly 1,900〜2,172MB/s, PG-Strom 2.0 with SSDx3 at roughly 5,479〜6,283MB/s, close to the H/W spec of the 3x SSD.) SSD-to-GPU Direct SQL pulls out an awesome performance close to the H/W spec  Measurement by the Star Schema Benchmark, which is a set of typical batch / reporting workloads.  CPU: Intel Xeon E5-2650v4, RAM: 128GB, GPU: NVIDIA Tesla P40, SSD: Intel 750 (400GB; SeqRead 2.2GB/s)x3  Size of dataset is 353GB (sf: 401), to ensure an I/O bound workload
  • 23. SSD-to-GPU Direct SQL - Software Stack NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-23 Filesystem (ext4, xfs) nvme_strom kernel module NVMe SSD NVIDIA Tesla GPU PostgreSQL pg_strom extension read(2) ioctl(2) Hardware Layer Operating System Software Layer Database Software Layer blk-mq nvme pcie nvme rdma Network HBA File-offset to NVME-SSD sector number translation NVMEoF Target (JBOF) NVMe Request ■ Other software component ■ Our developed software ■ Hardware NVME over Fabric nvme_strom v2.0 supports NVME-over-Fabric (RDMA).
  • 24. uninitialized File BLK-100: unknown BLK-101: unknown BLK-102: unknown BLK-103: unknown BLK-104: unknown BLK-105: unknown BLK-106: unknown BLK-107: unknown BLK-108: unknown BLK-109: unknown Consistency with disk buffers of PostgreSQL / Linux kernel NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-24 BLK-100: uncached BLK-101: cached by PG BLK-102: uncached BLK-103: uncached BLK-104: cached by OS BLK-105: cached by PG BLK-106: uncached BLK-107: uncached BLK-108: cached by OS BLK-109: uncached BLCKSZ (=8KB) Transfer Size Per Request BLCKSZ * NChunks BLK-108: cached by OS BLK-104: cached by OS BLK-105: cached by PG BLK-101: cached by PG BLK-100: uncached BLK-102: uncached BLK-103: uncached BLK-106: uncached BLK-107: uncached BLK-109: uncached unused SSD-to-GPU P2P DMA Userspace DMA Buffer (RAM) Device Memory (GPU) CUDA API (userspace) cuMemcpyHtoDAsync ① PG-Strom checks PG’s shared buffers for the blocks to read ✓ whether the block is already cached on the shared buffer of PostgreSQL ✓ whether the block may not be all-visible according to the visibility-map ② NVME-Strom checks the OS’s page cache for the blocks to read ✓ whether the block is already cached on the page cache of the Linux kernel ✓ however, in fast-SSD mode, a page cache entry is not copied unless it is dirty ③ Then, it kicks SSD-to-GPU P2P DMA for the uncached blocks BLK-108: cached by OS BLK-104: cached by OS BLK-105: cached by PG BLK-101: cached by PG
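  PG-Strom performs the shared-buffer check of step ① internally in C. Purely as an illustration from the SQL side, the pg_buffercache contrib extension can show which blocks of a table currently sit in PostgreSQL's shared buffers (using the table 'tbl' from the earlier EXPLAIN example; this query is not part of PG-Strom itself):

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;
    SELECT relblocknumber, isdirty
      FROM pg_buffercache
     WHERE relfilenode = pg_relation_filenode('tbl')
       AND relforknumber = 0        -- main fork only
     ORDER BY relblocknumber;

  Blocks that appear neither here nor as dirty pages in the OS page cache are the candidates for the SSD-to-GPU P2P DMA path.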
  • 25. Queries that took 40min on the former version now finish in 30sec Case Study: Log management and analytics solution NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-25 Database server with GPU+NVME-SSD Search by SQL Security incident • Fast try & error • Familiar interface • Search on row-data Aug 12 18:01:01 saba systemd: Started Session 21060 of user root. Understanding the situation of damage Identifying the impact Reporting to the manager
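  As an illustration of the “search by SQL” workflow on such raw log data, a query of the following shape is typical; the table and column names below are hypothetical and not taken from the case study:

    SELECT log_time, host, message
      FROM syslog
     WHERE log_time BETWEEN '2018-08-12 17:00' AND '2018-08-12 19:00'
       AND message LIKE '%Started Session%'
     ORDER BY log_time;

  Because SSD-to-GPU Direct SQL evaluates the WHERE-clause on the GPU while the blocks stream from NVME-SSD, this kind of full-scan search can run close to the storage hardware throughput without preparing extra indexes.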
  • 26. Run faster, beyond the limitation
  • 27. Approach① – Faster NVME-SSD (1/2) Intel DC P4600 (2.0TB, HHHL) SeqRead: 3200MB/s, SeqWrite: 1575MB/s RandRead: 610k IOPS, RandWrite: 196k IOPS Interface: PCIe 3.0 (x4) NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-27 Can we pull out the maximum performance of these devices?
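  A quick sanity check on the arithmetic: three of these SSDs can supply up to 3 x 3,200MB/s = 9.6GB/s of sequential read, so the question above is really whether the rest of the platform, in particular the CPU that routes the P2P DMA traffic (see the next two slides), can keep up with that figure.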
  • 28. Approach① – Faster NVME-SSD (2/2) Broadwell-EP is capable of up to 7.1GB/s of P2P DMA routing performance. NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-28
  • 29. Approach② – The latest CPU generation NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-29 Supermicro 1019GP-TT CPU: Xeon Gold 6126T (2.6GHz, 12C) RAM: 192GB (32GB DDR4-2666 x6) GPU: NVIDIA Tesla P40 (3840C, 24GB) x1 SSD: Intel SSD DC P4600 (2.0TB, HHHL) x3 HDD: 2.0TB (SATA, 72krpm) x6 N/W: 10Gb ethernet x2ports
  • 30. Approach② – The latest CPU generation NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-30 Skylake-SP improved the P2P DMA routing performance to 8.5GB/s. GPU SSD-1 SSD-2 SSD-3 md-raid0 Xeon Gold 6126T routing by CPU
  • 31. Consideration for the hardware configuration (1/3) NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-31 ① SSD and GPU are connected to the same PCIe-switch OK ② CPU controls the PCIe-bus, and SSD and GPU are directly connected to the same CPU Workable ③ SSD and GPU are connected to different CPUs Not Supported CPU CPU PLX SSD GPU PCIe-switch CPU CPU SSD GPU CPU CPU SSD GPU QPI A pair of SSD and GPU must be under a particular CPU or PLX (PCIe-switch). PLX is more preferable than CPU. https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#supported-systems
  • 32. Consideration for the hardware configuration (2/3) NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-32 A PCIe switch keeps the P2P traffic off the CPU, so the CPU stays relaxed. CPU CPU PLX SSD GPU PLX SSD GPU PLX SSD GPU PLX SSD GPU SCAN SCAN SCAN SCAN JOIN JOIN JOIN JOIN GROUP BY GROUP BY GROUP BY GROUP BY Very small amount of data GATHER GATHER
  • 33. Hardware with PCIe switch (1/2) - HPC Server NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-33  Pros ✓ Legacy but stable hardware  Cons ✓ Less flexibility for re-configuration, or hardware upgrade Supermicro SYS-4029TRT2 x96 lane PCIe switch x96 lane PCIe switch CPU2 CPU1 QPI Gen3 x16 Gen3 x16 for each slot Gen3 x16 for each slot Gen3 x16
  • 34. Hardware with PCIe switch (2/2) - I/O expansion box NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-34  Pros ✓ Flexibility of re-configuration and hardware upgrade  Cons ✓ Special hardware is required (but no extra drivers) NEC ExpEther 40G (4slots) 4 slots of PCIe Gen3 x8 PCIe Swich 40Gb Ethernet Network switch CPU NIC extra I/O boxes
  • 35. Consideration for the hardware configuration (3/3) NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-35 Most of the SQL workloads are executed inside of the I/O box CPU CPU PLX SSD GPU PLX SSD GPU PLX SSD GPU PLX SSD GPU SCAN SCAN SCAN SCAN JOIN JOIN JOIN JOIN GROUP BY GROUP BY GROUP BY GROUP BY GATHER GATHER I/O BoxI/O BoxI/O BoxI/O Box
  • 36. NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-36 Optimization of the Storage Path with I/O Expansion Box
  • 37. System configuration with I/O expansion boxes NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-37 PCIe I/O Expansion Box Host System (x86_64 server) NVMe SSD PostgreSQL Tables PostgreSQL Data Blocks Internal PCIe Switch SSD-to-GPU P2P DMA (Large data size) GPU WHERE-clause JOIN GROUP BY PCIe over Ethernet Pre-processed small data A few GB/s SQL execution performance per box A few GB/s SQL execution performance per box A few GB/s SQL execution performance per box NIC / HBA Simplified DB operations and APP development by the simple single-node PostgreSQL configuration Enhancement of capacity & performance Visible as leaf child-tables of a partitioned table on PostgreSQL v2.1
  • 38. Hardware optimized table partition layout NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-38 lineorder lineorder_p0 lineorder_p1 lineorder_p2 remainder=0 remainder=1 remainder=2 customer date supplier parts tablespace: nvme0 tablespace: nvme1 tablespace: nvme2 key INSERT Hashed key hash = f(key) hash % 3 = 2 hash % 3 = 0 Raw data 1053GB Partial data 351GB Partial data 351GB Partial data 351GB v2.1 One partition leaf per I/O expansion box, assigned via tablespaces
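  A minimal DDL sketch of this layout using PostgreSQL v11 hash partitioning; the tablespace paths, the abbreviated column list and the choice of lo_orderkey as partition key are illustrative assumptions, not taken from the benchmark setup:

    CREATE TABLESPACE nvme0 LOCATION '/nvme/0';   -- SSDs inside I/O box #0
    CREATE TABLESPACE nvme1 LOCATION '/nvme/1';   -- SSDs inside I/O box #1
    CREATE TABLESPACE nvme2 LOCATION '/nvme/2';   -- SSDs inside I/O box #2

    CREATE TABLE lineorder (
        lo_orderkey      bigint,
        lo_orderdate     integer,
        lo_quantity      numeric,
        lo_discount      numeric,
        lo_extendedprice numeric
        -- ... remaining SSBM columns
    ) PARTITION BY HASH (lo_orderkey);

    CREATE TABLE lineorder_p0 PARTITION OF lineorder
        FOR VALUES WITH (MODULUS 3, REMAINDER 0) TABLESPACE nvme0;
    CREATE TABLE lineorder_p1 PARTITION OF lineorder
        FOR VALUES WITH (MODULUS 3, REMAINDER 1) TABLESPACE nvme1;
    CREATE TABLE lineorder_p2 PARTITION OF lineorder
        FOR VALUES WITH (MODULUS 3, REMAINDER 2) TABLESPACE nvme2;

  Each inserted row is then hash-routed to the partition, and therefore to the I/O expansion box, that owns the matching tablespace.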
  • 39. Partition-wise GpuJoin/GpuPreAgg (1/3) NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-39 lineorder lineorder_p0 lineorder_p1 lineorder_p2 remainder=0 remainder=1 remainder=2 customer date supplier parts tablespace: nvme0 tablespace: nvme1 tablespace: nvme2 New in PostgreSQL v11: parallel scan of the partition leaves Scan Scan Scan Gather Join Agg Query Results Scan A massive number of records makes the Gather step a bottleneck v2.1
  • 40. Partition-wise GpuJoin/GpuPreAgg (2/3) NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-40 lineorder lineorder_p0 lineorder_p1 lineorder_p2 remainder=0 remainder=1 remainder=2 customer date supplier parts tablespace: nvme0 tablespace: nvme1 tablespace: nvme2 Preferable: gather the partition leaves after JOIN / GROUP BY Join Gather Agg Query Results Scan Scan PreAgg Join Scan PreAgg Join Scan PreAgg v2.1
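  For reference, stock PostgreSQL v11 exposes planner GUCs that push joins and partial aggregation down to the partition leaves in a similar spirit; PG-Strom v2.1 applies its own GpuJoin/GpuPreAgg push-down on top of the partitioned plan, so treat this only as a sketch of the underlying planner behaviour, not as PG-Strom's own switch:

    SET enable_partitionwise_join = on;        -- off by default in v11
    SET enable_partitionwise_aggregate = on;   -- off by default in v11
    EXPLAIN
    SELECT sum(lo_extendedprice * lo_discount) AS revenue
      FROM lineorder, date1
     WHERE lo_orderdate = d_datekey AND d_year = 1993
       AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;

  With these settings the join and the partial aggregate appear below the Append node, as shown on the next slide.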
  • 41. Partition-wise GpuJoin/GpuPreAgg (3/3) NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-41 ssbm =# EXPLAIN SELECT sum(lo_extendedprice*lo_discount) as revenue FROM lineorder,date1 WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount between 1 and 3 AND lo_quantity < 25; QUERY PLAN ------------------------------------------------------------------------------ Aggregate -> Gather Workers Planned: 9 -> Parallel Append -> Parallel Custom Scan (GpuPreAgg) Reduction: NoGroup Combined GpuJoin: enabled GPU Preference: GPU2 (Tesla P40) -> Parallel Custom Scan (GpuJoin) on lineorder_p2 Outer Scan: lineorder_p2 Outer Scan Filter: ((lo_discount >= '1'::numeric) AND (lo_discount <= '3'::numeric) AND (lo_quantity < '25'::numeric)) Depth 1: GpuHashJoin (nrows 102760469...45490403) HashKeys: lineorder_p2.lo_orderdate JoinQuals: (lineorder_p2.lo_orderdate = date1.d_datekey) KDS-Hash (size: 66.03KB) GPU Preference: GPU2 (Tesla P40) NVMe-Strom: enabled -> Seq Scan on date1 Filter: (d_year = 1993) -> Parallel Custom Scan (GpuPreAgg) Reduction: NoGroup Combined GpuJoin: enabled GPU Preference: GPU1 (Tesla P40) : Portion to be executed on the 3rd I/O expansion box. v2.1
  • 42. Benchmark (1/3) - System configuration NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-42 x 1 x 3 x 6 x 3 NEC Express5800/R120h-2m CPU: Intel Xeon E5-2603 v4 (6C, 1.7GHz) RAM: 64GB OS: Red Hat Enterprise Linux 7 (kernel: 3.10.0-862.9.1.el7.x86_64) CUDA-9.2.148 + driver 396.44 DB: PostgreSQL 11beta3 + PG-Strom v2.1devel NEC ExpEther 40G (4slots) I/F: PCIe 3.0 x8 (x16 physical) ... 4slots with internal PCIe switch N/W: 40Gb-ethernet Intel DC P4600 (2.0TB; HHHL) SeqRead: 3200MB/s, SeqWrite: 1575MB/s RandRead: 610k IOPS, RandWrite: 196k IOPS I/F: PCIe 3.0 x4 NVIDIA Tesla P40 # of cores: 3840 (1.3GHz) Device RAM: 24GB (347GB/s, GDDR5) CC: 6.1 (Pascal, GP104) I/F: PCIe 3.0 x16 SPECIAL THANKS FOR v2.1
  • 43. Benchmark (2/3) - Result of query execution performance NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-43  13 SSBM queries to the 1055GB database in total (i.e. 351GB per I/O expansion box)  Raw I/O data transfer without SQL execution was up to 9GB/s; in other words, SQL execution was faster than a simple storage read with raw-I/O. (Chart: Star Schema Benchmark for PgSQL v11beta3 / PG-Strom v2.1devel on NEC ExpEther x3; query execution throughput [MB/s] for Q1_1〜Q4_3. PostgreSQL v11beta3 runs at roughly 1,789〜2,831MB/s, PG-Strom v2.1devel at roughly 11,498〜13,536MB/s, above the raw-I/O limitation line.) max 13.5GB/s of query execution performance with 3x I/O expansion boxes!! v2.1
  • 44. Benchmark (3/3) - Density of I/O per expansion box NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-44 (Chart: per-device I/O throughput over time for nvme0n1〜nvme5n1.) The I/O workload is balanced over the I/O expansion boxes, so further scaling is expected  During SQL execution, raw-I/O performance was 5000〜5100MB/s per expansion box, and 2600MB/s per NVME-SSD.  Overall performance was well balanced, so we can expect performance scaling with more expansion boxes. ➔ 4.5GB/s x8 = 36GB/s is expected with an 8-expansion-box configuration; close to commercial DWH solutions. v2.1
  • 45. PostgreSQL is a successor of XXXX? NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-45 13.5GB/s with 3x I/O boxes 30GB/s with 8x I/O boxes is EXPECTED Single-node PostgreSQL now offers DWH-appliance grade performance
  • 46. NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-46 Conclusion
  • 47. Expected usage – Log data processing and analysis NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-47 As a data management, analytics and machine-learning platform for log data that grows day by day Manufacturing Logistics Mobile Home electronics GPU + NVME-SSD Why PG-Strom?  It covers around 30-50TB of data on a single node by adding I/O expansion boxes.  It allows the raw log data to be processed as-is, at close to the maximum performance of the H/W.  Users can continue to use the familiar SQL statements and applications.
  • 48. Summary NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-48  PG-Strom An extension module for PostgreSQL, to accelerate SQL execution by GPU. It pulls out the maximum potential of the hardware to summarize and analyze data sets beyond the terabyte class.  Core feature: SSD-to-GPU Direct SQL It directly transfers data blocks on NVME-SSD to the GPU by P2P DMA, and runs SQL workloads on the GPU prior to loading the data onto the host system. By reducing the amount of data to be processed, it improves the performance of I/O bound jobs.  Multiple GPU/SSD configuration with I/O expansion box To avoid saturating the CPU, which acts as the PCIe root complex, P2P DMA packets are exchanged close to the storage devices inside I/O expansion boxes that mount a PCIe switch. This not only mitigates CPU load, but also allows database capacity and performance to be enhanced on demand. We measured 13.5GB/s of SQL execution throughput with 3x expansion boxes; more performance is expected as more hardware is added.  Expected use scenario Database systems which store massive logs, including M2M. Simplicity of operations with single-node PostgreSQL, and continuity of the skill set with familiar SQL statements and applications. Adoption targets: small〜middle Hadoop clusters, or entry-class DWH solutions.
  • 49. Enhancement (1/2) - NVME-over-Fabric and GPU RDMA support ready Host System SSD SSD SSD SSD SSD SSD HBA SSD SSD SSD SSD SSD SSD HBA SSD SSD SSD SSD SSD SSD HBA CPU HBA InfiniBand or 100Gb-Ethernet JBOF JBOF JBOF More NVME-SSDs than can be installed on a single node together with GPUs GPU-node Storage-node SSD-to-GPU Direct SQL over 100Gb Ethernet with RDMA protocol NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-49
  • 50. Enhancement (2/2) – FDW for Parquet with SSD-to-GPU Direct WIP Special optimization for Parquet files, as a columnar format, on SSD-to-GPU Direct. It allows data exchange with IoT-grade systems with no data import. Foreign Scan GpuJoin GpuPreAgg Agg Parquet File Storage I/O SSD→CPU CPU → GPU GPU → GPU GPU → CPU Data Format Conversion (CPU) Most of the workloads here GpuJoin on Parquet Files GpuPreAgg Agg Storage I/O SSD→GPU Point-1 Smaller data loading from the storage Point-2 Optimal structure for GPU’s memory subsystem Current behavior Support of Parquet File Parquet File Combined GPU kernel for Parquet Format GPU → GPU GPU → CPU Parquet data format NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-50
  • 51. Resources NVME and GPU accelerates PostgreSQL beyond the limitation -PGconf.EU 2018-51 ▌Repository https://github.com/heterodb/pg-strom ▌Documentation http://heterodb.github.io/pg-strom/ ▌Software Distribution https://heterodb.github.io/swdc/ ▌Contact ML: pgstrom@heterodb.com e-mail: kaigai@heterodb.com twitter: @kkaigai