S3332 peter bakkum

A GPU Database
Architecture

GTC 2013
Tuesday, March 19
Peter Bakkum

How do you…
Use an interface you already know?

Accelerate things you’re already
doing?

SQL on the GPU

Peter Bakkum
@pbbakkum

Research at NEC Labs
Algorithmic Trading at ITG
Engineer at Hyperpublic
Data Infrastructure at Groupon

Virginian
Developed at NEC Labs

SQL SELECT scans

Choose execution on the CPU or the
GPU with CUDA

Common data format

SQL database

API

SQL parser/compiler

Common query format

CPU VM GPU VM

Common data format

How do we…
Decompose SQL queries into discrete
units?

Interface with the virtual machine?

Fit this into both the CPU and the GPU?

Opcodes

Opcodes
Simple operations
Control query flow
Use the context of the virtual machine
Access VM registers
Type agnostic
Abstract away parallelism
Execute within a kernel, not as a kernel

Opcode example

Add 2, 1, 0, null
source
operation
register 2

destination source
register register 1

reg[2] = reg[1] + reg[0]

Opcode program
select col0 from test where (col0 = 2)
!
0: Table 0 0 0 0 !
1: ResultColumn 0 0 0 col0 !
2: Parallel 0 0 9 0 !
3: Column 1 0 0 0 !
4: Integer 0 2 0 0 !
5: Eq 1 0 7 1 !
6: Invalid 0 0 0 0 !
7: Result 1 1 0 0 !
8: Converge 0 0 0 0 !
9: Finish 0 0 0 0 !

How do we…
Store our data?
Handle fixed AND variable widths?
Ensure efficiency on both platforms?
Move to a new memory space?

Tablets

Tablets
Meta Information
Split the data into
Key / Key Pointer
workable chunks

Use column major Column-major fixed
form to exploit both width data

CPU cache locality
and GPU
coalescing Variable width data

Tablets
Meta Information
Use relative
Key / Key Pointer
pointers so variable
width data can be
moved
Column-major fixed
width data

Flexible in terms of
fixed vs variable
width areas Variable width data

Tablet management
Data tablets streamed or mapped into GPU memory
Result tablets streamed or mapped off
Effectively circumvent GPU memory limitation

GPU Memory

Main Memory

… Disk

Sequential memory copies
Entire data and results tablets must be moved
Can use streaming to overlap

Time Data Transfer Time Query Execution Results Transfer Time
GPU Registers

Global memory reads and writes

GPU Global Memory

Data Transfer Results Transfer

Main Memory

Mapped memory
Coalesced reads efficiently transfer data as it is needed
Results buffered in GPU memory, then moved in chunks

Time Query Execution

GPU Registers

Results buffered in global memory

GPU Global Memory

Constant data/results transfer

Main Memory

Mapped Memory Performance
0.12

0.1

0.08
Running Time(s)

Results Transfer
0.06 Kernel Execution
Data Transfer

0.04 Mapped Kernel

0.02

0
Serial Mapped Cached

Query Suite Running Times
0.16

0.14

0.12
Running Time (s)

0.1

0.08 Multi-Core
Mapped GPU
0.06
Cached GPU
0.04

0.02

0
1 2 3 4 5 6 7 8 9 10
Query

Rows Processed vs. Running Time
0.12

0.1

0.08
Running Time (s)

0.06 Multi
Mapped
0.04 Cached

0.02

0
0 1 2 3 4 5 6 7 8
Data Rows (M)

Future work
Some database workloads are a good
candidate for GPUs

Implementing an entire database on
both platforms is difficult

Future hardware changes will shift the
balance towards GPUs

Questions

Peter Bakkum
pbb7c@virginia.edu
@pbbakkum

github.com/bakks/virginian

S3332 peter bakkum

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Semelhante a S3332 peter bakkum

Semelhante a S3332 peter bakkum (20)

S3332 peter bakkum