4. Virginian
Developed at NEC Labs
SQL SELECT scans
Choose execution on the CPU or the GPU with CUDA
Common data format
5. SQL database
API
SQL parser/compiler
Common query format
CPU VM | GPU VM
Common data format
6. How do we…
Decompose SQL queries into discrete units?
Interface with the virtual machine?
Fit this into both the CPU and the GPU?
Opcodes
7. Opcodes
Simple operations
Control query flow
Use the context of the virtual machine
Access VM registers
Type agnostic
Abstract away parallelism
Execute within a kernel, not as a kernel
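The opcode idea can be sketched as a tiny register machine. This is only an illustration: the opcode names, the four-field instruction layout, and the sample query are assumptions, not Virginian's actual opcode set. The point it shows is that the whole opcode program runs for each row inside one loop body (standing in for one GPU thread), rather than launching a separate kernel per opcode.

```python
# Hypothetical opcode names; the real opcode set is not shown in the slides.
COLUMN, CONST, GT, JUMP_IF_FALSE, RESULT, HALT = range(6)

def run_row(program, row):
    """Interpret the whole opcode program for one row.

    Returns the emitted value, or None if the row was filtered out.
    """
    regs = [None] * 8   # VM registers; type agnostic (ints, floats, strings)
    pc = 0              # program counter, so opcodes can control query flow
    out = None
    while True:
        op, a, b, c = program[pc]
        if op == COLUMN:            # regs[a] = value of column b in this row
            regs[a] = row[b]
        elif op == CONST:           # regs[a] = literal b
            regs[a] = b
        elif op == GT:              # regs[a] = regs[b] > regs[c]
            regs[a] = regs[b] > regs[c]
        elif op == JUMP_IF_FALSE:   # skip ahead to instruction b
            if not regs[a]:
                pc = b
                continue
        elif op == RESULT:          # emit regs[a] as this row's result
            out = regs[a]
        elif op == HALT:
            return out
        pc += 1

def select(program, rows):
    """Run the program over every row; on a GPU each row would be one thread."""
    results = (run_row(program, row) for row in rows)
    return [r for r in results if r is not None]

# Assumed example query: SELECT name FROM t WHERE age > 30
PROGRAM = [
    (COLUMN, 0, "age", None),
    (CONST, 1, 30, None),
    (GT, 2, 0, 1),
    (JUMP_IF_FALSE, 2, 6, None),
    (COLUMN, 3, "name", None),
    (RESULT, 3, None, None),
    (HALT, None, None, None),
]

ROWS = [
    {"name": "ann", "age": 41},
    {"name": "bob", "age": 25},
    {"name": "cy", "age": 33},
]
```

Calling `select(PROGRAM, ROWS)` keeps only the rows that pass the comparison. The same `PROGRAM` could in principle be handed to a CPU loop or a GPU kernel, which is what a common query format buys.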
10. How do we…
Store our data?
Handle fixed AND variable widths?
Ensure efficiency on both platforms?
Move to a new memory space?
Tablets
11. Tablets
Split the data into workable chunks
Use column-major form to exploit both CPU cache locality and GPU coalescing
[Diagram: tablet layout, top to bottom: Meta Information; Key / Key Pointer; Column-major fixed width data; Variable width data]
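A minimal sketch of column-major storage for the fixed-width area, assuming a simple byte layout that is not taken from the slides. The function names and the `struct` format codes are illustrative. Each column becomes one contiguous run of bytes, so a scan over a single column touches adjacent addresses, which is what helps CPU caches and GPU coalescing.

```python
import struct

def pack_column_major(rows, formats):
    """Pack fixed-width rows so that each column is one contiguous byte run.

    `formats` holds one struct code per column, e.g. "i" (int32), "d" (float64).
    Returns the packed buffer and the starting offset of each column.
    """
    buf = bytearray()
    col_offsets = []
    for c, fmt in enumerate(formats):
        col_offsets.append(len(buf))
        for row in rows:
            buf += struct.pack("<" + fmt, row[c])   # "<" means no padding
    return bytes(buf), col_offsets

def read_cell(buf, col_offsets, formats, r, c):
    """Fetch row r of column c: column start plus r times the column width."""
    fmt = "<" + formats[c]
    width = struct.calcsize(fmt)
    return struct.unpack_from(fmt, buf, col_offsets[c] + r * width)[0]

rows = [(1, 1.5), (2, 2.5), (3, 3.5)]
formats = ["i", "d"]
buf, col_offsets = pack_column_major(rows, formats)
```

Here neighbouring rows of the same column sit `width` bytes apart, so consecutive threads reading consecutive rows would read consecutive addresses.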
12. Tablets
Use relative pointers so variable width data can be moved
Flexible in terms of fixed vs variable width areas
[Diagram: tablet layout, top to bottom: Meta Information; Key / Key Pointer; Column-major fixed width data; Variable width data]
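Why relative pointers make a tablet movable can be shown with a small sketch. The buffer layout here (a count, then offset/length pairs, then string bytes) is an assumption for illustration, not the actual tablet format. Because every pointer is an offset from the start of the tablet, the bytes can be copied to any address in another memory space and still be read without fixing anything up.

```python
import struct

def build_tablet(strings):
    """Return one buffer: [count][(offset, length) per string][string bytes].

    Offsets are relative to the start of the tablet, not absolute addresses.
    """
    header = struct.calcsize("<I") + len(strings) * struct.calcsize("<II")
    pointers = bytearray()
    payload = bytearray()
    for s in strings:
        data = s.encode("utf-8")
        pointers += struct.pack("<II", header + len(payload), len(data))
        payload += data
    return struct.pack("<I", len(strings)) + bytes(pointers) + bytes(payload)

def read_string(memory, base, i):
    """Read string i from a tablet that starts at address `base` in `memory`."""
    off, length = struct.unpack_from("<II", memory, base + 4 + 8 * i)
    start = base + off   # relative pointer resolved against the current base
    return bytes(memory[start:start + length]).decode("utf-8")

tablet = build_tablet(["alpha", "be", "gamma!"])

# "Move" the tablet into a different memory space at an arbitrary address.
device = bytearray(256)
device[100:100 + len(tablet)] = tablet
```

`read_string(tablet, 0, 2)` and `read_string(device, 100, 2)` return the same string, even though the tablet now lives at a different base address.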
13. Tablet management
Data tablets streamed or mapped into GPU memory
Result tablets streamed or mapped off
Effectively circumvent GPU memory limitation
[Diagram: memory hierarchy: GPU Memory, Main Memory, … Disk]
14. Sequential memory copies
Entire data and results tablets must be moved
Can use streaming to overlap transfers with execution
[Diagram: timeline with three sequential phases (Data Transfer, Query Execution, Results Transfer); data moves from Main Memory to GPU Global Memory, the kernel reads and writes global memory through GPU Registers, and results move back to Main Memory]
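The benefit of overlapping can be estimated with a simple timing model. This is an idealized sketch, not a measurement: it assumes every tablet takes the same time in each phase, that copies in each direction and kernel execution can all run concurrently, and that the phase times (in milliseconds) are made-up numbers.

```python
def serial_time(n, t_in, t_exec, t_out):
    """Each tablet is copied in, processed, and copied out before the next starts."""
    return n * (t_in + t_exec + t_out)

def overlapped_time(n, t_in, t_exec, t_out):
    """Three-stage pipeline: tablet i can be copied in while tablet i-1 executes."""
    done_in = done_exec = done_out = 0
    for _ in range(n):
        done_in = done_in + t_in                        # copy engine, host to GPU
        done_exec = max(done_in, done_exec) + t_exec    # kernel waits for its data
        done_out = max(done_exec, done_out) + t_out     # copy engine, GPU to host
    return done_out

# Hypothetical phase times for 4 tablets: 3 ms in, 2 ms execute, 1 ms out.
serial = serial_time(4, 3, 2, 1)
overlapped = overlapped_time(4, 3, 2, 1)
```

In this model the overlapped run is bounded by the slowest phase, so streaming helps most when transfer and execution times are comparable.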
15. Mapped memory
Coalesced reads efficiently transfer data as it is needed
Results buffered in GPU memory, then moved in chunks
[Diagram: timeline with a single Query Execution phase; data and results transfer constantly between Main Memory and the GPU, with results buffered in GPU Global Memory]
16. Mapped Memory Performance
[Bar chart: running time (s) for Serial, Mapped, and Cached execution, broken down into Data Transfer, Kernel Execution, Results Transfer, and Mapped Kernel]
17. Query Suite Running Times
[Chart: running time (s) for queries 1 to 10; series: Multi-Core, Mapped GPU, Cached GPU]
18. Rows Processed vs. Running Time
[Chart: running time (s) against data rows (millions, 0 to 8); series: Multi, Mapped, Cached]
19. Future work
Some database workloads are good candidates for GPUs
Implementing an entire database on both platforms is difficult
Future hardware changes will shift the balance towards GPUs