Presentation WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by Leo Meyerovich and Matthew Torok at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
8. Browser Engine ~= Chart Engine!
DSLs: parse, selectors, layout, render
Exploit Parallelism in Each One
9. Deploy Today via Parallel JavaScript
[Diagram: a browser pipeline (HTML data, CSS styling, JS script → Parser → Selectors → Layout → Renderer → Pixels) shown next to superconductor.js (data, styling, widgets → Parser.js → Selectors.CL → Layout.CL → Renderer.GL → data viz). Other labels: GPU, Compiler, JavaScript VM, webpage.]
Data stays on GPU!
10. DSL 1: Data via JSON
JavaScript, Ruby, Python, Java, …
Easy… until 1-10s data loading
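As a rough illustration of why JSON is the easy path, a visualization's data can be written as a nested tree and loaded with the built-in JSON.parse. The field names (`class`, `children`, `votes`) are hypothetical and are not taken from Superconductor's actual schema.

```javascript
// Hypothetical data tree; the field names are illustrative only.
const text = JSON.stringify({
  class: "Root",
  children: [
    { class: "Leaf", votes: 120 },
    { class: "Leaf", votes: 80 },
  ],
});

// Loading is a single call to the built-in parser...
const tree = JSON.parse(text);

// ...but every node becomes a separate object that later stages must walk.
function countNodes(node) {
  let n = 1;
  for (const child of node.children || []) n += countNodes(child);
  return n;
}

const nodeCount = countNodes(tree);
console.log(nodeCount); // 3
```

For large inputs the parse and the object graph it produces are where the loading time goes, which is the cost the slide points at.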
17. Step 1/2: Schema of Visualization
Tree class hierarchy
Node attributes
18. Step 2/2: Schema Attribute Constraints
[Kastens 1980, Saraiva 2003] [WWW 2010, PPOPP 2013]
1. Local
2. Single-assignment
HBox → left=IBox right=IBox
w := left.w + right.w
…
[Diagram: a tree with Root, HBox, and Leaf nodes, each carrying the attributes x, y, w, h. Labels: inputs (10px, 5px), vars.]
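The two constraints can be sketched in plain JavaScript: each rule reads only a node and its direct children (local), and each attribute is written exactly once (single-assignment). The `assignOnce` helper, the node shape, and the `h` rule are assumptions added for illustration; only `w := left.w + right.w` comes from the slide.

```javascript
// Write an attribute exactly once; a second write is an error (single-assignment).
function assignOnce(node, attr, value) {
  if (attr in node.attrs) throw new Error("attribute assigned twice: " + attr);
  node.attrs[attr] = value;
}

// Leaves take their sizes from inputs (for example 10px and 5px).
function leaf(w, h) {
  return { kind: "Leaf", attrs: { w: w, h: h } };
}

// HBox -> left=IBox right=IBox
function hbox(left, right) {
  return { kind: "HBox", left: left, right: right, attrs: {} };
}

// Rules for HBox: they touch only the node and its direct children (local).
function evalHBox(node) {
  assignOnce(node, "w", node.left.attrs.w + node.right.attrs.w); // from the slide
  assignOnce(node, "h", Math.max(node.left.attrs.h, node.right.attrs.h)); // assumed rule
}

const box = hbox(leaf(10, 5), leaf(5, 5));
evalHBox(box);
console.log(box.attrs); // { w: 15, h: 5 }
```

Because every attribute has a single defining rule with local inputs, a compiler can read the dependencies straight off the rules and decide in which order to evaluate them.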
19. Compiler Output: Layout as Tree Traversals
[WWW 2010]
[Diagram, labeled "Parallel": two tree traversals over Leaf nodes, one with logical joins and one with logical spawns, annotated with the attributes w,h and x,y.]
Parallelism in each traversal!
Mozilla, Microsoft
1. Works for all data sets
2. Compiler automatically parallelizes!
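A minimal sequential sketch of layout as two traversals, assuming an HBox/Leaf tree: a bottom-up pass computes w and h, then a top-down pass computes x and y. Sibling subtrees are independent within each pass, which is where parallel work could be spawned and joined. The function names and the h, x, and y rules are illustrative assumptions, not the compiler's actual output.

```javascript
// Bottom-up pass: children first, then the parent (the "join" point).
function sizePass(node) {
  if (node.kind === "Leaf") return; // leaf w,h are inputs
  sizePass(node.left); // independent of the right subtree
  sizePass(node.right);
  node.w = node.left.w + node.right.w;
  node.h = Math.max(node.left.h, node.right.h); // assumed rule
}

// Top-down pass: parent first, then the children (the "spawn" point).
function positionPass(node, x, y) {
  node.x = x;
  node.y = y;
  if (node.kind === "Leaf") return;
  positionPass(node.left, x, y);
  positionPass(node.right, x + node.left.w, y);
}

const leafA = { kind: "Leaf", w: 10, h: 5 };
const leafB = { kind: "Leaf", w: 5, h: 5 };
const leafC = { kind: "Leaf", w: 20, h: 8 };
const root = {
  kind: "HBox",
  left: { kind: "HBox", left: leafA, right: leafB },
  right: leafC,
};

sizePass(root);
positionPass(root, 0, 0);
console.log(root.w, root.h, leafC.x); // 35 8 15
```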
20. DSL 4: Rendering as a Layout Extension
HBox → left=IBox right=IBox
@render @Rectangle(x,y,w,h,color)
…
w := left.w + right.w
…
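One way to read the @render annotation is as one more traversal that emits a drawing command per node from layout attributes that are already solved. The sketch below assumes nodes that carry x, y, w, h, and color; the `renderPass` function and the command objects are hypothetical and do not represent Superconductor's generated code.

```javascript
// Emit one rectangle command per node, using attributes the layout already solved.
function renderPass(node, out) {
  out.push({
    shape: "Rectangle",
    x: node.x, y: node.y, w: node.w, h: node.h, color: node.color,
  });
  for (const child of node.children || []) renderPass(child, out);
  return out;
}

const laidOut = {
  x: 0, y: 0, w: 15, h: 5, color: "gray",
  children: [
    { x: 0, y: 0, w: 10, h: 5, color: "red" },
    { x: 10, y: 0, w: 5, h: 5, color: "blue" },
  ],
};

const commands = renderPass(laidOut, []);
console.log(commands.length); // 3
```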
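21. Traversals: Flattened & Level-Synchronous
[Blelloch 93]
[Diagram: the Tree is flattened into arrays, with nodes in arrays grouped from level 1 to level n and one array per attribute (x, y, w, h); each level is processed by a parallel for loop (level synchronous).]
Compiler automates code + data transformations.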
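A sketch of the flattened representation, assuming a breadth-first node order so that each level occupies a contiguous index range: one typed array per attribute and one loop per level. On a GPU each inner loop iteration would be one work item; here the loops run sequentially. The array names (`firstChild`, `childCount`, `levelStart`) are assumptions for illustration.

```javascript
// Tree in breadth-first order: node 0 is the root, nodes 1-2 are its children,
// nodes 3-4 are the children of node 1. Nodes 2, 3, and 4 are leaves.
const firstChild = new Int32Array([1, 3, -1, -1, -1]);
const childCount = new Int32Array([2, 2, 0, 0, 0]);
const levelStart = [0, 1, 3, 5]; // level k spans [levelStart[k], levelStart[k + 1])

// One array per attribute; leaf widths are inputs.
const widths = new Float32Array([0, 0, 20, 10, 5]);

// Bottom-up: process the deepest level first. Nodes within a level never
// depend on each other, so each inner loop could be a parallel for loop.
for (let level = levelStart.length - 2; level >= 0; level--) {
  for (let i = levelStart[level]; i < levelStart[level + 1]; i++) {
    if (childCount[i] === 0) continue; // leaf: keep the input width
    let sum = 0;
    for (let c = 0; c < childCount[i]; c++) sum += widths[firstChild[i] + c];
    widths[i] = sum;
  }
}

console.log(Array.from(widths)); // [ 35, 15, 20, 10, 5 ]
```

Keeping each attribute in its own array means neighboring work items read neighboring memory, which suits SIMD hardware better than pointer-linked node objects.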
22. Problem: Dynamic Memory Allocation on GPU?
Render calls: rect(…); … oval(…) square(…) line(…); … circ(…) rect(…); …
function circ(x, y, r) {
  var buffer = new Array(r * 10);  // dynamic allocation
  for (var i = 0; i < r * 10; i++)
    buffer[i] = Math.cos(i);
}
[Diagram: buffer contents 1.0 0.8 0.5 0.2 0 0.2]
23. Dynamic Allocation as SIMD Traversals
allocRect(…) → 7, allocLine(…) → 6, allocCirc(…) → 4, allocRect(…) → 6
1. Prefix sum for needed space
2. Allocate buffers
fillRect(…), fillLine(…), fillCirc(…), fillRect(…)
3. Fill vertex buffers in parallel
4. Give OpenGL the buffer pointers
[Diagram: vertex buffer contents 1.0 0.8 0.5 0.2 0 0.2 …]
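The four steps can be sketched on the CPU using the per-shape sizes from the slide (7, 6, 4, 6). An exclusive prefix sum turns sizes into offsets, one buffer is allocated for the total, and each shape fills its own slice without touching the others. The fill values are placeholders; a real implementation would write vertex data and hand the buffer to OpenGL.

```javascript
// Alloc pass: each shape reports how many slots it needs.
const sizes = [7, 6, 4, 6]; // allocRect, allocLine, allocCirc, allocRect

// Step 1: exclusive prefix sum gives each shape its start offset.
const offsets = new Array(sizes.length);
let total = 0;
for (let i = 0; i < sizes.length; i++) {
  offsets[i] = total;
  total += sizes[i];
}

// Step 2: a single allocation for all shapes.
const vertexBuffer = new Float32Array(total);

// Step 3: each shape fills only its own slice, so the fills are independent
// and could run in parallel. Placeholder values: the shape index plus one.
for (let i = 0; i < sizes.length; i++) {
  for (let j = 0; j < sizes[i]; j++) vertexBuffer[offsets[i] + j] = i + 1;
}

// Step 4 would pass vertexBuffer to the renderer (omitted here).
console.log(offsets, total); // [ 0, 7, 13, 17 ] 23
```

Splitting each drawing call into a sizing half and a filling half is what removes the need for per-thread dynamic allocation: all sizes are known before any memory is requested.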
24. CPU vs. GPU for Election Treemap: 5 traversals over 100K nodes
[Bar chart, log scale: Time (ms) from 1 to 10,000, with a 24fps reference line. Series: Naïve JS (Chrome 26) vs. GPU (Safari + WebCL 11/3). Groups: layout (4 passes), rendering pass, TOTAL. Annotations: WebCL: 5X, WebCL: 31X, COMBINED: 54X!]
26. Superconductor
• Explore data with interactive visualization
• Script charts like web pages: DSLs!
• Hardware accelerate each DSL
• We use WebCL: GPGPU, keeps data on GPU, dynamic compilation