ACCELERATING AND EVALUATING OPENCL GRAPH APPLICATIONS

SHUAI CHE, BRAD BECKMANN, STEVE REINHARDT AND KEVIN SKADRON
  
AGENDA

• Background and Graph Applications
• Pannotia OpenCL™ Graph Applications
• Performance Evaluation and Discussion

Accelerating and Evaluating OpenCL Graph Applications | November 20, 2013 | CONFIDENTIAL
  
GRAPH APPLICATIONS

• Intelligence
  ‒ Business analytics, security and scientific discovery
• Social networks
  ‒ Facebook, Twitter, LinkedIn, Weibo, etc.
• Life science and healthcare
  ‒ Disease and drug research, life system research
• Infrastructure
  ‒ Transportation, power grid, energy and water supply
• Scientific and engineering simulations
	
  
GRAPH APPLICATIONS

• Low arithmetic intensity and data reuse
• Not floating-point intensive
• Branch divergence
  ‒ Only some of the threads in a wavefront are active
• Memory divergence
  ‒ Data distributed in different regions of memory
  ‒ A challenge to optimize data layouts and memory accesses
• Load imbalance
  ‒ Uneven work distribution across different threads
  ‒ Short-running threads wait for long-running threads
• Parallelism
  ‒ Changing degree of parallelism across iterations
  ‒ Underutilization of compute units for certain phases
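The load-imbalance point can be made concrete with a small Python sketch (the suite itself is written in C and OpenCL, so this is only an illustration). It assumes each thread handles one vertex and loops over that vertex's neighbors; the wavefront width of 64 and the degree lists are illustrative assumptions, not measurements from this talk.

```python
def wavefront_utilization(degrees, width=64):
    """Fraction of lane-iterations doing useful work, one vertex per lane.

    Every lane in a wavefront stays occupied until the highest-degree
    vertex in that wavefront finishes, so short-running lanes sit idle.
    """
    useful = 0
    issued = 0
    for start in range(0, len(degrees), width):
        group = degrees[start:start + width]
        useful += sum(group)
        issued += max(group) * width  # all lanes wait for the longest one
    return useful / issued if issued else 1.0


uniform = [4] * 64            # every vertex has the same degree
skewed = [1] * 63 + [64]      # one high-degree vertex among many small ones

print(wavefront_utilization(uniform))
print(wavefront_utilization(skewed))
```

With a uniform degree list every lane is busy for the whole loop, while a single high-degree vertex leaves most lanes idle for most iterations.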
  
PANNOTIA

• A graph application suite for GPGPU
• Eight diverse graph algorithms, e.g., shortest path, graph partitioning, web analysis and social network
• Implemented in C + OpenCL™
• Some are OpenCL implementations based on algorithms of prior work
• Initial implementation is for a single GPU node
• Further algorithmic and hardware-specific optimizations are in progress
• Details of Pannotia can be found in our paper published in 2013 IEEE International Symposium on Workload Characterization
  

PANNOTIA

Applications                    Domains
Single-Source Shortest Path     Shortest Path
Connected Component Labeling    Graph Clustering
Graph Coloring                  Graph Partitioning
Floyd-Warshall                  Shortest Path
Maximal Independent Set         Graph Partitioning
Betweenness Centrality          Social Network
Friend Recommendation           Social Network
Page Rank                       Web Analysis
  

GRAPH INPUT AND DATA STRUCTURE

• Real-world graphs
  ‒ The University of Florida Sparse Matrix Collection
  ‒ The 9th DIMACS Implementation Challenge
  ‒ The 10th DIMACS Implementation Challenge
• Synthetic graphs
  ‒ Random-graph generator from Georgia Tech
• Graph input formats
  ‒ Coordinate format
  ‒ METIS
  ‒ Matrix Market
• Data structure representation
  ‒ CSR, COO, ELL …
  ‒ 2D adjacency matrix
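As an illustration of two of the representations listed above, the following Python sketch converts a coordinate (COO) edge list into compressed sparse row (CSR) arrays. The function names `edges_to_csr` and `neighbors` and the four-edge example graph are hypothetical; they are not taken from the Pannotia sources.

```python
def edges_to_csr(num_vertices, edges):
    """Convert a COO edge list [(src, dst, weight), ...] into CSR arrays."""
    # Count outgoing edges per vertex, shifted by one slot.
    row_ptr = [0] * (num_vertices + 1)
    for src, _dst, _w in edges:
        row_ptr[src + 1] += 1
    # Prefix sum turns the counts into row start offsets.
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]
    col_idx = [0] * len(edges)
    weights = [0] * len(edges)
    next_slot = row_ptr[:-1].copy()  # next free position in each row
    for src, dst, w in edges:
        slot = next_slot[src]
        col_idx[slot] = dst
        weights[slot] = w
        next_slot[src] += 1
    return row_ptr, col_idx, weights


def neighbors(row_ptr, col_idx, weights, v):
    """Return (neighbor, weight) pairs for vertex v from the CSR arrays."""
    return [(col_idx[i], weights[i]) for i in range(row_ptr[v], row_ptr[v + 1])]


coo = [(0, 1, 5), (0, 2, 3), (1, 2, 1), (2, 0, 7)]
row_ptr, col_idx, weights = edges_to_csr(3, coo)
print(row_ptr, col_idx, weights)
print(neighbors(row_ptr, col_idx, weights, 0))
```

CSR keeps each vertex's neighbors contiguous in memory, which is one reason it is a common choice when one thread walks one vertex's adjacency list.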
  	
  
SINGLE SOURCE SHORTEST PATH

• Finds the shortest path from the source node to every other node in the graph

[Figure: example weighted graph alongside a table of vertex IDs (Vid) and distances (Dist)]
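A sequential Python sketch of single-source shortest path is given below. It uses a frontier of "active" vertices whose distance changed in the previous round, which is one common way to structure the computation for data-parallel hardware; it is an assumption for illustration, not a transcription of the Pannotia kernel. The example graph is hypothetical and edge weights are assumed non-negative.

```python
import math


def sssp_frontier(adj, source):
    """Shortest distances from source; adj maps u -> [(v, weight), ...]."""
    dist = {u: math.inf for u in adj}
    dist[source] = 0
    active = {source}          # vertices whose distance changed last round
    active_counts = []         # size of the frontier in each round
    while active:
        active_counts.append(len(active))
        next_active = set()
        for u in active:
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:   # relax edge (u, v)
                    dist[v] = dist[u] + w
                    next_active.add(v)
        active = next_active
    return dist, active_counts


adj = {
    0: [(1, 4), (2, 1)],
    1: [(3, 1)],
    2: [(1, 2), (3, 5)],
    3: [],
}
dist, active_counts = sssp_frontier(adj, 0)
print(dist)
print(active_counts)
```

The `active_counts` list records how many vertices are processed per round, which is the quantity that varies over time in a frontier-based traversal.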
  
CONNECTED COMPONENT LABELING

• Detect connected regions in graphs and images
• Connected components are the nodes in a graph that point to the same root

[Figure: example with nodes p, q, r and s]
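The "same root" idea can be sketched with a union-find structure in Python. This is a sequential illustration under the assumption that every vertex stores a parent pointer and that the smaller vertex ID becomes the root when two trees merge; the suite's OpenCL implementation may differ.

```python
def connected_components(num_vertices, edges):
    """Label each vertex with the root of its component."""
    parent = list(range(num_vertices))  # every vertex starts as its own root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)  # smaller ID becomes the root
    return [find(v) for v in range(num_vertices)]


labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
print(labels)
```

Vertices that end up with the same label belong to the same connected component.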
  

GRAPH COLORING

• Assign colors (integers) to vertices so that no two adjacent vertices share the same color
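A minimal Python sketch of the coloring constraint follows. It uses a sequential greedy pass (each vertex takes the smallest color not used by an already-colored neighbor), which is an assumption chosen for brevity; parallel GPU coloring schemes are structured differently. The example graph is hypothetical.

```python
def greedy_coloring(adj):
    """Assign the smallest available integer color to each vertex in ID order."""
    color = {}
    for v in sorted(adj):
        used = {color[n] for n in adj[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def is_valid_coloring(adj, color):
    """True if no edge joins two vertices of the same color."""
    return all(color[v] != color[n] for v in adj for n in adj[v])


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
coloring = greedy_coloring(adj)
print(coloring)
print(is_valid_coloring(adj, coloring))
```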
  	
  

FLOYD-WARSHALL

• Solves the all-pairs shortest path (APSP) problem: finding the shortest path from every possible source to every possible destination
• A dynamic programming approach
  ‒ shortestPath(i, j, k) returns the shortest path from i to j using only intermediate vertices from {1, 2, ..., k}
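The dynamic program can be written out as a short Python sketch. It relies on the standard recurrence shortestPath(i, j, k) = min(shortestPath(i, j, k-1), shortestPath(i, k, k-1) + shortestPath(k, j, k-1)), updates a single distance matrix in place, and uses 0-based vertex indices. The 4-vertex weight matrix is a hypothetical example.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest distances from a dense weight matrix (inf = no edge)."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):                  # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


INF = math.inf
w = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
apsp = floyd_warshall(w)
for row in apsp:
    print(row)
```

For a fixed k, the updates over all (i, j) pairs are independent of one another, which is what makes the two inner loops a natural fit for a data-parallel kernel.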
  

MAXIMAL INDEPENDENT SET

• Independent set: no two vertices are neighbors
• Maximal independent set: no further vertex can be added while keeping the set independent

[Figure: example graph with vertices 0 to 7]

S = {0, 4, 6} is a maximal independent set
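The two definitions can be checked mechanically. The Python sketch below builds a maximal independent set greedily and verifies both properties. The edges of the slide's 8-vertex figure are not recoverable from the text, so the example uses a hypothetical 5-vertex path graph instead.

```python
def greedy_mis(adj):
    """Pick vertices in ID order, skipping any vertex adjacent to a chosen one."""
    chosen = set()
    blocked = set()
    for v in sorted(adj):
        if v not in blocked:
            chosen.add(v)
            blocked.add(v)
            blocked.update(adj[v])  # neighbors can no longer be chosen
    return chosen


def is_maximal_independent(adj, s):
    """True if s is independent and no vertex can be added to it."""
    independent = all(n not in s for v in s for n in adj[v])
    maximal = all(v in s or any(n in s for n in adj[v]) for v in adj)
    return independent and maximal


# Hypothetical path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
mis = greedy_mis(path)
print(mis)
print(is_maximal_independent(path, mis))
print(is_maximal_independent(path, {0, 3}))
```

Both {0, 2, 4} and {0, 3} are maximal for this path, which shows that a maximal independent set is not necessarily the largest one.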
  	
  

BETWEENNESS CENTRALITY

• Centrality determines the relative importance of a vertex within the graph (e.g. degree, betweenness, closeness …)
• Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.
  

$$BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths between nodes s and t, and $\sigma_{st}(v)$ is the number of shortest paths between nodes s and t passing through v.
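The formula can be evaluated with a Brandes-style accumulation, sketched here in Python for unweighted graphs. The choice of this accumulation scheme is an assumption for illustration. The sum runs over ordered pairs (s, t), so in an undirected graph each unordered pair contributes twice. The 3-vertex path used as input is hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """BC(v) over ordered (s, t) pairs for an unweighted graph {v: [neighbors]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        sigma[s] = 1
        dist[s] = 0
        order = []                    # vertices in non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


path = {0: [1], 1: [0, 2], 2: [1]}
scores = betweenness_centrality(path)
print(scores)
```

In the path 0 - 1 - 2, only vertex 1 lies between another pair of vertices, so it is the only one with a non-zero score.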
  

FRIEND RECOMMENDATION

• Recommend friend connections: a common feature in social websites
• A simple MapReduce-like algorithm

  "Andy" = ["Brad", "Derek", "Shuai", …]

  Andy → <"Brad", "Derek", "Andy">
         <"Brad", "Shuai", "Andy">
         <"Derek", "Brad", "Andy">
         <"Derek", "Shuai", "Andy">
         <"Shuai", "Derek", "Andy">
         <"Shuai", "Brad", "Andy">

  Andy recommends Brad to Shuai
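The map and reduce steps illustrated above can be sketched in Python. The map step emits one triple per ordered pair of a user's friends, as in the Andy example; the reduce step shown here (counting mutual friends and skipping pairs that are already connected) is an assumption about how such triples would typically be combined. All friend lists other than Andy's are hypothetical.

```python
from collections import Counter
from itertools import permutations


def map_phase(user, friends):
    """Emit <a, b, user>: user is a mutual friend who could introduce b to a."""
    return [(a, b, user) for a, b in permutations(friends, 2)]


def reduce_phase(triples, friend_lists):
    """Count mutual friends per (a, b) pair, skipping pairs already connected."""
    counts = Counter()
    for a, b, _via in triples:
        if b not in friend_lists.get(a, ()):
            counts[(a, b)] += 1
    return counts


friend_lists = {
    "Andy": ["Brad", "Derek", "Shuai"],
    "Brad": ["Andy", "Derek"],
    "Derek": ["Andy", "Brad"],
    "Shuai": ["Andy"],
}
triples = [t for user, friends in friend_lists.items()
           for t in map_phase(user, friends)]
recommendations = reduce_phase(triples, friend_lists)
print(sorted(recommendations.items()))
```

A pair with a higher count shares more mutual friends, so the counts can be used to rank candidate recommendations.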
  

PAGERANK	
  

PERFORMANCE BENEFITS

• Speedups are up to 11x (an AMD “Tahiti” discrete GPU vs. 4 CPU cores on A8)
• PCI-E overhead is included
• Performance benefits depend on graph input datasets

[Figure: bar chart of parallel speedup. y-axis: Parallel Speedup, 0 to 15]
  

EXECUTION TIME BREAKDOWN (D-GPU)

• The GPU portion of execution time ranges from 8% to 99%
• Some further GPU offload can be done (e.g. FRD and MIS)
  
	
  

PARALLELISM (ACTIVE VERTICES OVER TIME)

[Figure: number of active vertices over time. Panels: Single-Source Shortest Path (Road Network - NY); Graph Coloring (G3 Circuit). x-axis: Time]
  

LOAD IMBALANCE (DEGREE DISTRIBUTION)

[Figure: distribution of vertex degrees over time, 0% to 100%. Panels: Single-Source Shortest Path (Road Network); Graph Coloring (G3 Circuit). x-axis: Time]
  
HIERARCHICAL CLUSTERING

• Different program-input pairs may have vastly different characteristics!

[Figure: dendrogram of program-input pairs, axis 0.0 to 4.6: BC-2k, BC-1k, MIS-US-NW, PRK-2k, CLR-G3-circuit, CLR-ecology, MIS-ecology, FW-512-64k, FW-256-16k, CCL-lena, CCL-deposit, DJK-US-NW, DJK-US-CA, MIS-shell, CLR-shell, PRK-flicker, FRD-coAuthor]
  
L2 HIT RATE OVER TIME (SSSP)

• The cache hit rate first improves, then degrades, improves again and finally degrades, with some fluctuations in the middle

[Figure: L2 hit rate over time. y-axis: Hit Rate, 0 to 60; x-axis: Time]
  
ARCHITECTURAL IMPLICATIONS (SCALAR UNIT)

[Figure: Graph Traversal timeline with regions labeled Scalar and SIMD and points A and B marked. x-axis: Time]
  
ARCHITECTURAL IMPLICATIONS

• Possibly include narrower SIMD units or heterogeneous SIMD units

  [Figure: Scalar, Narrow SIMD and Wide SIMD units]

• Resource management and scheduling
  ‒ Switch the task between the CPU and the GPU based on parallelism
  ‒ Use only “enough” SIMD engines and save power

  [Figure: curve over time (y-axis 0 to 120000) with regions labeled CPU, GPU, GPU and points A and B marked. x-axis: Time]
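The scheduling idea can be sketched as a per-iteration decision driven by the number of active vertices. Everything numeric in this Python sketch is a hypothetical placeholder: the threshold of 4096 vertices, 64 lanes per SIMD engine, the cap of 32 engines, and the example trace are illustrative assumptions, not values from this talk.

```python
def choose_device(active_vertices, gpu_threshold=4096):
    """Pick a device for one iteration from the current degree of parallelism."""
    return "GPU" if active_vertices >= gpu_threshold else "CPU"


def simd_engines_needed(active_vertices, lanes_per_engine=64, max_engines=32):
    """Smallest number of SIMD engines that gives every active vertex a lane."""
    needed = -(-active_vertices // lanes_per_engine)  # ceiling division
    return max(1, min(needed, max_engines))


# Hypothetical trace of active-vertex counts across iterations.
trace = [12, 900, 60000, 120000, 30000, 40]
plan = [(n, choose_device(n), simd_engines_needed(n)) for n in trace]
for step in plan:
    print(step)
```

Low-parallelism phases at the start and end of the trace go to the CPU, and the engine count grows only as far as the available work requires.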
  	
  
CONCLUSION AND FUTURE WORK

• Graph applications are an emerging workload domain
• Pannotia is a first-step attempt to evaluate diverse graph building blocks on GPUs

Next-Step Goals:
• Add more applications (e.g. matching, spanning tree, flow)
• Optimize Pannotia applications
• Extend to multiple GPU nodes and across CPU and GPU
  

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a registered trademark of Apple Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
  


PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che, Bradford Beckmann, Steve Reinhardt and Kevin Skadron

  • 1. ACCELERATING AND EVALUATING OPENCL GRAPH APPLICATIONS
      SHUAI CHE, BRAD BECKMANN, STEVE REINHARDT AND KEVIN SKADRON
  • 2. AGENDA
      - Background and Graph Applications
      - Pannotia OpenCL™ Graph Applications
      - Performance Evaluation and Discussion
      Accelerating and Evaluating OpenCL Graph Applications | November 20, 2013 | CONFIDENTIAL
  • 3. GRAPH APPLICATIONS
      - Intelligence
          ‒ Business analytics, security and scientific discovery
      - Social networks
          ‒ Facebook, Twitter, LinkedIn, Weibo, etc.
      - Life science and healthcare
          ‒ Disease and drug research, life system research
      - Infrastructure
          ‒ Transportation, power grid, energy and water supply
      - Scientific and engineering simulations
  • 4. GRAPH APPLICATIONS
      - Low arithmetic intensity and low data reuse
      - Not floating-point intensive
      - Branch divergence
          ‒ Only some of the threads in a wavefront are active
      - Memory divergence
          ‒ Data distributed in different regions of memory
          ‒ A challenge to optimize data layouts and memory accesses
      - Load imbalance
          ‒ Uneven work distribution across different threads
          ‒ Short-running threads wait for long-running threads
      - Parallelism
          ‒ Changing degree of parallelism across iterations
          ‒ Underutilization of compute units for certain phases
  • 5. PANNOTIA
      - A graph application suite for GPGPU
      - Eight diverse graph algorithms, e.g., shortest path, graph partitioning, web analysis and social network
      - Implemented in C + OpenCL™
      - Some are OpenCL implementations based on algorithms of prior work
      - Initial implementation is for a single GPU node
      - Further algorithmic and hardware-specific optimizations are in progress
      - Details of Pannotia can be found in our paper published at the 2013 IEEE International Symposium on Workload Characterization
  • 6. PANNOTIA
      Applications                     Domains
      Single-Source Shortest Path      Shortest Path
      Connected Component Labeling     Graph Clustering
      Graph Coloring                   Graph Partitioning
      Floyd-Warshall                   Shortest Path
      Maximal Independent Set          Graph Partitioning
      Betweenness Centrality           Social Network
      Friend Recommendation            Social Network
      Page Rank                        Web Analysis
  • 7. GRAPH INPUT AND DATA STRUCTURE
      - Real-world graphs
          ‒ The University of Florida Sparse Matrix Collection
          ‒ The 9th DIMACS Implementation Challenge
          ‒ The 10th DIMACS Implementation Challenge
      - Synthetic graphs
          ‒ Random-graph generator from Georgia Tech
      - Graph input formats
          ‒ Coordinate Format
          ‒ METIS
          ‒ Matrix Market
      - Data structure representation
          ‒ CSR, COO, ELL …
          ‒ 2D adjacency matrix
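The slide names CSR and COO without showing either layout. As a minimal sketch, assuming 0-based vertex IDs and a directed, weighted edge list, the following converts a COO list into the three CSR arrays. The function name `coo_to_csr` and the sample edges are hypothetical and are not taken from Pannotia.

```python
def coo_to_csr(num_vertices, edges):
    """Convert a COO edge list [(src, dst, weight), ...] into CSR arrays."""
    # row_ptr[v] .. row_ptr[v + 1] will delimit the outgoing edges of vertex v.
    row_ptr = [0] * (num_vertices + 1)
    for src, _, _ in edges:
        row_ptr[src + 1] += 1
    # Prefix sum turns per-vertex edge counts into start offsets.
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]

    col_idx = [0] * len(edges)
    weights = [0] * len(edges)
    next_slot = row_ptr[:-1].copy()  # next free position within each row
    for src, dst, w in edges:
        pos = next_slot[src]
        col_idx[pos] = dst
        weights[pos] = w
        next_slot[src] += 1
    return row_ptr, col_idx, weights


# Hypothetical 4-vertex graph, given in COO form.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
row_ptr, col_idx, weights = coo_to_csr(4, edges)
```

In CSR, the neighbors of vertex `v` are `col_idx[row_ptr[v]:row_ptr[v + 1]]`, so each vertex's adjacency list is contiguous in memory, whereas COO stores one independent record per edge.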
  • 8. SINGLE SOURCE SHORTEST PATH
    - Finds the shortest path from the source node to every other node in the graph
    [Figure: example weighted graph with a table of vertex IDs (Vid) and shortest distances (Dist)]
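As a point of reference for the problem statement above, here is a minimal sequential sketch of single-source shortest path using Dijkstra's algorithm with a binary heap. It is a CPU illustration only and should not be read as the suite's OpenCL kernel; the function name `sssp`, the adjacency-dictionary layout, and the sample graph are assumptions, and edge weights are assumed to be non-negative.

```python
import heapq


def sssp(adj, source):
    """Shortest distance from source to every vertex.

    adj maps each vertex to a list of (neighbor, weight) pairs with
    non-negative weights. Unreachable vertices keep distance infinity.
    """
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


# Hypothetical 4-vertex directed graph.
adj = {0: [(1, 4), (2, 1)], 1: [(3, 5)], 2: [(1, 2)], 3: []}
dist = sssp(adj, 0)
print(dist)
```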
  • 9. CONNECTED COMPONENT LABELING
    - Detect connected regions in graphs and images
    - Connected components are the nodes in a graph that point to the same root
    [Figure: example with nodes labeled p, q, r, s]
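The "same root" idea can be sketched with a small union-find structure, in which every node ends up pointing to one representative root per component. This is a sequential illustration under assumptions, not the suite's implementation: the function name `connected_components`, the choice of the smallest vertex ID as root, and the sample edges are hypothetical.

```python
def connected_components(num_vertices, edges):
    """Label each vertex with the root of its component (undirected edges)."""
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as root so labels are deterministic.
            parent[max(ra, rb)] = min(ra, rb)
    return [find(v) for v in range(num_vertices)]


# Hypothetical graph: {0, 1, 2} and {3, 4} are connected, 5 is isolated.
labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
print(labels)
```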
  • 10. GRAPH COLORING
    - Assign colors (integers) to vertices such that no two adjacent vertices share the same color
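A simple way to satisfy the coloring constraint is a sequential greedy pass that gives each vertex the smallest integer not used by an already-colored neighbor. The sketch below illustrates the constraint only; it is not presented as the parallel algorithm used in the suite, and the function name `greedy_coloring` and the sample graph are hypothetical.

```python
def greedy_coloring(adj):
    """Assign each vertex the smallest color not used by a colored neighbor.

    adj maps each vertex to a list of its neighbors (undirected graph).
    """
    color = {}
    for v in sorted(adj):
        used = {color[n] for n in adj[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Hypothetical graph: a triangle 0-1-2 with vertex 3 attached to vertex 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
colors = greedy_coloring(adj)
print(colors)
```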
  • 11. FLOYD-WARSHALL
    - Solves the all-pairs shortest path (APSP) problem: finding the shortest path from every possible source to every possible destination
    - A dynamic programming approach
      shortestPath(i, j, k) returns the shortest path from i to j using only vertices from {1, 2, ..., k} as intermediate vertices
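The dynamic-programming formulation can be sketched as the classic triple loop, where iteration k allows vertex k as an additional intermediate vertex. This is a sequential illustration; the function name `floyd_warshall`, the 0-based vertex numbering, and the sample weight matrix are assumptions rather than anything taken from the slides.

```python
INF = float("inf")


def floyd_warshall(weights):
    """All-pairs shortest distances from an n x n weight matrix.

    weights[i][j] is the edge weight from i to j (INF if there is no edge,
    0 on the diagonal). After iteration k, dist[i][j] is the shortest
    distance from i to j using only vertices 0..k as intermediates.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Hypothetical 4-vertex directed graph.
weights = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
apsp = floyd_warshall(weights)
for row in apsp:
    print(row)
```

For a fixed k, the updates for different (i, j) pairs are independent of each other, which is what makes the inner two loops a natural fit for data-parallel execution.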
  • 12. MAXIMAL INDEPENDENT SET
    - Independent set: no two vertices are neighbors
    - Maximal independent set: impossible to add another vertex and still keep the set independent
    [Figure: example graph with vertices 0 to 7]
    S = {0, 4, 6} is a maximal independent set
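The two definitions above can be checked with a short greedy sketch: visit the vertices in order, select any vertex that is not yet blocked, and block its neighbors. This is a sequential illustration, not the suite's parallel kernel. The function name `maximal_independent_set` and the five-vertex path graph are hypothetical; the edges of the graph in the slide's figure are not recoverable, so it is not reproduced here.

```python
def maximal_independent_set(adj):
    """Greedy maximal independent set for an undirected graph.

    adj maps each vertex to a list of its neighbors.
    """
    selected = set()
    blocked = set()
    for v in sorted(adj):
        if v not in blocked:
            selected.add(v)
            blocked.add(v)
            blocked.update(adj[v])  # neighbors can no longer be selected
    return selected


# Hypothetical path graph 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
mis = maximal_independent_set(adj)
print(sorted(mis))
```

The result is maximal (no further vertex can be added) but not necessarily maximum (the largest possible), which matches the distinction the slide draws.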
  • 13. BETWEENNESS CENTRALITY
    - Centrality determines the relative importance of a vertex within the graph (e.g. degree, betweenness, closeness …)
    - Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.
    $$BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
    where $\sigma_{st}$ is the number of shortest paths between nodes s and t, and $\sigma_{st}(v)$ is the number of shortest paths between nodes s and t passing through v.
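One common way to evaluate this sum without enumerating all paths is Brandes' algorithm, sketched below for an unweighted, undirected graph. Whether the suite uses this exact scheme is not stated on the slide, so treat it as an illustration. The function name `betweenness_centrality` and the sample graph are hypothetical, and the final halving reflects an assumed convention that each unordered pair {s, t} is counted once.

```python
from collections import deque


def betweenness_centrality(adj):
    """Betweenness centrality for an unweighted, undirected graph.

    adj maps each vertex to a list of its neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Backward phase: accumulate dependencies from the farthest vertices.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints, so halve the sums.
    return {v: score / 2.0 for v, score in bc.items()}


# Hypothetical path graph 0 - 1 - 2: only vertex 1 lies between two others.
bc = betweenness_centrality({0: [1], 1: [0, 2], 2: [1]})
print(bc)
```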
  • 14. FRIEND RECOMMENDATION
    - Recommend friend connections, a common feature in social websites
    - A simple Map-Reduce-like algorithm
      “Andy” = [ “Brad”, “Derek”, “Shuai”, … ]
      Andy → <“Brad”, “Derek”, “Andy”>
             <“Brad”, “Shuai”, “Andy”>
             <“Derek”, “Brad”, “Andy”>
             <“Derek”, “Shuai”, “Andy”>
             <“Shuai”, “Derek”, “Andy”>
             <“Shuai”, “Brad”, “Andy”>    Andy recommends Brad to Shuai
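The map step shown above can be sketched as emitting one triple per ordered pair of a user's friends. The reduce step in this sketch, which counts how many mutual friends back each recommendation and skips pairs who are already friends, is an assumption added for illustration, since the slide shows only the emitted triples. The function name `recommend` and the triple field names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def recommend(friends):
    """Map-Reduce-like friend recommendation.

    friends maps each user to a list of that user's friends.
    Returns the emitted triples and a count per (target, candidate) pair.
    """
    # Map: each user emits <target, candidate, via> for every ordered
    # pair of their friends ("via recommends candidate to target").
    triples = []
    for via, friend_list in friends.items():
        for target, candidate in permutations(friend_list, 2):
            triples.append((target, candidate, via))
    # Reduce (assumed): count mutual friends per recommendation and skip
    # pairs that are already connected.
    counts = defaultdict(int)
    for target, candidate, via in triples:
        if candidate not in friends.get(target, []):
            counts[(target, candidate)] += 1
    return triples, dict(counts)


friends = {"Andy": ["Brad", "Derek", "Shuai"]}
triples, counts = recommend(friends)
for triple in triples:
    print(triple)
```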
  • 15. PAGERANK
  • 16. PERFORMANCE BENEFITS
    - Speedups are up to 11x (an AMD “Tahiti” discrete GPU vs. 4 CPU cores on an A8)
    - PCI-E overhead is included
    - Performance benefits depend on graph input datasets
    [Figure: bar chart of parallel speedup, y-axis from 0 to 15]
  • 17. EXECUTION TIME BREAKDOWN (D-GPU)
    - The GPU's share of execution time ranges from 8% to 99%
    - Some further GPU offload can be done (e.g. FRD and MIS)
  • 18. PARALLELISM (ACTIVE VERTICES OVER TIME)
    [Figure: number of active vertices over time for Single-Source Shortest Path (Road Network - NY) and Graph Coloring (G3 Circuit)]
  • 19. LOAD IMBALANCE (DEGREE DISTRIBUTION)
    [Figure: degree distribution (0% to 100%) over time for Single-Source Shortest Path (Road Network) and Graph Coloring (G3 Circuit)]
  • 20. HIERARCHICAL CLUSTERING
    - Different program-input pairs may have vastly different characteristics!
    [Figure: hierarchical clustering of program-input pairs (scale 0.0 to 4.6): BC-2k, BC-1k, MIS-US-NW, PRK-2k, CLR-G3-circuit, CLR-ecology, MIS-ecology, FW-512-64k, FW-256-16k, CCL-lena, CCL-deposit, DJK-US-NW, DJK-US-CA, MIS-shell, CLR-shell, PRK-flicker, FRD-coAuthor]
  • 21. L2 HIT RATE OVER TIME (SSSP)
    - The cache hit rate first improves, then degrades, improves again and finally degrades, with some fluctuations in the middle
    [Figure: L2 hit rate (0 to 60) over time]
  • 22. ARCHITECTURAL IMPLICATIONS (SCALAR UNIT)
    [Figure: graph traversal timeline alternating between Scalar and SIMD phases, with points A and B marked]
  • 23. ARCHITECTURAL IMPLICATIONS
    - Possibly include narrower SIMD units or heterogeneous SIMD units
      [Figure: Scalar, Narrow SIMD and Wide SIMD units]
    - Resource management and scheduling
      ‒ Switch the task between the CPU and the GPU based on parallelism
      ‒ Use only “enough” SIMD engines and save power
      [Figure: parallelism over time (y-axis from 0 to 120000) with regions labeled CPU and GPU and points A and B marked]
  • 24. CONCLUSION AND FUTURE WORK
    - Graph applications are an emerging workload domain
    - Pannotia is a first-step attempt to evaluate diverse graph building blocks on GPUs
    Next-Step Goals:
    - Add more applications (e.g. matching, spanning tree, flow)
    - Optimize Pannotia applications
    - Extend to multiple GPU nodes and across CPU and GPU
  • 25. DISCLAIMER & ATTRIBUTION
    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
    The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
    AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
    AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
    ATTRIBUTION
    © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a registered trademark of Apple Inc. Other names are for informational purposes only and may be trademarks of their respective owners.