Supercomputing: Technical Evolution &
Programming Models
Marc Snir
Argonne National Laboratory &
University of Illinois at Urbana-Champaign
Introduction
July 13 | MCS -- Marc Snir | 2
Theory of Punctuated Equilibrium
(Eldredge, Gould, Mayr…)
§ Evolution consists of long periods of equilibrium, with little change, interspersed with short periods of rapid change.
– Mutations are diluted in large populations in equilibrium; this homogenizing effect prevents the accumulation of multiple changes
– Small, isolated populations under heavy natural-selection pressure evolve rapidly, and new species can appear
– Major cataclysms can be a cause of rapid change
§ Punctuated equilibrium is a good model for technology evolution:
– Revolutions are hard in large markets with network effects and technology that evolves gradually
– Changes can be much faster when small, isolated product markets are created, or when the current technology hits a wall (a cataclysm)
§ (Not a new idea: e.g., Levinthal 1998, The Slow Pace of Rapid Technological Change: Gradualism and Punctuation in Technological Change)
  
Why it Matters to SPAA (and PODC)
§ Periods of paradigm shift generate a rich set of new problems (new low-hanging fruit?)
– It is a time when good theory can help
§ E.g., Internet, wireless, big data
– Punctuated evolution due to the appearance of new markets
§ Hypothesis: HPC now and, ultimately, much of IT are entering a period of fast evolution. Please prepare.
Where the Analogy with Biological Evolution Breaks Down
§ Technology evolution can be accelerated by "genetic engineering"
– Technology developed in one market is exploited in another market
– E.g., the Internet and wireless were enabled by cheap microprocessors, telephony technology, etc.
§ "Genetic engineering" has been essential for HPC in the last 25 years:
– Progress enabled by the reuse of technologies from other markets (micros, GPUs…)
Past & Present
Evidence of Punctuated Equilibrium in HPC
[Chart: core count of the leading Top500 system over time, log scale from 1 to 10,000,000, with three annotated transitions: "attack of the killer micros", "multicore", "accelerators" (SPAA marked on the timeline)]
1990: The Attack of the Killer Micros
(Eugene Brooks, 1990)
§ Shift from ECL vector machines to clusters of MOS micros
– Cataclysm: bipolar evolution reached its limits (nitrogen cooling, gallium arsenide…); MOS was on a fast evolution path
– MOS had its niche markets: controllers, workstations, PCs
– A classical example of "good enough, cheaper technology" (Christensen, The Innovator's Dilemma)
2002: Multicore
§ Clock speed stopped increasing; very little return on added CPU complexity; chip density continued to increase
– Technology push – not market pull
– Still has limited success
2010: Accelerators
§ A new market (graphics) created an ecological niche
§ Technology transplanted into other markets (signal processing/vision, scientific computing)
– Advantage of a better power/performance ratio (less logic)
§ Technology still changing rapidly: integration with the CPU and an evolving ISA
Were the (R)evolutions Successful in HPC?
§ Killer micros: Yes
– Totally replaced vector machines
– All HPC codes enabled for message passing (MPI)
– Took > 10 years and > $1B govt. investment (DARPA)
§ Multicore: Incomplete
– Many codes still use one MPI process per core, using shared memory for message passing
– The use of two programming models (MPI+OpenMP) is burdensome
– PGAS is not used, and does not provide (so far) a real advantage over MPI
– Many open issues on scaling multithreading models (OpenMP, TBB, Cilk…) and combining them with message passing
– (See the history of large-scale NUMA, which did not become a viable species)
Were the (R)evolutions Successful? (2)
§ Accelerators: Just beginning
– Few HPC codes have been converted to use GPUs
§ Obstacles:
– Technology still changing fast (integration of the GPU with the CPU, continued changes in the ISA)
– No good non-proprietary programming systems are available, and their long-term viability is uncertain
Key Obstacles
§ Scientific codes live much longer than computer systems (two decades or more); they need to be ported across successive HW generations
§ The amount of code to be ported continuously increases (major scientific codes each have > 1 MLOC)
§ Need very efficient, well-tuned codes (HPC platforms are expensive)
§ Need portability across platforms (HPC programmers are expensive)
§ Squaring the circle?

§ The lack of performant, portable programming models has become the major impediment to the evolution of HPC hardware
Did Theory Help?
§ Killer micros: Helped by work on scalable algorithms and on interconnects
§ Multicore: Helped by work on communication complexity (efficient use of caches)
– Very little use of work on coordination algorithms or transactional memory
§ Accelerators: Cannot think of relevant work
– Interesting question: the power of branching & the power of indirection
– Surprising result: the AKS sorting network
§ Too often, theory follows practice, rather than preceding it.
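A sorting network such as AKS applies a fixed, data-independent pattern of compare-exchange operations, which is exactly what branch-poor accelerator hardware favors. As a hedged illustration (my own sketch, not from the talk), here is the practical cousin of AKS, a bitonic sorting network, in Python; the function names are mine and the input length must be a power of two:

```python
def bitonic_sort(a, lo=0, n=None, ascending=True):
    """Sort a[lo:lo+n] in place with a bitonic network (n a power of two)."""
    if n is None:
        n = len(a)
    if n > 1:
        m = n // 2
        bitonic_sort(a, lo, m, True)        # sort first half ascending
        bitonic_sort(a, lo + m, m, False)   # sort second half descending -> bitonic
        bitonic_merge(a, lo, n, ascending)

def bitonic_merge(a, lo, n, ascending):
    """Merge a bitonic sequence; the compare-exchange *pattern* is fixed."""
    if n > 1:
        m = n // 2
        for i in range(lo, lo + m):
            if (a[i] > a[i + m]) == ascending:
                a[i], a[i + m] = a[i + m], a[i]
        bitonic_merge(a, lo, m, ascending)
        bitonic_merge(a, lo + m, m, ascending)

data = [5, 3, 8, 1, 9, 2, 7, 4]
bitonic_sort(data)
print(data)  # [1, 2, 3, 4, 5, 7, 8, 9]
```

Unlike quicksort, the sequence of comparisons here does not depend on the data, so the same schedule can run on every lane of a SIMD machine.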
Future
The End of Moore’s Law is Coming
§  Moore’s	
  Law:	
  The	
  
number	
  of	
  transistors	
  
per	
  chip	
  doubles	
  every	
  
two	
  years	
  
§  Stein’s	
  Law:	
  If	
  
something	
  cannot	
  go	
  
forever,	
  it	
  will	
  stop	
  
§  Ques.on	
  is	
  not	
  
whether	
  but	
  when	
  will	
  
Moore’s	
  Law	
  stop?	
  
–  It	
  is	
  difficult	
  to	
  make	
  
predic.ons,	
  especially	
  
about	
  the	
  future	
  (Yogi	
  
Berra)	
  
Current Obstacle: Current Leakage
§ Transistors do not shut off completely

"While power consumption is an urgent challenge, its leakage or static component will become a major industry crisis in the long term, threatening the survival of CMOS technology itself, just as bipolar technology was threatened and eventually disposed of decades ago."
– International Technology Roadmap for Semiconductors (ITRS), 2011

§ The ITRS "long term" is the 2017-2024 timeframe.
§ No "good enough" technology is waiting in the wings
Longer-Term Obstacle
§ Quantum effects totally change the behavior of transistors as they shrink
– A 7-5 nm feature size is predicted to be the lower limit for CMOS devices
– ITRS predicts 7.5 nm will be reached in 2024
The 7nm Wall
24 July 2013 | ANL-LBNL-ORNL-PNNL
(courtesy S. Dosanjh)
The Future Is Not What It Was
(courtesy S. Dosanjh)
  
Progress Does Not Stop
§ It becomes more expensive and slows down
– New materials (e.g., III-V, germanium thin channels, nanowires, nanotubes, or graphene)
– New structures (e.g., 3D transistor structures)
– Aggressive cooling
– New packages
§ More invention at the architecture level
§ Seeking value from features other than speed ("More than Moore")
– System on a chip: integration of analog and digital
– MEMS…
§ Beyond Moore? (Quantum, biological…) – beyond my horizon
Exascale
Supercomputer Evolution
§ ×1,000 performance increase every 11 years
– ×50 faster than Moore's Law
§ Extrapolation predicts an exaflop/s (10^18 floating-point operations per second) before 2020
– We are now at 50 Petaflop/s
§ The extrapolation may not work if Moore's Law slows down
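A back-of-the-envelope check of the extrapolation (my own arithmetic, not from the slide): ×1,000 every 11 years is a yearly factor of 1000^(1/11) ≈ 1.87, so closing the 20× gap from 50 Pflop/s to 1 Eflop/s takes roughly five years if the trend holds:

```python
import math

yearly_factor = 1000 ** (1 / 11)   # x1,000 every 11 years -> ~1.87x per year
gap = 1e18 / 50e15                 # 1 Eflop/s vs. today's 50 Pflop/s = 20x
years_needed = math.log(gap) / math.log(yearly_factor)
print(f"{yearly_factor:.2f}x per year, exaflop in ~{years_needed:.1f} years")
# ~4.8 years, i.e., before 2020 from a 2013 vantage point
```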
Do We Care?
§  It’s	
  all	
  about	
  Big	
  Data	
  Now,	
  simula.ons	
  are	
  passé.	
  
§  B***t	
  
§  All	
  science	
  is	
  either	
  physics	
  or	
  stamp	
  collec;ng.	
  (Ernest	
  
Rutherford)	
  
–  In	
  Physical	
  Sciences,	
  experiments	
  and	
  observa.ons	
  exist	
  to	
  
validate/refute/mo.vate	
  theory.	
  “Data	
  Mining”	
  not	
  driven	
  by	
  a	
  
scien.fic	
  hypothesis	
  is	
  “stamp	
  collec.on”.	
  
§  Simula.on	
  is	
  needed	
  to	
  go	
  from	
  a	
  mathema.cal	
  model	
  to	
  
predic.ons	
  on	
  observa.ons.	
  
–  If	
  system	
  is	
  complex	
  (e.g.,	
  climate)	
  then	
  simula.on	
  is	
  expensive	
  
–  Predic.ons	
  are	
  oWen	
  sta.s.cal	
  –	
  complica.ng	
  both	
  simula.on	
  
and	
  data	
  analysis	
  
	
  
Observation Meets Data: Cosmology
Computation Meets Data: The Argonne View
[Figure: the cosmology pipeline. Mapping the sky with survey instruments (LSST weak lensing); observations whose statistical error bars will 'disappear' soon; a supercomputer simulation campaign; an emulator based on Gaussian-process interpolation in high-dimensional spaces ('precision oracle'); Markov chain Monte Carlo 'cosmic calibration'. HACC+CCF combines domain science, CS, math, statistics, and machine learning. HACC = Hardware/Hybrid Accelerated Cosmology Code(s); CCF = Cosmic Calibration Framework.]
Record-breaking application: 3.6 trillion particles, 14 Pflop/s
(courtesy Salman Habib)
Exascale Design Point: 202x with a cap of $200M and 20 MW

| Systems                     | 2012 BG/Q Computer        | 2020-2024            | Difference: Today & 2019 |
| System peak                 | 20 Pflop/s                | 1 Eflop/s            | O(100)                   |
| Power                       | 8.6 MW                    | ~20 MW               |                          |
| System memory               | 1.6 PB (16*96*1024)       | 32-64 PB             | O(10)                    |
| Node performance            | 205 GF/s (16*1.6GHz*8)    | 1.2 or 15 TF/s       | O(10)-O(100)             |
| Node memory BW              | 42.6 GB/s                 | 2-4 TB/s             | O(1000)                  |
| Node concurrency            | 64 threads                | O(1k) or 10k         | O(100)-O(1000)           |
| Total node interconnect BW  | 20 GB/s                   | 200-400 GB/s         | O(10)                    |
| System size (nodes)         | 98,304 (96*1024)          | O(100,000) or O(1M)  | O(100)-O(1000)           |
| Total concurrency           | 5.97 M                    | O(billion)           | O(1,000)                 |
| MTTI                        | 4 days                    | O(<1 day)            | -O(10)                   |

Both price and power envelopes may be too aggressive!
Identified Issues
§ Scale (a billion threads)
§ Power (10's of MWatts)
– Communication: > 99% of power is consumed by moving operands across the memory hierarchy and across nodes
– Reduced memory size (communication in time)
§ Resilience: something fails every hour; the machine is never "whole"
– Trade-off between power and resilience
§ Asynchrony: equal work ≠ equal time
– Power management
– Error recovery
Other Issues
§ Uncertainty about the underlying HW architecture
– Fast evolution of architecture (accelerators, 3D memory and processing near memory, NVRAM)
– Uncertainty about the market that will supply components to HPC
– Possible divergence from commodity markets
§ Increased complexity of software
– Simulations of complex systems + uncertainty quantification + optimization…
– Software management of power and failure
– Scale and tight coupling (the tail of the distribution matters!)
Research Areas
Scale
§ HPC algorithms are being designed for a 2-level hierarchy (node, global); can they be designed for a multi-level hierarchy? Can they be "hierarchy-oblivious"?
§ Can we have a programming model that abstracts the specific HW mechanisms at each level (message passing, shared memory) yet can leverage these mechanisms efficiently?
– Global shared object space + caching + explicit communication
– Multilevel programming (compilation with a human in the loop)
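One concrete instance of the "hierarchy-oblivious" idea is cache-oblivious recursive blocking, which adapts to every level of a memory hierarchy without knowing any of its parameters. A minimal sketch (my own illustration, not from the talk): a recursive matrix transpose that splits the larger dimension until blocks are tiny:

```python
def co_transpose(src, dst, r0, r1, c0, c1):
    """Transpose src[r0:r1][c0:c1] into dst, cache-obliviously.

    Recursing on the larger dimension produces blocks that eventually fit
    in *every* level of the hierarchy, with no machine parameters needed.
    """
    if r1 - r0 <= 2 and c1 - c0 <= 2:      # base case: tiny block
        for i in range(r0, r1):
            for j in range(c0, c1):
                dst[j][i] = src[i][j]
    elif r1 - r0 >= c1 - c0:               # split rows
        m = (r0 + r1) // 2
        co_transpose(src, dst, r0, m, c0, c1)
        co_transpose(src, dst, m, r1, c0, c1)
    else:                                  # split columns
        m = (c0 + c1) // 2
        co_transpose(src, dst, r0, r1, c0, m)
        co_transpose(src, dst, r0, r1, m, c1)

A = [[1, 2, 3], [4, 5, 6]]                 # 2x3 input
B = [[0] * 2 for _ in range(3)]            # 3x2 output
co_transpose(A, B, 0, 2, 0, 3)
print(B)  # [[1, 4], [2, 5], [3, 6]]
```

The same recursive decomposition style underlies cache-oblivious multiplication and FFT; whether it generalizes to distributed, multi-level machines is exactly the open question posed above.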
Communication
§ Communication-efficient algorithms
§ A better understanding of fundamental communication-computation tradeoffs for PDE solvers (getting away from DAG-based lower bounds; tradeoffs between communication and convergence rate)
§ Programming models, libraries, and languages where communication is a first-class citizen (other than MPI)
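As a toy instance of communication efficiency (my own numbers, following the standard textbook analysis): for dense n×n matrix multiplication on p processes, a 1D row decomposition must move roughly n² words into each process, while a 2D SUMMA-style decomposition moves only about 2n²/√p:

```python
import math

def words_1d(n, p):
    # 1D row decomposition: each process owns n/p rows of A,
    # but must eventually see all of B: ~n^2 words received
    return n * n

def words_2d(n, p):
    # 2D (SUMMA-style) decomposition on a sqrt(p) x sqrt(p) grid:
    # each process receives ~2 * n^2 / sqrt(p) words of A and B panels
    return 2 * n * n / math.sqrt(p)

n, p = 10_000, 10_000
print(f"1D: {words_1d(n, p):.2e} words, 2D: {words_2d(n, p):.2e} words")
# the 2D layout communicates 50x less per process in this configuration
```

Choosing the data layout, not the arithmetic, is what changes here; the flop count is identical in both cases.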
Resilient Distributed Systems
§ E.g., a parallel file system with 768 I/O nodes and >50K disks
– Systems are built to tolerate disk and node failures
– However, most failures in the field are due to "performance bugs": e.g., time-outs due to thrashing
§ How do we build feedback mechanisms that ensure stability? (control theory for large-scale, discrete systems)
§ How do we provide quality of service?
§ What is a quantitative theory of resilience? (e.g., the impact of the failure rate on overall performance)
– Focus on systems where failures are not exceptional
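One classical quantitative handle on resilience, offered here as a hedged illustration rather than anything from the talk, is the optimal checkpoint interval: Young's first-order approximation gives τ ≈ √(2·C·MTBF) for checkpoint cost C and mean time between failures MTBF:

```python
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order optimal checkpoint interval, in seconds."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# A machine that fails every hour with a 5-minute checkpoint cost
# should checkpoint roughly every 24.5 minutes:
tau = young_interval(300, 3600)
print(f"checkpoint every {tau / 60:.1f} min")  # ~24.5 min
```

The formula makes the slide's point concrete: as MTTI shrinks from days to under a day, the checkpoint interval shrinks with its square root, and checkpoint overhead eats an ever larger fraction of the machine.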
Resilient Parallel Algorithms – Overcoming Silent Data
Corruptions
§ SDCs may be unavoidable in future large systems (due to flips in computation logic)
§ Intuition: an SDC can either
– Type 1: grossly violate the computation model (e.g., a jump to the wrong address, a message sent to the wrong node), or
– Type 2: introduce noise in the data (a bit flip in a large array)
§ Many iterative algorithms can tolerate infrequent type 2 errors
§ Type 1 errors are often catastrophic and easy to detect in software
§ Can we build systems that avoid or correct easy-to-detect (type 1) errors and tolerate hard-to-detect (type 2) errors?
§ What is the general theory of fault-tolerant numerical algorithms?
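The type-2 intuition can be demonstrated on a toy example (my own sketch, not from the talk): Jacobi iteration on a diagonally dominant system is a contraction, so a single large "bit-flip-like" perturbation injected mid-run is damped away and the iteration still converges:

```python
def jacobi(A, b, iters, inject_at=None):
    """Jacobi iteration; optionally inject a large error into x[0] at one step."""
    n = len(b)
    x = [0.0] * n
    for k in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        if k == inject_at:
            x[0] += 100.0        # simulated silent data corruption (type 2)
    return x

# Diagonally dominant system: the Jacobi error shrinks by ~0.5x per sweep
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = jacobi(A, b, iters=80, inject_at=40)
residual = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
print(f"residual after corrupted run: {residual:.2e}")  # tiny: the noise was damped
```

A type-1 error, by contrast (say, the update loop silently skipping an equation), would change the fixed point itself, which is why the slide treats the two classes so differently.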
Asynchrony
§ What is a measure of asynchrony tolerance?
– Moving away from the qualitative (e.g., wait-free) to the quantitative:
– How much do intermittently slow processes slow down the entire computation, on average?
§ What are the trade-offs between synchronicity and computation work?
§ Load balancing driven not by uncertainty about the computation, but by uncertainty about the computer
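The "intermittently slow processes" question can be probed with a tiny Monte Carlo model (my own illustration, with made-up jitter parameters): a bulk-synchronous step finishes only when the slowest of p processes does, so even small per-process jitter inflates the expected step time as p grows:

```python
import random

def mean_step_time(p, trials=2000, seed=1):
    """Expected duration of one bulk-synchronous step across p processes.

    Each process takes 1.0 unit of work plus exponential jitter (mean 0.1);
    the step ends only when the slowest process finishes.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(1.0 + rng.expovariate(10.0) for _ in range(p))
    return total / trials

for p in (1, 16, 256, 1024):
    print(f"p={p:5d}: step ~ {mean_step_time(p):.2f}")
# the mean work per process is identical, yet the step stretches with p
```

For exponential jitter the expected maximum grows like the harmonic number H_p, i.e., logarithmically in p; at a billion threads even logarithmic stretch is a large constant, which is the quantitative question the slide is asking.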
Architecture-Specific Algorithms
§ GPUs/accelerators
§ Hybrid Memory Cube / near-memory computing
§ NVRAM, e.g., flash memory
Portable Performance
§ Can we redefine compilation so that:
– It supports well a human in the loop (manual high-level decisions vs. automated low-level transformations)
– It integrates auto-tuning and profile-guided compilation
– It preserves high-level code semantics
– It preserves high-level code "performance semantics"
Principle: high-level code → "compilation" → low-level, platform-specific codes
Practice: Code A, Code B, Code C, maintained by manual conversion and "ifdef" spaghetti
Conclusion
§  Moore’s	
  Law	
  is	
  slowing	
  down;	
  the	
  slow-­‐down	
  has	
  many	
  
fundamental	
  consequences	
  –	
  only	
  a	
  few	
  of	
  them	
  explored	
  in	
  this	
  
talk	
  
§  HPC	
  is	
  the	
  “canary	
  in	
  the	
  mine”:	
  
–  issues	
  appear	
  earlier	
  because	
  of	
  size	
  and	
  .ght	
  coupling	
  
§  Op.mis.c	
  view	
  of	
  the	
  next	
  decades:	
  A	
  frenzy	
  of	
  innova.on	
  to	
  
con.nue	
  pushing	
  current	
  ecosystem,	
  followed	
  by	
  frenzy	
  of	
  
innova.on	
  to	
  use	
  totally	
  different	
  compute	
  technologies	
  
§  Pessimis.c	
  view:	
  	
  The	
  end	
  is	
  coming	
  
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Keynote snir spaa

  • 1. Supercomputing: Technical Evolution & Programming Models
    Marc Snir, Argonne National Laboratory & University of Illinois at Urbana-Champaign
  • 2. Introduction (July 13, MCS -- Marc Snir)
  • 3. Theory of Punctuated Equilibrium (Eldredge, Gould, Mayr…)
    § Evolution consists of long periods of equilibrium, with little change, interspersed with short periods of rapid change.
      – Mutations are diluted in large populations in equilibrium; the homogenizing effect prevents the accumulation of multiple changes
      – Small, isolated populations under heavy natural-selection pressure evolve rapidly, and new species can appear
      – Major cataclysms can be a cause of rapid change
    § Punctuated equilibrium is a good model for technology evolution:
      – Revolutions are hard in large markets with network effects and technology that evolves gradually
      – Changes can be much faster when small, isolated product markets are created, or when current technology hits a wall (a cataclysm)
    § (Not a new idea: e.g., Levinthal 1998, The Slow Pace of Rapid Technological Change: Gradualism and Punctuation in Technological Change)
  • 4. Why it Matters to SPAA (and PODC)
    § Periods of paradigm shift generate a rich set of new problems (new low-hanging fruit?)
      – It is a time when good theory can help
    § E.g., Internet, wireless, big data
      – Punctuated evolution due to the appearance of new markets
    § Hypothesis: HPC now, and ultimately much of IT, is entering a period of fast evolution: please prepare
  • 5. Where the Analogy with Biological Evolution Breaks Down
    § Technology evolution can be accelerated by genetic engineering
      – Technology developed in one market is exploited in another market
      – E.g., the Internet and wireless were enabled by cheap microprocessors, telephony technology, etc.
    § "Genetic engineering" has been essential for HPC in the last 25 years:
      – Progress enabled by reuse of technologies from other markets (micros, GPUs…)
  • 6. Past & Present
  • 7. Evidence of Punctuated Equilibrium in HPC
    [Chart: core count of the leading Top500 system over time, on a log scale from 1 to 10,000,000; jumps mark the attack of the killer micros, multicore, and accelerators, with SPAA marked on the timeline.]
  • 8. 1990: The Attack of the Killer Micros (Eugene Brooks, 1990)
    § Shift from ECL vector machines to clusters of MOS micros
      – Cataclysm: bipolar evolution reached its limits (nitrogen cooling, gallium arsenide…); MOS was on a fast evolution path
      – MOS had its niche markets: controllers, workstations, PCs
      – A classical example of "good enough, cheaper technology" (Christensen, The Innovator's Dilemma)
  • 9. 2002: Multicore
    § Clock speed stopped increasing; very little return on added CPU complexity; chip density continued to increase
      – Technology push, not market pull
      – Still has limited success
  • 10. 2010: Accelerators
    § A new market (graphics) created an ecological niche
    § Technology transplanted into other markets (signal processing/vision, scientific computing)
      – Advantage of a better power/performance ratio (less logic)
    § Technology still changing rapidly: integration with the CPU and an evolving ISA
  • 11. Were the (R)evolutions Successful in HPC?
    § Killer micros: Yes
      – Totally replaced vector machines
      – All HPC codes enabled for message passing (MPI)
      – Took > 10 years and > $1B govt. investment (DARPA)
    § Multicore: Incomplete
      – Many codes still use one MPI process per core, using shared memory for message passing
      – Use of two programming models (MPI+OpenMP) is burdensome
      – PGAS is not used, and does not (so far) provide a real advantage over MPI
      – Many open issues on scaling multithreading models (OpenMP, TBB, Cilk…) and combining them with message passing
      – (See the history of large-scale NUMA, which did not become a viable species)
  • 12. Were the (R)evolutions Successful? (2)
    § Accelerators: Just beginning
      – Few HPC codes converted to use GPUs
    § Obstacles:
      – Technology still changing fast (integration of the GPU with the CPU, continued changes in the ISA)
      – No good non-proprietary programming systems are available, and their long-term viability is uncertain
  • 13. Key Obstacles
    § Scientific codes live much longer than computer systems (two decades or more); they need to be ported across successive HW generations
    § The amount of code to be ported continuously increases (major scientific codes each have > 1 MLOC)
    § Need very efficient, well-tuned codes (HPC platforms are expensive)
    § Need portability across platforms (HPC programmers are expensive)
    § Squaring the circle?
    § The lack of performant, portable programming models has become the major impediment to the evolution of HPC hardware
  • 14. Did Theory Help?
    § Killer micros: Helped by work on scalable algorithms and on interconnects
    § Multicore: Helped by work on communication complexity (efficient use of caches)
      – Very little use of work on coordination algorithms or transactional memory
    § Accelerators: Cannot think of relevant work
      – Interesting question: the power of branching and the power of indirection
      – Surprising result: the AKS sorting network
    § Too often, theory follows practice rather than preceding it.
  • 15. Future
  • 16. The End of Moore's Law is Coming
    § Moore's Law: the number of transistors per chip doubles every two years
    § Stein's Law: if something cannot go on forever, it will stop
    § The question is not whether but when Moore's Law will stop
      – It is difficult to make predictions, especially about the future (Yogi Berra)
  • 17. Current Obstacle: Current Leakage
    § Transistors do not shut off completely:
      "While power consumption is an urgent challenge, its leakage or static component will become a major industry crisis in the long term, threatening the survival of CMOS technology itself, just as bipolar technology was threatened and eventually disposed of decades ago." – International Technology Roadmap for Semiconductors (ITRS), 2011
    § The ITRS "long term" is the 2017–2024 timeframe.
    § No "good enough" technology is waiting in the wings
  • 18. Longer-Term Obstacle
    § Quantum effects totally change the behavior of transistors as they shrink
      – A 7–5 nm feature size is predicted to be the lower limit for CMOS devices
      – The ITRS predicts 7.5 nm will be reached in 2024
  • 19. The 7nm Wall (24 July 2013, ANL-LBNL-ORNL-PNNL; courtesy S. Dosanjh)
  • 20. The Future Is Not What It Was (courtesy S. Dosanjh)
  • 21. Progress Does Not Stop
    § It becomes more expensive and slows down
      – New materials (e.g., III-V, germanium thin channels, nanowires, nanotubes, or graphene)
      – New structures (e.g., 3D transistor structures)
      – Aggressive cooling
      – New packages
    § More invention at the architecture level
    § Seeking value from features other than speed ("More than Moore")
      – System on a chip: integration of analog and digital
      – MEMS…
    § Beyond Moore? (Quantum, biological…) Beyond my horizon
  • 22. Exascale
  • 23. Supercomputer Evolution
    § ×1,000 performance increase every 11 years
      – ×50 faster than Moore's Law
    § Extrapolation predicts an exaflop/s (10^18 floating-point operations per second) before 2020
      – We are now at 50 Petaflop/s
    § Extrapolation may not work if Moore's Law slows down
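The slide's extrapolation can be checked with round numbers (my back-of-the-envelope sketch, not from the talk): starting from the stated 50 Pflop/s in 2013 and the ×1,000-per-11-years trend, an exaflop/s indeed lands before 2020.

```python
import math

# The slide's trend: top-system performance grows x1000 every 11 years.
annual_factor = 1000 ** (1 / 11)   # ~1.87x per year

current_pflops = 50                # "we are now at 50 Petaflop/s" (2013)
target_pflops = 1000               # 1 exaflop/s = 1000 petaflop/s

# Years until the trend line reaches an exaflop/s.
years = math.log(target_pflops / current_pflops) / math.log(annual_factor)
print(f"{years:.1f} years from 2013 -> around {2013 + years:.0f}")
```

On the trend line this gives roughly 4.8 years, i.e. around 2018, consistent with the slide's "before 2020", which is exactly why the caveat about Moore's Law slowing down matters.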
  • 24. Do We Care?
    § "It's all about Big Data now; simulations are passé."
    § B***t
    § "All science is either physics or stamp collecting." (Ernest Rutherford)
      – In the physical sciences, experiments and observations exist to validate/refute/motivate theory. "Data mining" not driven by a scientific hypothesis is "stamp collecting".
    § Simulation is needed to go from a mathematical model to predictions on observations.
      – If the system is complex (e.g., climate), then simulation is expensive
      – Predictions are often statistical, complicating both simulation and data analysis
  • 25. Computation Meets Data: Cosmology – The Argonne View
    [Diagram: mapping the sky with survey instruments (LSST weak lensing); statistical error bars on observations will 'disappear' soon. A supercomputer simulation campaign with HACC (Hardware/Hybrid Accelerated Cosmology Code) and the CCF (Cosmic Calibration Framework) – domain science + CS + math + stats + machine learning – feeds an emulator based on Gaussian-process interpolation in high-dimensional spaces (a 'precision oracle'), calibrated by Markov chain Monte Carlo ('cosmic calibration').]
    Record-breaking application: 3.6 trillion particles, 14 Pflop/s. (courtesy Salman Habib)
  • 26. Exascale Design Point: 202x with a cap of $200M and 20 MW

    | Systems                    | 2012 BG/Q            | 2020–2024            | Difference today & 2019 |
    | System peak                | 20 Pflop/s           | 1 Eflop/s            | O(100)                  |
    | Power                      | 8.6 MW               | ~20 MW               |                         |
    | System memory              | 1.6 PB (16*96*1024)  | 32–64 PB             | O(10)                   |
    | Node performance           | 205 GF/s (16*1.6GHz*8) | 1.2 or 15 TF/s     | O(10)–O(100)            |
    | Node memory BW             | 42.6 GB/s            | 2–4 TB/s             | O(1000)                 |
    | Node concurrency           | 64 threads           | O(1k) or 10k         | O(100)–O(1000)          |
    | Total node interconnect BW | 20 GB/s              | 200–400 GB/s         | O(10)                   |
    | System size (nodes)        | 98,304 (96*1024)     | O(100,000) or O(1M)  | O(100)–O(1000)          |
    | Total concurrency          | 5.97 M               | O(billion)           | O(1,000)                |
    | MTTI                       | 4 days               | O(<1 day)            | -O(10)                  |

    Both price and power envelopes may be too aggressive!
  • 27. Identified Issues
    § Scale (a billion threads)
    § Power (tens of MWatts)
      – Communication: > 99% of power is consumed by moving operands across the memory hierarchy and across nodes
      – Reduced memory size (communication in time)
    § Resilience: something fails every hour; the machine is never "whole"
      – Trade-off between power and resilience
    § Asynchrony: equal work ≠ equal time
      – Power management
      – Error recovery
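The power and communication points above can be made concrete with simple arithmetic. The 20 MW / 1 Eflop/s cap from the previous slide fixes the total energy budget per flop; the per-operation costs below are my assumed, round-number circa-2013 figures (not from the talk), but they show why moving operands, not computing on them, breaks the budget.

```python
# Total energy budget implied by the exascale design point.
power_w = 20e6                  # 20 MW system cap (from slide 26)
flops = 1e18                    # 1 exaflop/s
budget_pj = power_w / flops * 1e12
print(budget_pj)                # 20.0 pJ: EVERYTHING done per flop must fit here

# Assumed, illustrative energy costs (hypothetical round numbers):
fpu_pj = 10                     # one double-precision floating-point op
dram_pj_per_byte = 20           # moving one byte to/from local DRAM

# A flop that fetches a single 8-byte operand from DRAM:
cost_pj = fpu_pj + 8 * dram_pj_per_byte
print(cost_pj)                  # 170 pJ: ~8x over the 20 pJ budget
```

Under these assumptions arithmetic itself fits comfortably in the budget while one DRAM operand fetch alone exceeds it several times over, which is the sense in which communication, across the memory hierarchy and across nodes, dominates the power problem.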
  • 28. Other Issues
    § Uncertainty about the underlying HW architecture
      – Fast evolution of architecture (accelerators, 3D memory and processing near memory, NVRAM)
      – Uncertainty about the market that will supply components to HPC
      – Possible divergence from commodity markets
    § Increased complexity of software
      – Simulations of complex systems + uncertainty quantification + optimization…
      – Software management of power and failures
      – Scale and tight coupling (the tail of the distribution matters!)
  • 29. Research Areas
  • 30. Scale
    § HPC algorithms are being designed for a 2-level hierarchy (node, global); can they be designed for a multi-level hierarchy? Can they be "hierarchy-oblivious"?
    § Can we have a programming model that abstracts the specific HW mechanisms at each level (message passing, shared memory) yet can leverage these mechanisms efficiently?
      – A global shared object space + caching + explicit communication
      – Multilevel programming (compilation with a human in the loop)
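One existing model for the "hierarchy-oblivious" idea is the cache-oblivious style of algorithm: recursive divide-and-conquer adapts to every level of an unknown hierarchy without hard-coding block sizes for any level. A minimal sketch (my illustration, not an algorithm from the talk), using out-of-place matrix transpose on nested lists:

```python
def transpose(src, dst, r0, c0, rows, cols):
    """Cache-oblivious transpose: dst[j][i] = src[i][j] over the given tile."""
    if rows <= 2 and cols <= 2:
        # Tiny base case: by this point the recursion has cut the problem
        # down to fit the smallest level of whatever hierarchy exists.
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                dst[j][i] = src[i][j]
    elif rows >= cols:
        # Split the longer dimension; each recursion depth implicitly
        # matches some level of the (unknown) memory/machine hierarchy.
        h = rows // 2
        transpose(src, dst, r0, c0, h, cols)
        transpose(src, dst, r0 + h, c0, rows - h, cols)
    else:
        h = cols // 2
        transpose(src, dst, r0, c0, rows, h)
        transpose(src, dst, r0, c0 + h, rows, cols - h)

n, m = 5, 7
a = [[i * m + j for j in range(m)] for i in range(n)]
b = [[0] * n for _ in range(m)]
transpose(a, b, 0, 0, n, m)
```

The open question on the slide is whether this recursive structure can be extended from the cache hierarchy to the full node/rack/machine hierarchy, where the mechanisms (shared memory vs. message passing) differ across levels, not just the capacities.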
  • 31. Communication
    § Communication-efficient algorithms
    § A better understanding of fundamental communication–computation tradeoffs for PDE solvers (getting away from DAG-based lower bounds; tradeoffs between communication and convergence rate)
    § Programming models, libraries, and languages where communication is a first-class citizen (other than MPI)
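For one concrete instance of the communication-efficiency theme, the classic counting argument for matrix multiply (my sketch, not a result from the talk): with a fast memory of M words, blocking an n×n multiply into tiles cuts slow-memory traffic from ~n^3 words to ~n^3/sqrt(M), which is also the known lower bound up to constants.

```python
import math

def naive_traffic(n):
    # Crude model: one slow-memory word streamed per multiply-add.
    return n ** 3

def blocked_traffic(n, M):
    # Choose b so that three b x b tiles (of A, B, C) fit in fast memory.
    b = int(math.isqrt(M // 3))
    blocks = math.ceil(n / b)
    # Each of the blocks^3 tile-multiplies moves ~3 tiles of b*b words.
    return blocks ** 3 * 3 * b * b

n, M = 1024, 3 * 64 * 64          # fast memory holds three 64x64 tiles
print(naive_traffic(n) // blocked_traffic(n, M))   # ~b/3, i.e. 21x less traffic
```

The DAG-based bounds the slide wants to move past are of exactly this kind; the open problem is that for iterative PDE solvers one can also trade extra communication for a better convergence rate, which no per-iteration counting argument captures.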
  • 32. Resilient Distributed Systems
    § E.g., a parallel file system with 768 I/O nodes and > 50K disks
      – Systems are built to tolerate disk and node failures
      – However, most failures in the field are due to "performance bugs": e.g., time-outs due to thrashing
    § How do we build feedback mechanisms that ensure stability? (control theory for large-scale, discrete systems)
    § How do we provide quality of service?
    § What is a quantitative theory of resilience? (e.g., the impact of the failure rate on overall performance)
      – Focus on systems where failures are not exceptional
  • 33. Resilient Parallel Algorithms – Overcoming Silent Data Corruptions
    § SDCs may be unavoidable in future large systems (due to flips in computation logic)
    § Intuition: an SDC can either
      – Type 1: grossly violate the computation model (e.g., a jump to a wrong address, a message sent to the wrong node), or
      – Type 2: introduce noise in the data (a bit flip in a large array)
    § Many iterative algorithms can tolerate infrequent type 2 errors
    § Type 1 errors are often catastrophic and easy to detect in software
    § Can we build systems that avoid or correct easy-to-detect (type 1) errors and tolerate hard-to-detect (type 2) errors?
    § What is the general theory of fault-tolerant numerical algorithms?
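A toy illustration of the type 2 intuition (my sketch, not an example from the talk): a Jacobi solve of a diagonally dominant system converges to the same answer even when one entry of the iterate is silently corrupted mid-run, because the contraction of the iteration damps the injected error away.

```python
def jacobi(A, b, iters, corrupt_at=None):
    """Jacobi iteration for Ax = b, optionally injecting a 'silent' error."""
    n = len(b)
    x = [0.0] * n
    for k in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        if k == corrupt_at:
            x[0] += 100.0        # injected type 2 error: noise in the data
    return x

# Diagonally dominant system with exact solution (1, 2, 3).
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]
clean = jacobi(A, b, 200)
hit = jacobi(A, b, 200, corrupt_at=50)
print(max(abs(c - h) for c, h in zip(clean, hit)))   # ~0: the corruption decays
```

A type 1 error (say, overwriting the matrix itself or skipping the update of one unknown forever) would not self-heal this way, which is the asymmetry behind the slide's proposed division of labor: detect and correct type 1 in software, let the numerics absorb type 2.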
  • 34. Asynchrony
    § What is a measure of asynchrony tolerance?
      – Moving away from the qualitative (e.g., wait-free) to the quantitative:
      – How much do intermittently slow processes slow down the entire computation, on average?
    § What are the trade-offs between synchronicity and computation work?
    § Load balancing driven not by uncertainty about the computation, but by uncertainty about the computer
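The quantitative question above has a sharp bulk-synchronous instance, sketched here with a small Monte Carlo model of my own (the delay probability and magnitude are assumed round numbers): each step waits for the slowest of p processes, so even rare per-process delays inflate the average step time more and more as p grows.

```python
import random

def mean_step_time(p, steps, delay_prob=0.01, delay=10.0, seed=1):
    """Average step time when every step waits for the slowest of p processes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        # Each process takes unit time plus a rare large delay (OS jitter,
        # error recovery, power management...); the step ends at the max.
        total += max(1.0 + (delay if rng.random() < delay_prob else 0.0)
                     for _ in range(p))
    return total / steps

for p in (1, 100, 10000):
    print(p, round(mean_step_time(p, 200), 2))
# At p = 10000 essentially every step hits some delay: ~11x slowdown vs p = 1.
```

This is why "equal work ≠ equal time" at scale: the mean per-process behavior stops mattering and the tail of the distribution sets the pace, motivating both the quantitative asynchrony-tolerance measures and the synchronicity-vs-work tradeoffs on the slide.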
  • 35. Architecture-Specific Algorithms
    § GPUs/accelerators
    § Hybrid Memory Cube / near-memory computing
    § NVRAM – e.g., flash memory
  • 36. Portable Performance
    § Can we redefine compilation so that:
      – It supports well a human in the loop (manual high-level decisions vs. automated low-level transformations)
      – It integrates auto-tuning and profile-guided compilation
      – It preserves high-level code semantics
      – It preserves high-level code "performance semantics"
    § Principle: high-level code → "compilation" → low-level, platform-specific codes
    § Practice: Code A, Code B, Code C, kept in sync by manual conversion and "ifdef" spaghetti
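The auto-tuning ingredient named above can be sketched in a few lines (a minimal illustration of the idea, not a system from the talk): benchmark a tunable parameter, here the tile size of a blocked reduction, and record the fastest variant for this platform, while checking that every variant preserves the high-level semantics.

```python
import time

def blocked_sum(data, tile):
    # The "low-level, platform-specific" knob: how much work per block.
    s = 0
    for start in range(0, len(data), tile):
        s += sum(data[start:start + tile])
    return s

def timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

def autotune(data, candidates, repeats=3):
    # The decision an auto-tuning "compilation" step would record for
    # this platform: try each variant, keep the fastest observed.
    best_tile, best_t = None, float("inf")
    for tile in candidates:
        t = min(timed(lambda: blocked_sum(data, tile)) for _ in range(repeats))
        if t < best_t:
            best_tile, best_t = tile, t
    return best_tile

data = list(range(100_000))
tile = autotune(data, [16, 256, 4096])
# High-level semantics are preserved by every low-level variant:
assert blocked_sum(data, tile) == sum(data)
```

What the slide asks for beyond this is preserving "performance semantics": the human fixes the high-level decisions once, and the tuned low-level choices are regenerated per platform instead of being frozen into hand-maintained code variants.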
  • 37. Conclusion
    § Moore's Law is slowing down; the slow-down has many fundamental consequences, only a few of them explored in this talk
    § HPC is the "canary in the mine":
      – issues appear earlier because of size and tight coupling
    § Optimistic view of the next decades: a frenzy of innovation to continue pushing the current ecosystem, followed by a frenzy of innovation to use totally different compute technologies
    § Pessimistic view: the end is coming