O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.
MRemu:	
  An	
  Emula-on-­‐based	
  
Framework	
  for	
  Datacenter	
  Network	
  
Experimenta-on	
  using	
  Realis-c	
  ...
Context	
  
•  Big	
  Data	
  &	
  MapReduce	
  analy-cs	
  frameworks	
  
– Scale	
  out	
  to	
  hundred	
  or	
  even	
...
Problem	
  
•  The	
  need	
  for	
  a	
  real	
  hardware	
  infrastructure	
  
– is	
  oZen	
  not	
  a	
  valid	
  op-o...
Problem	
  
•  Most	
  research	
  on	
  data	
  center	
  networks	
  do	
  not	
  use	
  
realis-c	
  Big	
  Data	
  tra...
Network	
  Traffic	
  in	
  real	
  Hadoop	
  Applica-ons	
  
•  a	
  
5	
  
Network	
  transfers	
  
Network	
  transfers	
...
Proposed	
  solu-on:	
  MRemu	
  
•  Emula-on-­‐based	
  framework	
  for	
  data	
  center	
  
network	
  experimenta-on	...
MRemu	
  Architecture	
  
7	
  
Job Trace
Hadoop
Job Tracing
Synthectic
Job Generator
Topology
Description
Mininet-HiFI
To...
Evalua-on	
  
•  Mininet-­‐HiFi	
  has	
  already	
  been	
  validated	
  and	
  is	
  widely	
  
used	
  to	
  reproduce	...
Accuracy	
  Evalua-on	
  
9	
  
Job	
  Comple-on	
  Time	
  Accuracy	
   Individual	
  Flow	
  Comple-on	
  Time	
  Accura...
Nutch	
  applica-on	
  with	
  background	
  traffic	
  
Sort	
  applica-on	
  with	
  background	
  traffic	
  
Par--on	
  sk...
Conclusion	
  and	
  Future	
  Work	
  
•  MRemu,	
  an	
  emula-on-­‐based	
  framework	
  that	
  enables	
  
conduc-ng	...
References	
  
•  NEVES,	
  M.	
  V.,:	
  Applica-on-­‐aware	
  networking	
  to	
  Accelerate	
  
MapReduce	
  Applica-on...
MRemu:	
  An	
  Emula-on-­‐based	
  
Framework	
  for	
  Datacenter	
  Network	
  
Experimenta-on	
  using	
  Realis-c	
  ...
Próximos SlideShares
Carregando em…5
×

MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

405 visualizações

Publicada em

As data volumes and the need for timely analysis grow, Big Data analytics frameworks have to scale out to hundred or even thousands of commodity servers. While such a scale-out is crucial to sustain desired computational throughput/latency and storage capacity, it comes at the cost of increased network traffic volumes and multiplicity of traffic patterns. Despite the sheer reality of the dependency between datacenter network (DCN) and time-to-insight through big data analysis, our experience as active networking researchers conveys that a large fraction of DCN research experimentation is conducted on network traces and/or synthetic flow traces. And while the respective results are often valuable as standalone contributions, in practice it turns out extremely difficult to quantitatively assess how the reported network optimization results translate to performance or fault-tolerance improvement for actual analytics runtimes, e.g., due to the ability of these runtimes to overlap communication with computation. This paper presents MRemu, an emulation-based framework for conducting reproducible datacenter network research using accurate MapReduce workloads and at system scales that are relevant to the size of target deployments, albeit without requiring access to a hardware infrastructure of such scale. We choose the MapReduce (MR) framework as a design point, for it is a common representative of the most widely deployed frameworks for analysis of large volumes of - structured and unstructured - data and is reported to be highly sensitive to network performance. With MRemu, it is possible to quantify the impact of various network design parameters and software-defined control techniques to key performance indicators of a given MR application. We show through targeted experimental validation that MRemu exhibits high fidelity, when compared to the performance of MR applications on a real scale-out cluster of 16 high-end servers. Also, as a proof of impact of our experimental framework, we showcase how we used MRemu to quantify the impact of network capacity among MR nodes to a selection of network-bound MR applications.

Publicada em: Tecnologia
  • Seja o primeiro a comentar

  • Seja a primeira pessoa a gostar disto

MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

  1. 1. MRemu:  An  Emula-on-­‐based   Framework  for  Datacenter  Network   Experimenta-on  using  Realis-c   MapReduce  Traffic     Marcelo  Veiga  Neves1,  Cesar  A.  F.  De  Rose1,     Kostas  Katrinis2   marcelo.neves@pucrs.br       1  PUCRS,  Porto  Alegre,  Brazil   2  IBM  Research,  Dublin,  Ireland   Oct,  2015  
  2. 2. Context   •  Big  Data  &  MapReduce  analy-cs  frameworks   – Scale  out  to  hundred  or  even  thousands  of   commodity  servers   – Increased  network  traffic  volumes  and  mul-plicity   of  traffic  paWerns   •  Data  center  networks  for  Big  Data   – Scale-­‐out  topologies  (e.g.,  fat-­‐tree,  leaf-­‐spine)   – Network  control  soZware  (e.g,  SDN  –  IPDPS’15)    
  3. 3. Problem   •  The  need  for  a  real  hardware  infrastructure   – is  oZen  not  a  valid  op-on     – even  when  datacenter  resources  are  available     •  access  is  not  con-nuous   •  not  prac-cal  to  reconfigure  them  in  order  to  evaluate   different  network  topologies  and  characteris-cs  (e.g.,   bandwidth  and  latency)   •  Alterna-ves:   – Simula-on  &  Emula-on  
  4. 4. Problem   •  Most  research  on  data  center  networks  do  not  use   realis-c  Big  Data  traffic   –  synthe-c  traffic  paWerns   •  Simplified  shuffle-­‐like  traffic  paWerns   –  e.g.,  all-­‐to-­‐all   –  not  consider  transfer  scheduling  decisions,  number  of   parallel  transfers,  etc.   –  overlap  communica-on  with  computa-on   •  How  the  reported  results  translate  to  performance   improvement  for  actual  analy-cs  run-mes?   4  
  5. 5. Network  Traffic  in  real  Hadoop  Applica-ons   •  a   5   Network  transfers   Network  transfers  
  6. 6. Proposed  solu-on:  MRemu   •  Emula-on-­‐based  framework  for  data  center   network  experimenta-on   •  Highlights:   –  Ability  to  run  a  complete  data  center  in  a  single  server   –  Use  of  realis-c  network  traffic   •  replay  or  extrapolate  from  execu-ons  of  real  applica-ons  in   produc-on  datacenters   –  Mimics  framework  internals  (e.g,  transfer  scheduling,   phases  overlaps,  etc.)   –  Unmodified  code  also  run  in  real  hardware  
  7. 7. MRemu  Architecture   7   Job Trace Hadoop Job Tracing Synthectic Job Generator Topology Description Mininet-HiFI Topology Builder Application Launcher Network Monitor Data center emulator TaskTracker Job Trace Parser Traffic Generator Hadoop MapReduce emulator Logger JobTracker *Mininet  can  be  replaced  with  real  hardware  (e.g.,  run  in  legacy  clusters)   *  
  8. 8. Evalua-on   •  Mininet-­‐HiFi  has  already  been  validated  and  is  widely   used  to  reproduce  networking  research  experiments   •  Accuracy  when  reproducing  MapReduce  workloads.     –  Comparison  with  traces  extracted  from  real  job  execu-ons   –  Two  opera-ons  modes:  replay  mode  and  hadoop  mode   •  Execu-on  environment:   –  Shamrock  datacenter,  IBM  Research   –  HiBench  Benchmark  Suite:    Sort,  Nutch,  PageRank  and   Bayes   8   Handigol,  N.;  Heller,  B.;  Jeyakumar,  V.;  Lantz,  B.;  McKeown,  N.  “Reproducible  Network  Experiments
  9. 9. Accuracy  Evalua-on   9   Job  Comple-on  Time  Accuracy   Individual  Flow  Comple-on  Time  Accuracy  
  10. 10. Nutch  applica-on  with  background  traffic   Sort  applica-on  with  background  traffic   Par--on  skew  problem   Impact  of  the  network  topology   Other  experiments  
  11. 11. Conclusion  and  Future  Work   •  MRemu,  an  emula-on-­‐based  framework  that  enables   conduc-ng  datacenter  network  research   –  without  requiring  expensive  and  con-nuous  access  to   large-­‐scale  datacenter  hardware  resources     •  Available  as  open  source:   –  hWps://github.com/mvneves/mremu   •  Future  work:   –  Extend  it  to  other  frameworks  and  traffic  paWerns   –  Integrate  it  with  Mininet-­‐HiFI  cluster  edi-on   –  Support  to  migra-on  of  “virtual  machines”  
  12. 12. References   •  NEVES,  M.  V.,:  Applica-on-­‐aware  networking  to  Accelerate   MapReduce  Applica-ons  (Ph.D.  Disserta-on),  2015   •  NEVES,  M.  V.;  KATRINIS,  M.  K.;  FRANKE,  H.;  DE  ROSE,  C.  A.  F.;   Pythia:  Faster  Big  Data  in  Mo-on  through  Predic-ve  SoZware-­‐ Defined  Network  Op-miza-on  at  Run-me.  In:  IPDPS  2014,   Phoenix,  USA,  2014     •  NEVES,  M.  V.,  DE  ROSE,  C.  A.  F.,  KATRINIS,  K.  MRemu:  An   Emula-on-­‐based  Framework  for  Datacenter  Network   Experimenta-on  using  Realis-c  MapReduce  Traffic,  MASCOTS   2015,  Atlanta,  USA,  2015.  
  13. 13. MRemu:  An  Emula-on-­‐based   Framework  for  Datacenter  Network   Experimenta-on  using  Realis-c   MapReduce  Traffic     Marcelo  Veiga  Neves1,  Cesar  A.  F.  De  Rose1,     Kostas  Katrinis2   marcelo.neves@pucrs.br       1  PUCRS,  Porto  Alegre,  Brazil   2  IBM  Research,  Dublin,  Ireland   Oct,  2015  

×