
Introduction to the National Supercomputer Center in Tianjin TH-1A Supercomputer


  1. Introduction to the National Supercomputer Center in Tianjin TH-1A Supercomputer
  2. Agenda
     - National Supercomputer Center in Tianjin (NSCC-TJ)
     - TH-1A system
     - Hardware sub-system
     - Software sub-system
     - Applications
  3. NSCC-TJ
     - National Supercomputer Center in Tianjin
     - Sponsored by
       - Chinese Ministry of Science and Technology
       - Tianjin Binhai New Area
     - Public information infrastructure
       - To accelerate the economy, education and industry of Northern China
       - To provide high-performance computing services to all of China
       - Open platform for research and education
  4. NSCC-TJ
     - Photos: main building, office, computer room, transformer station & air conditioner
     - Total area: 2400 m2
  5. NSCC-TJ
     - The first floor of the central computing room: 1200 m2
  6. NSCC-TJ
     - The second floor of the central computing room: visualization environment, 1200 m2
  7. NSCC-TJ
     - Electric transformer station
  8. NSCC-TJ
     - Cooling water station
  9. NSCC-TJ
     - Layout of the computing room
  10. TH-1A system
  11. TH-1A system
     - Enhanced system based on the TH-1 system (Sep. 2009)
     - Installed in NSCC-TJ, Aug. 2010
     - Debugging and performance testing, Sept.-Oct. 2010
     - In service since Nov. 2010
     - Configuration (totals checked in the note below):
       Processors:    14,336 Intel CPUs + 7,168 NVIDIA GPUs + 2,048 FT CPUs
       Memory:        262 TB in total
       Interconnect:  proprietary high-speed interconnect network
       Storage:       2 PB
       Cabinets:      120 compute/service, 14 storage, 6 communication
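     The processor and memory totals above are consistent with the per-node figures on slides 14 and 15, assuming every compute and service node carries 32 GB (a cross-check, not a statement from the slides):

       $14336 = 7168 \times 2$ Intel CPUs, $2048 = 1024 \times 2$ FT CPUs, $7168$ GPUs (one per compute node)
       $(7168 + 1024) \times 32\,\mathrm{GB} = 262144\,\mathrm{GB} \approx 262\,\mathrm{TB}$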
  12. TH-1A system
     - TH-1A system architecture
       - Hybrid MPP structure: CPU & GPU
       - Proprietary compute nodes
       - Connected by a proprietary high-speed interconnect network
       - Global shared parallel storage system
       - Custom software stack
  13. TH-1A hardware sub-system
     - Block diagram: compute sub-system (CPU + GPU nodes), service sub-system (operation nodes), communication sub-system, storage sub-system (MDS and OSS nodes), and the monitor and diagnosis sub-system
  14. Compute sub-system
     - 7,168 compute nodes
       - 2 six-core CPUs and 1 GPU per node
     - CPU
       - Xeon X5670 (Westmere)
       - Processor speed: 2.93 GHz
     - GPU
       - NVIDIA Tesla M2050
       - Connected to the CPUs by PCI-E
     - 32 GB memory per node
     - 2U height
     - Peak performance: 4,701,061 Gflops (reproduced in the check below)
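     The peak figure can be reproduced from commonly quoted per-device peaks, which are assumptions here rather than values given on the slide: 70.32 Gflops per X5670 (6 cores x 2.93 GHz x 4 double-precision flops/cycle) and 515.2 Gflops per Tesla M2050:

       $14336 \times 70.32 + 7168 \times 515.2 = 1008107.5 + 3692953.6 \approx 4701061\,\mathrm{Gflops}$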
  15. Service sub-system
     - 1,024 service nodes
       - 2 eight-core domestic CPUs per node
     - CPU: FT-1000
       - SoC
       - 1.0 GHz
       - Eight cores, eight threads per core
       - Peak performance: 8 Gflops
     - 32 GB memory per node
     - For login, compilation, and applications that need throughput computing
  16. Proprietary interconnection network
     - Interconnection signal speed: 10 Gbps
     - Bi-directional bandwidth: 160 Gbps
     - Hierarchical fat-tree structure (see the sizing note below)
       - First stage: 16 nodes connected by a 16-port switching board
       - Second stage: all parts connected to eleven 384-port switches
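     A rough first-stage sizing, assuming all compute and service nodes attach at the first stage and that the 160 Gbps figure is the two-direction sum of an 8-lane link at 10 Gbps per lane (the lane count is an assumption, not stated on the slide):

       $(7168 + 1024) / 16 = 512$ first-stage switching boards
       $8 \times 10\,\mathrm{Gbps} \times 2\ \text{directions} = 160\,\mathrm{Gbps}$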
  17. Proprietary interconnection network
     - High-radix router ASIC: NRC
       - Feature size: 90 nm
       - Die size: 17.16 mm x 17.16 mm
       - Package: FC-PBGA, 2577 pins
       - Throughput of a single NRC: 2.56 Tbps (see the check below)
     - Network interface ASIC: NIC
       - Same feature size and package as the NRC
       - Die size: 10.76 mm x 10.76 mm
       - 675 pins
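     The single-NRC throughput matches a 16-port router at the per-port bi-directional bandwidth from the previous slide: $16 \times 160\,\mathrm{Gbps} = 2560\,\mathrm{Gbps} = 2.56\,\mathrm{Tbps}$.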
  18. Proprietary interconnection network
     - Photos: 16-port switch board in a cabinet; leaf and root switch blades of the 384-port switch; backplane of the 384-port switch (about 700 mm x 600 mm)
  19. Proprietary interconnecting network
     - Switching board and high-radix switch
       - Based on the network interface ASIC and router ASIC
       - Reduced user communication protocol
       - Throughput: 61.44 Tbps (see the check below)
     - Photos: front and back of two 384-port high-radix switches
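     Likewise, the switch throughput matches all 384 ports at the same per-port bandwidth, assuming the figure counts both directions of every port: $384 \times 160\,\mathrm{Gbps} = 61440\,\mathrm{Gbps} = 61.44\,\mathrm{Tbps}$.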
  20. Storage sub-system
     - Capacity: 2 PB
     - Connected by the proprietary interconnection network
     - Lustre-based parallel file system
  21. Monitor and diagnosis sub-system
     - Rich monitoring & control functions
       - Real-time monitoring of hardware parameters
       - Precise fault localization
       - Alarms and immediate action in emergencies
       - Self-adjusting cooling feedback based on environment status
     - I2C & JTAG diagnosis mechanisms
     - Large-scale console
     - Remote monitoring and management
  22. Computing cabinet
     - Node: 2 CPUs and 1 GPU
     - Blade: 2 nodes
     - Frame
       - 8 computing blades
       - 1 16-port switching board
       - 1 monitor and diagnosis board
     - Cabinet (see the cabinet-count note below)
       - 4 frames, 64 nodes (128 CPUs, 64 GPUs)
       - Close-coupled chilled water cooling
       - 56 kW cooling capacity per cabinet
     - System footprint: 700 m2
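     At 64 nodes per computing cabinet, the 7,168 compute nodes fill $7168 / 64 = 112$ cabinets; if the remaining 8 of the 120 compute/service cabinets hold the 1,024 service nodes, that works out to 128 FT-1000 nodes per service cabinet (the service-cabinet packing is an assumption, not stated on the slides).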
  23. TH-1A software sub-system
     - Software stack
  24. Operating system
     - Kylin Linux
     - Compute node kernel
     - Provides virtual running environments
       - Isolated running environments for different users
       - Custom software package installation
     - QoS support
     - Power-aware computing
  25. Compiler system
     - C, C++, Fortran, Java
     - OpenMP, MPI, hybrid OpenMP/MPI
     - CUDA, OpenCL
     - Heterogeneous programming framework
       - Accelerates large-scale, complex applications, especially applications still under development or whose full source code is not available
       - Uses the computing power of both CPUs and GPUs while hiding GPU programming from users
       - Inter-node homogeneous parallel programming (users)
       - Intra-node heterogeneous parallel computing (computer experts)
  26. Compiler system
     - Heterogeneous programming framework
       - Inter-node homogeneous parallel programming (JASMIN)
         - Patch-based object data structures
         - MPI communication, dynamic load-balancing support
         - Zero-copy optimization in the communication library
  27. Compiler system
     - Heterogeneous programming framework
       - Intra-node heterogeneous parallel computing
         - Compiler-optimized / hand-tuned threaded code
         - Optimizations include:
           - Adaptive partitioning: balance the workload between CPUs and GPU
           - Asynchronous data transfer / computing: overlap CPU operations with GPU operations
           - Software pipelining: overlap GPU computing with data transfer between host and GPU device memory
           - ...
  28. Compiler system
     - Heterogeneous programming framework
       - An example: 3-D short-range molecular simulations (a code sketch of this loop follows below)
         - For each time step:
           - Split the workload (force calculation) between CPU and GPU
           - For each patch allocated to the GPU: start asynchronous operations (transfer the patch data to the GPU, compute the patch, get results back from the GPU)
           - For each patch allocated to the CPU: launch threads on the CPU cores to compute the patch
           - The CPU waits for the GPU completion event
           - Adjust the split value according to CPU/GPU performance (patches per second + an empirical factor)
           - Other workloads (velocity, position) are computed on the CPU
       - Performance: one NVIDIA M2050 GPU is 3 times faster than one Intel X5670 CPU
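     The slide describes a feedback loop that re-splits force-calculation patches between the CPU cores and the GPU every time step. Below is a minimal, self-contained C++ sketch of that loop's structure, not the NSCC-TJ/JASMIN code: a worker thread and a dummy patch kernel stand in for CUDA asynchronous transfers and kernel launches, and all names and constants are illustrative assumptions.

      // Adaptive CPU/GPU patch split with performance feedback (sketch).
      #include <chrono>
      #include <cstdio>
      #include <thread>
      #include <vector>

      // Stand-in for computing the forces of one patch (CPU path or GPU path).
      static void compute_patch(std::vector<double>& forces) {
          for (double& f : forces) f = f * 0.999 + 0.001;  // dummy force update
      }

      int main() {
          const int num_patches = 256;
          const int num_steps = 10;
          std::vector<std::vector<double>> patches(num_patches, std::vector<double>(1024, 1.0));

          // Fraction of patches sent to the "GPU"; adjusted each step from measured rates.
          double gpu_fraction = 0.5;

          for (int step = 0; step < num_steps; ++step) {
              int gpu_patches = static_cast<int>(gpu_fraction * num_patches);

              // "GPU" side: in the real framework these would be asynchronous
              // host-to-device copies and kernel launches; a thread stands in here.
              auto gpu_start = std::chrono::steady_clock::now();
              std::thread gpu_worker([&] {
                  for (int i = 0; i < gpu_patches; ++i) compute_patch(patches[i]);
              });

              // CPU side: the remaining patches run on the host cores.
              auto cpu_start = std::chrono::steady_clock::now();
              for (int i = gpu_patches; i < num_patches; ++i) compute_patch(patches[i]);
              double cpu_secs = std::chrono::duration<double>(
                  std::chrono::steady_clock::now() - cpu_start).count();

              gpu_worker.join();  // "CPU waits for the GPU completion event"
              double gpu_secs = std::chrono::duration<double>(
                  std::chrono::steady_clock::now() - gpu_start).count();

              // Feedback: re-split so both sides take roughly the same time, using
              // measured patches/second plus damping as the empirical factor.
              double gpu_rate = gpu_patches / (gpu_secs + 1e-9);
              double cpu_rate = (num_patches - gpu_patches) / (cpu_secs + 1e-9);
              double target = gpu_rate / (gpu_rate + cpu_rate);
              gpu_fraction = 0.5 * gpu_fraction + 0.5 * target;

              std::printf("step %d: gpu_fraction -> %.3f\n", step, gpu_fraction);
              // Velocity and position updates would run on the CPU here.
          }
          return 0;
      }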
  29. Programming environment
     - Virtual running environments
       - Provide services on demand
     - Parallel toolkits
       - Based on Eclipse
       - Integrate all kinds of tools: editor, debugger, profiler
     - Workflow support
       - Supports QoS negotiation
       - Reserves resources for future requirements
  30. Visualization system
     - Application areas
       - Numerical weather forecast
       - Computational fluid dynamics
       - Oil exploration
       - Other large-scale data
     - Computing platform
       - Tianhe-1A
     - Render server
       - 128 CPUs + 64 GPUs
     - Display device
       - 3x6 multi-channel display wall
  31. Applications
     - Oil exploration
     - High-end equipment development
     - Bio-medical research
     - Animation design
     - New energy research
     - New material research
     - Weather and climate forecasting
     - Engineering design, simulation and analysis
     - Remote sensing data processing
     - Financial risk analysis
  32. Thanks
