11540800.ppt

  1. Lustre Reference Architecture
  2. Lustre Distributed Parallel System (architecture diagram): a compute node cluster connects over IB to the IO node cluster (MDS and OSS nodes), which attaches over FC to an enterprise unified storage SAN. Key points: up to 96 GB/s bandwidth per cabinet (high performance); scalable to 16 controllers (high scalability); RAID 2.0+ virtualization technology (high reliability).
  3. Lustre parallel storage design example (reference configuration):
     Control server: 1 x RH1288 V3 (2 x E5-2603 v3, 2 x 16 GB memory, 2 x 300 GB SAS).
     Metadata servers: 2 x RH1288 V3 (each with 2 x E5-2620 v3, 4 x 16 GB memory, 2 x 300 GB SAS, 1 x single-port FDR IB card, 1 x dual-port 8G FC HBA).
     I/O servers: 2 x RH1288 V3 (each with 2 x E5-2620 v3, 4 x 16 GB memory, 2 x 300 GB SAS, 1 x single-port FDR IB card, 1 x dual-port 8G FC HBA).
     Disk layout: 1 x 5500 V3 with 6 controller-enclosure 10K 600 GB SAS disks (RAID 10) and 48 x 7.2K 2 TB NL-SAS disks (every 12 disks form a RAID 6 storage pool for data).
     Software: Intel enterprise edition Lustre.
     Write performance: 1.7 GB/s; read performance: 2.8 GB/s; raw capacity: 96 TB; available capacity: 60.8 TB; space: 15U; maximum power: 2.55 kW.
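The available-capacity figure above is determined by the RAID layout and the decimal-to-binary conversion, which the slide does not spell out. The following minimal sketch in Python (assuming 8+2 RAID 6 groups and a 0.931 conversion factor; the actual RAID 2.0+ pool layout may differ, so the quoted 60.8 TB is not reproduced exactly) shows how such a figure can be estimated from the disk count.

```python
# Rough sketch, not the vendor's sizing tool: estimate usable OST capacity
# from the raw disk count. Assumptions: 8+2 RAID 6 groups and a 0.931
# decimal-to-binary factor; RAID 2.0+ pools may divide space differently.

def usable_capacity_tb(num_disks, disk_tb, data_disks=8, parity_disks=2,
                       tib_factor=0.931):
    """Approximate usable capacity of data disks behind RAID 6 groups."""
    group_size = data_disks + parity_disks
    full_groups = num_disks // group_size      # complete RAID 6 groups only
    return full_groups * data_disks * disk_tb * tib_factor

if __name__ == "__main__":
    # Example from the slide above: 48 x 2 TB NL-SAS disks, 96 TB raw.
    print(f"raw:    {48 * 2} TB")
    print(f"usable: {usable_capacity_tb(48, 2):.1f} TB (quoted: 60.8 TB)")
```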
  4. Lustre parallel storage design example (reference configuration):
     Control server: 1 x RH1288 V3 (2 x E5-2603 v3, 2 x 16 GB memory, 2 x 300 GB SAS).
     Metadata servers: 2 x RH1288 V3 (each with 2 x E5-2620 v3, 4 x 16 GB memory, 2 x 300 GB SAS, 1 x single-port FDR IB card, 1 x dual-port 8G FC HBA).
     I/O servers: 2 x RH1288 V3 (each with 2 x E5-2620 v3, 4 x 16 GB memory, 2 x 300 GB SAS, 1 x single-port FDR IB card, 1 x dual-port 16G FC HBA).
     Disk layout: 1 x 5500 V3 (48 GB cache, 25 disk slots) with 12 controller-enclosure 10K 600 GB SAS disks for metadata storage (RAID 10) and 96 x 7.2K 2 TB NL-SAS disks (every 12 disks form a RAID 6 storage pool for data).
     Software: Intel enterprise edition Lustre.
     Write performance: 3.5 GB/s; read performance: 4.0 GB/s; raw capacity: 192 TB; available capacity: 120.18 TB; space: 23U; maximum power: 3.2 kW.
  5. Lustre parallel storage design example (reference configuration):
     Control server: 1 x RH1288 V3 (2 x E5-2603 v3, 2 x 16 GB memory, 2 x 300 GB SAS).
     Metadata servers: 2 x RH1288 V3 (each with 2 x E5-2620 v3, 4 x 16 GB memory, 2 x 300 GB SAS, 1 x single-port FDR IB card, 1 x dual-port 16G FC HBA).
     I/O servers: 2 x RH2288 V3 (each with 2 x E5-2620 v3, 4 x 16 GB memory, 2 x 300 GB SAS, 1 x single-port FDR IB card, 2 x dual-port 16G FC HBAs).
     Disk layout: 1 x 5800 V3 (128 GB cache); one 2.5-inch disk enclosure with 12 built-in 10K 600 GB SAS disks for metadata (RAID 10); 144 x 7.2K 2 TB NL-SAS disks (every 12 disks form a RAID 6 storage pool for data).
     Software: Intel enterprise edition Lustre.
     Write performance: 5.5 GB/s; read performance: 6 GB/s; raw capacity: 288 TB; available capacity: 180 TB; space: 34U; maximum power: 4.1 kW.
  6. Lustre parallel storage design example (reference configuration):
     Control server: 1 x RH1288 V3 (2 x E5-2603 v3, 2 x 16 GB memory, 2 x 300 GB SAS).
     Metadata servers: 2 x RH2288 V3 (each with 2 x E5-2620 v3, 4 x 16 GB memory, 2 x 300 GB SAS, 1 x single-port FDR IB card, 2 x dual-port 16G FC HBAs).
     I/O servers: 6 x RH2288 V3 (each with 2 x E5-2620 v3, 4 x 16 GB memory, 6 x 300 GB SAS, 1 x single-port FDR IB card, 2 x dual-port 16G FC HBAs).
     Disk layout: 1 x 5300 V3 (32 GB cache, 25 disk slots) with 25 built-in 10K 1.2 TB SAS disks plus two 2.5-inch disk enclosures with 50 x 10K 1.2 TB SAS disks for metadata (RAID 10); 4 x 5800 V3 (128 GB cache each) with 576 x 7.2K 6 TB NL-SAS disks (every 12 disks form a RAID 6 storage pool for data).
     Software: Intel enterprise edition Lustre.
     Write performance: 22 GB/s; read performance: 24 GB/s; raw capacity: 3456 TB; available capacity: 2160 TB; space: 131U; maximum power: 14 kW.
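Across the four design examples above (items 3 to 6), the quoted bandwidth grows roughly with the number of I/O (OSS) servers. The short sketch below derives the implied per-OSS figures from the numbers quoted on the slides; treating throughput as linear in the OSS count is an assumption used only for rough sizing, not a published specification.

```python
# Hedged sketch: per-OSS bandwidth implied by the quoted aggregate figures
# in items 3 to 6. The slide numbers are taken as given; linear scaling with
# the OSS count is an assumption for rough sizing only.

examples = [
    # (label, oss_nodes, write_gbps, read_gbps)
    ("5500 V3, 48 disks",     2,  1.7,  2.8),
    ("5500 V3, 96 disks",     2,  3.5,  4.0),
    ("5800 V3, 144 disks",    2,  5.5,  6.0),
    ("4x 5800 V3, 576 disks", 6, 22.0, 24.0),
]

for label, oss, w, r in examples:
    print(f"{label:24s} write {w/oss:4.1f} GB/s per OSS, "
          f"read {r/oss:4.1f} GB/s per OSS")
```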
  7. Approach using ZFS and JBOD (architecture diagram): four building blocks, each with 2 OSS nodes and 1 SP-3584 JBOD (OST); 2 MDS nodes with 1 OceanStor 5300 V3; 1 Lustre Manager. Sustained performance: 36 GB/s; usable capacity: 1.5 PB. Networks: IPMI monitoring network (GE), management network (GE), computing network (FDR IB), storage network (12G SAS), heartbeat (GE). Switches: GE-IPMI-SW1, GE-MGT-SW1, FDR-COMP-SW1, FDR-COMP-SW2. OSS01 and OSS02 connect to the OneStor SP-3584 JBOD through bonded links, with a heartbeat link between them.
  8. JBOD configuration:
     1 x Intel Lustre Manager server: 1U server RH1288 V3; CPU: Intel Xeon E5-2603 v3; memory: 2 x 8 GB 2133 MHz DDR3; PCIe slots: 2 x PCIe 3.0 x16; other: 2 x GE ports.
     2 x MDS: 1U server RH1288 V3; CPU: Intel Xeon E5-2620 v3; memory: 16 x 16 GB 2133 MHz DDR3; PCIe slots: 2 x PCIe 3.0 x16; HBA: 1 x single-port FDR IB HCA, 1 x dual-port 8G FC HBA; other: 2 x GE ports.
     8 x OSS: 2U server RH2288 V3; CPU: Intel Xeon E5-2620 v3; memory: 8 x 8 GB 2133 MHz DDR3; PCIe slots: 5 x PCIe 3.0 x8; HBA: 1 x single-port FDR IB HCA, 2 x LSI SAS 9300-8e HBAs; other: 2 x GE ports.
     1 x MDT storage: 2U storage OceanStor 5300 V3; cache: 32 GB; disks: 25 x 1.2 TB 10K RPM SAS; RAID: RAID 10; other: 8 x 8G FC ports.
     4 x OST JBOD: 5U, 84-slot Seagate OneStor SP-3584; disks: 84 x 6 TB NL-SAS; I/O ports: 2 front 12 Gb/s SAS ports.
     2 x FDR IB switch: 1U Mellanox SX6025 36-port FDR IB switch.
     2 x GE switch: 1U Huawei CE5855 48-port GE switch.
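The ZFS/JBOD configuration on the two items above is built from four identical building blocks, each pairing two OSS nodes with one 84-slot SP-3584 enclosure. The sketch below estimates raw capacity and the per-block share of the quoted 36 GB/s; the 8+2 RAIDZ2 layout used to approximate usable capacity is an assumption, since the slides do not state the zpool geometry.

```python
# Hedged sketch: capacity and throughput of the ZFS/JBOD building blocks.
# Assumption: each 84-slot JBOD is carved into 8+2 RAIDZ2 vdevs (the slides
# do not give the zpool layout), so the usable figure is only indicative.

JBODS = 4
SLOTS_PER_JBOD = 84
DISK_TB = 6
TIB_FACTOR = 0.931

raw_tb = JBODS * SLOTS_PER_JBOD * DISK_TB
vdevs_per_jbod = SLOTS_PER_JBOD // 10                 # assumed 8+2 RAIDZ2
usable_tb = JBODS * vdevs_per_jbod * 8 * DISK_TB * TIB_FACTOR

print(f"raw capacity:       {raw_tb} TB")
print(f"usable (approx.):   {usable_tb:.0f} TB  (quoted: ~1.5 PB)")
print(f"per building block: {36 / JBODS:.1f} GB/s of the 36 GB/s sustained")
```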
  9. HPC system for the University of Warsaw, Poland (Interdisciplinary Centre for Mathematical and Computational Modelling, ICM).
     Challenges: existing HPC devices cannot meet performance and business requirements; long construction period and complicated management; future evolution to the cloud, providing both commercial services and internal use.
     Solution: 240 x CH121 V3 blade nodes (Haswell E5-2697 v3 CPUs); 56 Gbps FDR InfiniBand network; Lustre parallel storage.
     Customer benefits: optimal performance (Rpeak = 279.5 TFLOPS); energy saving (2935.85 MFLOPS/W); efficient management (intuitive graphical operations and various scheduling algorithms).
     Background: the EU-funded PL-GRID Plus project was established in 2012 with the goal of building, within three years, an HPC platform for Polish and international scientific research and commercial applications. It covers 13 application areas: astronomy, HEP, nanotechnology, acoustics, life sciences, quantum chemistry and molecular physics, ecology, energy, bioinformatics, health sciences, materials, and metallurgy.
  10. HPC system for Newcastle University, UK.
      Challenges: rapid growth of computing data, with a single node requiring 512 GB of memory and overall computing capability above 88 TFLOPS; I/O read and write bandwidth above 1 GB/s; the system must be easy to expand.
      Solution: Huawei All-in-One HPC solution with 122 E9000 blade server nodes, an S5500T storage device, a full 10 GE high-speed network, Huawei cluster management software, and Lustre high-performance parallel storage. The All-in-One modular design meets requirements for future expansion (four deployment phases completed so far).
      Customer benefits: rapid deployment, since integrated computing, network, and storage resources reduce service deployment time to three days; simple management, since unified management minimizes maintenance workloads; easy expansion, which reduces later investment costs and meets expansion requirements for the next five years.
      Applications: bioinformatics, medicine R&D, molecular dynamics simulation.
      "We are impressed by Huawei's HPC solution that integrates computing, storage, and network resources. Huawei's HPC solution provides us with an easy-to-expand and cost-effective HPC cluster. This is the solution we need." — Jason Bain, Deputy Dean of the IT Infrastructure and Architecture Department, Newcastle University
  11. HPC platform for BJTU, China.
      Challenges: task-driven IT infrastructure causes decentralized computing resources and inefficient planning; the existing IT system makes HPC software deployment and further expansion difficult and incurs high storage costs; a unified educational cloud computing platform is required.
      Solution: Huawei provides a one-stop solution. E9000 blade servers configured with CH121 compute nodes implement high-density computing; 2U four-socket RH2485 V2 rack servers provide large storage capacity; RH2285 V2 servers host GPGPU nodes for graphics acceleration, floating-point calculation, and concurrent computing acceleration. Huawei FusionCluster provides powerful policy management through a wide range of task and resource scheduling algorithms and support for user-defined scheduling algorithms. The Huawei HPC solution allows storage capacity expansion to the PB level by adopting the Lustre storage scheme, meeting expansion requirements for the next five years.
      Customer benefits: the one-stop solution provides 30.5 TFLOPS and improves overall system performance by 80%; unified resource planning improves equipment utilization and reduces costs for maintenance personnel and redundant construction; the deployment period is greatly shortened by one-stop delivery of hardware and software, and the platform's scalability meets capacity expansion requirements for the next 10 years; virtualization and cloud computing technologies reduce IT management and operational costs and improve hardware utilization for BJTU.
  12. HPC platform for YTU, Turkey.
      Background: Yildiz Technical University (YTU), founded in 1911, is one of the seven public universities in Istanbul and one of the city's oldest and most prominent; its predecessor, Yildiz University, was founded in 1882. YTU is dedicated to engineering sciences and today has three campuses and more than 17,000 students. YTU wanted to deploy an HPC platform to improve its scientific research capabilities and to provide HPC resources for enterprises in its science park.
      Challenges: the previous standalone serial computing mode had poor computing capabilities; simulated computing was time-consuming and complex computing tasks could not be executed, which hindered the university's research progress; computing and storage resources could not be shared, resulting in low resource utilization.
      Solution: 256 RH2288 V2 two-socket rack servers were deployed as computing servers, providing a maximum computing capability of 90 TFLOPS; 10 RH5885H V3 servers with 40 NVIDIA K20X GPGPUs served as acceleration nodes, providing a maximum computing capability of 57.5 TFLOPS. One OceanStor 18000 was deployed as the storage system, providing 300 TB of capacity, and six RH2288 V2 servers ran the commercial Lustre software to provide a high-performance parallel file system. A 40 Gb/s QDR network ensured high-speed data communication.
      Customer benefits: Huawei's HPC platform provides superb computing performance, improving scientific research efficiency by 80%; end-to-end provisioning and unified management reduce maintenance costs by 30%.
  13. Copyright © 2016 Huawei Technologies Co., Ltd. All rights reserved. The information in this document may contain predictive statements, including, without limitation, statements regarding future financial and operating results, the future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purposes only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice. HUAWEI ENTERPRISE ICT SOLUTIONS A BETTER WAY
  14. Entry level.
      Configuration: 1 Lustre control server, RH2285H (2 x E5-2403, 2 x 8 GB memory, 2 x 300 GB SAS); 2 MDS nodes, RH2288H (each 2 x E5-2620, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 1 x dual-port 8G FC HBA); 2 OSS nodes, RH2288H (each 2 x E5-2620, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 1 x dual-port 8G FC HBA); 1 x S2600T with 16 GB cache, 600 GB SAS disks for MGT and MDT, and 2-4 TB NL-SAS disks in 4 RAID 6 (8+2) groups for OSTs. The servers and the SAN use dual 8G FC direct connections, so no FC switch is needed. Deploys IEEL (Intel Enterprise Edition for Lustre).
      Performance: IOR write bandwidth 1 GB/s; IOR read bandwidth 1.8 GB/s.
      Network chart: computing nodes over InfiniBand to the MDS pair and OSS pair, which connect over 8G FC to the S2600T; Ethernet for management.
  15. Bigger.
      Configuration: 1 Lustre control server, RH2285H (2 x E5-2403, 2 x 8 GB memory, 2 x 300 GB SAS); 2 MDS nodes, RH2288H (each 2 x E5-2620, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 1 x dual-port 8G FC HBA); 2 OSS nodes, RH2288H (each 2 x E5-2620, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 1 x dual-port 8G FC HBA); 1 x S5600T with 48 GB cache, 600 GB SAS disks for MGT and MDT, and 2-4 TB NL-SAS disks in 6 RAID 6 (8+2) groups for OSTs. The servers and the SAN use dual 8G FC direct connections, so no FC switch is needed. Deploys IEEL.
      Performance: IOR write bandwidth 1.55 GB/s; IOR read bandwidth 2.9 GB/s.
      Network chart: computing nodes over InfiniBand to the MDS pair and OSS pair, which connect over 8G FC to the S5600T; Ethernet for management.
  16. Bigger, higher performance.
      Configuration: 1 Lustre control server, RH2285H (2 x E5-2403, 2 x 8 GB memory, 2 x 300 GB SAS); 2 MDS nodes, RH2288H (each 2 x E5-2620, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 1 x dual-port 8G FC HBA); 2 OSS nodes, RH2288H (each 2 x E5-2620, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 2 x dual-port 8G FC HBAs); 1 x S2600T with 600 GB SAS disks for MGT and MDT; 1 x S6800T with 192 GB cache and 2-4 TB NL-SAS disks in 8 RAID 6 (8+2) groups for OSTs. The servers and the SAN use dual 8G FC direct connections, so no FC switch is needed; each MDS node has 2 FC connections and each OSS node has 4. Deploys IEEL.
      Performance: IOR write bandwidth 3.2 GB/s; IOR read bandwidth 4 GB/s.
      Network chart: computing nodes over InfiniBand to the MDS pair and OSS pair, which connect over 8G FC to the S2600T (MDT) and S6800T (OSTs); Ethernet for management.
  17. Bigger again.
      Configuration: 1 Lustre control server, RH2285H (2 x E5-2403, 2 x 8 GB memory, 2 x 300 GB SAS); 2 MDS nodes, RH2288H (each 2 x E5-2620, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 1 x dual-port 8G FC HBA); 8 OSS nodes, RH2288H (each 2 x E5-2620, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 1 x dual-port 8G FC HBA); 1 x S2600T with 600 GB SAS disks for MGT and MDT; 4 x S5600T with 48 GB cache each and 2-4 TB NL-SAS disks in 6 RAID 6 (8+2) groups per array for OSTs. The servers and the FC SAN use dual-controller 8G FC direct connections, so no FC switch is needed. Deploys IEEL.
      Performance: IOR write bandwidth 6 GB/s; IOR read bandwidth 10 GB/s.
      Network chart: computing nodes over InfiniBand to the MDS pair and four OSS pairs, which connect over 8G FC to the S2600T (MDT) and four S5600T arrays (OSTs); Lustre manager and Ethernet for management.
  18. Bigger again, higher performance.
      Configuration: 1 Lustre control server, RH2285H V2 (2 x E5-2403, 2 x 8 GB memory, 2 x 300 GB SAS, onboard GE); 2 MDS nodes, RH2288H V2 (each 2 x E5-2620 v2, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 1 x dual-port 8G FC HBA); 8 OSS nodes, RH2288H V2 (each 2 x E5-2620 v2, 8 x 8 GB memory, 2 x 300 GB SAS, 1 x QDR/FDR IB card, 2 x dual-port 8G FC HBAs); 1 x S2600T with 600 GB SAS disks for MGT and MDT; 4 x S6800T with 192 GB cache each and 2-4 TB NL-SAS disks in 8 RAID 6 (8+2) groups per array for OSTs. The servers and the SAN use dual 8G FC direct connections, so no FC switch is needed; each MDS node has 2 FC connections and each OSS node has 4. Deploys IEEL.
      Performance: IOR write bandwidth 12 GB/s; IOR read bandwidth 16 GB/s.
      Network chart: computing nodes over InfiniBand to the MDS pair and four OSS pairs, which connect over 8G FC to the S2600T (MDT) and four S6800T arrays (OSTs); Ethernet for management.
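The five FC-SAN reference configurations above (entry level through "bigger again, higher performance") scale by adding OSS nodes and storage arrays. Below is a hedged summary sketch that tabulates the quoted IOR figures and the write bandwidth per OSS node they imply; the per-OSS column is a derived estimate, not a measured value.

```python
# Hedged sketch: summary of the FC-SAN reference configurations in
# items 14 to 18. The bandwidth figures are copied from the slides;
# "write/OSS" is a derived estimate, not a measured value.

configs = [
    # (label, oss_nodes, ior_write_gbps, ior_read_gbps)
    ("Entry (S2600T)",                   2,  1.00,  1.8),
    ("Bigger (S5600T)",                  2,  1.55,  2.9),
    ("Bigger, higher perf (S6800T)",     2,  3.20,  4.0),
    ("Bigger again (4x S5600T)",         8,  6.00, 10.0),
    ("Bigger again, higher (4x S6800T)", 8, 12.00, 16.0),
]

print(f"{'configuration':34s} {'write':>6s} {'read':>6s} {'write/OSS':>10s}")
for label, oss, w, r in configs:
    print(f"{label:34s} {w:6.2f} {r:6.2f} {w / oss:10.2f}")
```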

Editor's Notes

  1. Networking: using the open-source or Intel commercial Lustre client, networking is straightforward and costs are relatively low; it is widely used in the industry and common in HPC, but requires the customer to have some development capability. Scalability: Lustre's design did not include rebalancing after expansion, so local hotspots tend to appear when the system grows; resolving this requires manual tuning, which is hard to accept for customers running long computations. Reliability: metadata is kept in a single engine, so lookups can become a bottleneck when metadata volumes are large, and reliability is relatively low. Lustre is widely used in many environments, above all in HPC: it is deployed on 70% of the world's top 10 supercomputers, 50% of the top 30, and 40% of the top 100. In the Huawei Lustre storage architecture, the MDS and OSS physically share the storage array, which is logically divided into multiple LUNs assigned to MDTs and OSTs: MDTs are assigned to the MDS and OSTs to the OSS. Both MDS and OSS nodes are configured with 40G QDR IB or 10GE NICs connected to the high-speed computing network, and also connect over GE to the system management network, which is used for cluster OS deployment, monitoring, and management. The storage network uses two SNS2120 FC switches, with redundancy provided by multipathing. The "three networks" usually mentioned for an HPC system are the computing network, the system management network, and the storage network; in addition there is a hardware management network, i.e. the BMC management network.
  2. ICM, University of Warsaw: the University of Warsaw, founded in 1816, is the largest and best public university in Poland; its ICM (Interdisciplinary Centre for Mathematical and Computational Modelling) was established in 1993. ICM conducts research in physics, chemistry, theoretical biology, life sciences, and related disciplines, and its own supercomputing cluster serves scientists across Poland while also renting computing resources to commercial companies. The IBM BlueGene/Q the customer deployed in 2012 ranked 143rd on that November's TOP500 list. In 2007, with EU funding, PL-GRID built HPC clusters of different sizes at five universities. The PL-GRID Plus project was then established to build, during 2012-2014, an HPC platform serving Polish and international scientific research and commercial applications. ICM at the University of Warsaw received a budget to build a new HPC cluster of roughly 200-300 TFLOPS and aimed to keep its place on the TOP500 list in the first half of 2015.
  3. The project has been delivered in four phases, and the system has been smoothly expanded to 122 nodes.
  4. Reference price (servers at 45% of list price, storage at 11%): ¥391,600. Of this: servers ¥137,000; storage ¥161,000; software (Intel Lustre, domestic price, 3-year service, OS not included) ¥93,600. Note: these prices are for reference only; actual projects must use the product manager's quotation.
  5. Reference price (servers at 45% of list price, storage at 11%): ¥744,782. Of this: servers ¥137,000; storage ¥514,182; software (Intel Lustre, domestic price, 3-year service, OS not included) ¥93,600. Note: these prices are for reference only; actual projects must use the product manager's quotation. Usable data capacity: 2 TB x 8 disks x 6 RAID groups x 1 array x 0.931 = 89 TB. Metadata capacity: 89 TB x 2% = 1.8 TB, on 8 x 600 GB SAS disks.
  6. 22 x 600 GB SAS disks in RAID 10 provide 6.6 TB of metadata storage, which can support about 600 TB of actual data. If system capacity exceeds 600 TB, add SAS disks in the design: at most 32 disks can be configured, with 30 in RAID 10 providing 9 TB of metadata space and the other 2 as hot spares. Reference price (servers at 45% of list price, storage at 12%): ¥1,049,597. Of this: servers ¥144,000; storage ¥811,997; software (Intel Lustre, domestic price, 3-year service, OS not included) ¥93,600. Note: these prices are for reference only; actual projects must use the product manager's quotation. Usable data capacity: 2 TB x 8 disks x 8 RAID groups x 1 array x 0.931 = 119 TB. Metadata capacity: 119 TB x 2% = 2.4 TB, on 10 x 600 GB SAS disks.
  7. Reference price (servers at 45% of list price, storage at 11%): ¥2,684,448. Of this: servers ¥322,000; storage ¥1,988,048; software (Intel Lustre, domestic price, 3-year service, OS not included) ¥374,400. Note: these prices are for reference only; actual projects must use the product manager's quotation. Usable data capacity: 2 TB x 8 disks x 6 RAID groups x 4 arrays x 0.931 = 357 TB. Metadata capacity: 357 TB x 2% = 7.2 TB, on 27 x 600 GB SAS disks.
  8. Reference price (servers at 45% of list price, storage at 11%): ¥3,545,558. Of this: servers ¥355,000; storage ¥3,135,658; software (Intel Lustre, 3-year service, OS not included) ¥374,400. Note: these prices are for reference only; actual projects must use the product manager's quotation. Usable data capacity: 2 TB x 8 disks x 8 RAID groups x 4 arrays x 0.931 = 476 TB. Metadata capacity: 476 TB x 2% = 9.5 TB, on 35 x 600 GB SAS disks.
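The capacity arithmetic in notes 5 to 8 follows one pattern: usable data capacity is disk size times data disks per RAID 6 group times the number of RAID groups and arrays, times a 0.931 decimal-to-binary factor, and metadata space is budgeted at roughly 2% of that. A minimal sketch of that rule, reusing only figures stated in the notes:

```python
# Hedged sketch: the capacity arithmetic used in editor's notes 5 to 8.
# Usable data capacity = disk size x data disks per RAID 6 group x groups
# x arrays x 0.931; metadata is budgeted at roughly 2% of that, as the
# notes themselves state.

def data_capacity_tb(disk_tb, data_disks, raid_groups, arrays,
                     tib_factor=0.931):
    return disk_tb * data_disks * raid_groups * arrays * tib_factor

for label, groups, arrays, quoted in [
    ("note 5 (1x S5600T)", 6, 1,  89),
    ("note 6 (1x S6800T)", 8, 1, 119),
    ("note 7 (4x S5600T)", 6, 4, 357),
    ("note 8 (4x S6800T)", 8, 4, 476),
]:
    data = data_capacity_tb(2, 8, groups, arrays)
    print(f"{label}: {data:.0f} TB data (quoted {quoted} TB), "
          f"~{data * 0.02:.1f} TB metadata budget")
```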