Lustre over ZFS on Linux
Update on State of the Art HPC Filesystems

Q1-2014
Josh Judd, CTO
© 2013 WARP Mechanics Ltd. All Rights Reserved.
Overview
• Remind me... What is “Lustre over ZFS” and why do I care?
• What was “production grade” last year?
• What has changed since then?

• Why do I care about that?
• Can you give me a concrete implementation example?
• How could I get started on this?

What is Lustre over ZFS?
• Lustre: Horizontally-scalable “meta” filesystem which sits on top of
“normal” filesystems and makes them big, fast, and unified
– Historically, ext4 provided the “backing” FS for Lustre
– This has scalability & performance issues, plus it lacks features & integrity
assurances

• ZFS: Vertically-scalable “normal” filesystem, which includes many
powerful features, integrity assurances, and an advanced RAID
stack
– Historically, a ZFS filesystem could only exist on a single server at a time
– It could scale vertically, but had no ability to scale out whatsoever

• Lustre/ZFS: Marries the horizontal scalability and performance of
Lustre to the vertical scalability and features of ZFS
• Better together: Each fills missing pieces of the other

What worked last year?
• LLNL was supporting part of ZFS in the Sequoia system
– This had scalability benefits, and added some features
– They did not yet have high confidence in the ZFS RAID/volume layer in time for Sequoia

• WARP was supporting all of the ZFS features... But a bit differently
– This was the most complete integration on the market. However...
– WARP ran an OSS connected via RDMA to a separate Solaris-based ZFS box

– This provided feature benefits, but wasn’t unified hardware and didn’t integrate
the file and block layers – e.g., read ahead optimizations were “N/A”
– And you had to manage both Linux and Solaris

What has changed?
• LLNL has done more refinement and scaling on all layers
• WARP has finished integrating the whole stack onto one controller
• Now you can get file+block aspects of Lustre/ZFS on a single Linux
box
– No Solaris; no external RDMA cables; no extra server in the mix
– 100% of the Lustre/ZFS integration features work, e.g. read-ahead optimization

• Easy to install from RHEL-style yum repo
• Commercial support available

Value of this to HPC systems?
• Complete PXE support
– Now you can PXE boot everything, from RAID to the parallel FS layers,
and run “diskless” – i.e. no image to flash inside storage nodes
– In other words, manage your storage the same way you manage the compute
layer

• Complete base OS control
– Some customers want a GUI; others want control
– Full root access to Linux-based OS – which fully controls all layers of storage

• Complete open source stack
– Lowers cost for storage dramatically – just as it did on the server layer
– Allows “the community” to add features and tools

Value of this to HPC systems? (cont.)
• Moves in the “mainstream direction” of open source vs. proprietary
• Built-in compression gets 1.7:1 in real-world HPC environments
• ZFS RAID layer supports hybrid SSD pools: massive performance
benefits for Lustre especially with small random reads
• ZFS adds multiple layers of integrity protection: essential at petascale
• Allows running arbitrary user-provided code directly within the
storage array – e.g., CRAM could be implemented inside the
controller
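The compression and hybrid-pool bullets above map onto ordinary ZFS administration commands. The following is a minimal dry-run sketch that only prints those commands; the pool, dataset, and device names are illustrative (borrowed from the example names used later in this deck), and the ratio you actually see will depend on your data.

```shell
#!/bin/sh
# Dry-run sketch: prints ZFS commands rather than executing them.
# POOL, DATASET and CACHE_DEV are illustrative names, not a real system.
POOL=warpfs-ost1
DATASET="$POOL/ost1"
CACHE_DEV=sde1

zfs_feature_cmds() {
    # Enable ZFS's built-in compression on the dataset backing an OST.
    echo "zfs set compression=on $DATASET"
    # Report the compression ratio actually achieved on that dataset.
    echo "zfs get compressratio $DATASET"
    # Add an SSD as a read cache (L2ARC) device, making the pool hybrid.
    echo "zpool add $POOL cache $CACHE_DEV"
}

zfs_feature_cmds
```

Running the printed commands by hand on a test pool is a reasonable way to see what ratio your own workload achieves before relying on the 1.7:1 figure.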

Concrete example?
• Not a sales pitch, but...
• WARP implements the stack as an appliance
• In this case, the controllers are integrated into the storage enclosure
• We’ll describe how we did it in a minute, but...
• You can “DIY” the same thing using COTS hardware and free
software
• It’s a question of “do you want an appliance with commercial
support”?

Core architecture of WARPhpc system
• 4U, 60-bay chassis
• Used for OSS “heads” and disk shelves
• Building block is a 12U / 180-bay OSS “pod”
• 1x chassis has 2x Sandy Bridge OSSs (HA)
• 2x chassis have SAS JBOD controllers
• Can use 100% HDD, 100% SSD, or hybrid

• Typical case per pod:
– 4 GB/s to 12 GB/s
– 0.5 PB to 1.0 PB usable

CPU-based Controllers for OSSs

Example HPC Lustre/ZFS Storage System arranged as 12U “pods”
• Chassis are grouped as follows:
– 3x chassis in a group: 1x “smart” and 2x “expansion”
– 180 spindles (“S”), or 150 spindles + 30 SSDs (“H”), or 180 SSDs (“M”)
– 2x controllers (active/active HA) with 4x 56 Gb/s InfiniBand or 4x 40 Gb/s Ethernet ports
– Each pod runs at 4 GB/s to 12 GB/s, depending on drive config and workload
– Example 4-rack system: ~14 PB w/ compression (typical) and ~70+ GB/s (typical)
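As a rough cross-check of the 4-rack figures, the per-pod ranges can simply be multiplied out. This sketch assumes three 12U pods per rack (12 pods in 4 racks), which the slides do not state, and applies the 1.7:1 compression ratio quoted earlier; the variable names are illustrative.

```shell
#!/bin/sh
# Back-of-envelope sizing for a 4-rack system built from 12U pods.
# ASSUMPTION (not from the slides): 3 pods fit in each rack.
racks=4
pods_per_rack=3
pods=$((racks * pods_per_rack))

# Per-pod ranges from the slides: 4-12 GB/s, 0.5-1.0 PB (500-1000 TB) usable.
min_gbs=$((pods * 4))
max_gbs=$((pods * 12))
min_tb=$((pods * 500))
max_tb=$((pods * 1000))

# Apply the quoted 1.7:1 compression ratio with integer math (x17/10).
min_tb_comp=$((min_tb * 17 / 10))
max_tb_comp=$((max_tb * 17 / 10))

echo "pods: $pods"
echo "throughput: ${min_gbs}-${max_gbs} GB/s"
echo "usable: ${min_tb}-${max_tb} TB"
echo "with 1.7:1 compression: ${min_tb_comp}-${max_tb_comp} TB"
```

Under that assumption, the quoted ~14 PB and ~70+ GB/s both fall inside the computed ranges (roughly 10 to 20 PB and 48 to 144 GB/s).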

How to actually build that?
Option 1: WARP just rolls in racks
Option 2: DIY
• Everything about this is open source
• You can hook up COTS servers to JBODs, download code, and go
• Lustre layer works mostly like any other Lustre system

• Differences are in how OSTs are created:
– mkfs.lustre --mgs --backfstype=zfs warpfs-mgt0/mgt0 mirror sdc sdd
– mkfs.lustre --ost --backfstype=zfs --mgsnode=[nid] --fsname warpfs --index=1 warpfs-ost1/ost1 raidz sdf sdg sdh sdi sdj cache sde1
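The two commands above can be wrapped in a small dry-run script to check the arguments before touching any disks. This is a sketch, not WARP's or LLNL's procedure: the function names and the MGS_NID default are hypothetical, while the pool, dataset, and device names follow the slide. The trailing "cache sde1" asks ZFS for a read cache (L2ARC) device, which is how an SSD joins a hybrid pool.

```shell
#!/bin/sh
# Dry-run sketch: prints the mkfs.lustre commands instead of running them.
FSNAME=warpfs
# Hypothetical NID; replace with the NID of your MGS node.
MGS_NID="${MGS_NID:-192.168.1.10@o2ib}"

# Set RUN="" to execute for real (this destroys data on the listed disks).
RUN=echo

make_mgt() {
    # MGT on a two-disk ZFS mirror.
    $RUN mkfs.lustre --mgs --backfstype=zfs "${FSNAME}-mgt0/mgt0" mirror sdc sdd
}

make_ost() {
    # OST index 1 on a five-disk raidz vdev, with sde1 as a cache device.
    $RUN mkfs.lustre --ost --backfstype=zfs --mgsnode="$MGS_NID" \
        --fsname "$FSNAME" --index=1 "${FSNAME}-ost1/ost1" \
        raidz sdf sdg sdh sdi sdj cache sde1
}

make_mgt
make_ost
```

Keeping RUN=echo as the default means a copy-and-paste mistake prints a wrong command rather than formatting the wrong disks.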

How to get started?
• WARP and LLNL are in final stages of reviewing a quick start
guide
• This shows how to get Lustre/ZFS running as a VM in minutes
• It’s not a comprehensive guide to Lustre or ZFS, but it distills
procedures down to a simple, “cut and paste” method to get
started
• [...] To get an advance copy, email lustre@warpmech.com
Next, tell the Lustre/ZFS startup scripts (contributed by LLNL) which Lustre services you want to start. Don’t worry that
you haven’t created these filesystems yet. They will be created shortly.
If you prefer the vi method to echoing, then: vi /etc/ldev.conf
Make sure ldev.conf contains the following lines:
warpdemo - mgs zfs:warp-mgt0/mgts0 warpdemo
[...]
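The excerpt mentions an “echoing” method without showing it. A minimal sketch of what that could look like follows, assuming the usual ldev.conf field order (local host, foreign host or "-", label, device) and treating the trailing "warpdemo" in the excerpt as the start of the next, truncated line. It writes to a scratch file rather than /etc/ldev.conf, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch of the "echo" method, writing to a scratch file, not /etc/ldev.conf.
# Assumed field order: local-host  foreign-host-or-"-"  label  device
CONF="${CONF:-$(mktemp)}"
LINE='warpdemo - mgs zfs:warp-mgt0/mgts0'

add_ldev_line() {
    # Append the line only if an identical one is not already present.
    grep -qxF -- "$1" "$CONF" 2>/dev/null || echo "$1" >> "$CONF"
}

add_ldev_line "$LINE"
add_ldev_line "$LINE"   # second call is a no-op
cat "$CONF"
```

Once the contents look right, the same line can be added to the real /etc/ldev.conf by hand or by pointing CONF at it.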

Thanks!

Sales@WARPmech.com