Open ZFS on Linux
Brian Behlendorf
LinuxCon 2013
September 17, 2013

LLNL-PRES-643675
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Lawrence Livermore National Laboratory LLNL-PRES-xxxxxx

High Performance Computing
World class computing resources
● Advanced Simulation
  – Massive Scale
  – Data Intensive
● Top 500
  – #3 Sequoia
    ● 20.1 Peak PFLOP/s
    ● 1,572,864 cores
    ● 55 PB of storage at 850 GB/s
  – #8 Vulcan
    ● 5.0 Peak PFLOP/s
    ● 393,216 cores
    ● 6.7 PB of storage at 106 GB/s
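
To put the storage figures in perspective, the following is a rough back-of-the-envelope calculation, not a measurement. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and that the quoted bandwidth could be sustained for the whole run; the helper names are made up for this illustration.

```python
# Rough arithmetic on the slide's numbers. Assumptions: decimal units and
# perfectly sustained bandwidth. Function names are illustrative only.
PB = 10**15
GB = 10**9

systems = {
    "Sequoia": {"cores": 1_572_864, "capacity": 55 * PB, "bandwidth": 850 * GB},
    "Vulcan": {"cores": 393_216, "capacity": 6.7 * PB, "bandwidth": 106 * GB},
}


def hours_to_fill(capacity_bytes, bandwidth_bytes_per_s):
    """Time needed to write the full capacity once at the quoted bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s / 3600


def bandwidth_per_core_mb(bandwidth_bytes_per_s, cores):
    """Filesystem bandwidth available per core, in MB/s (10**6 bytes)."""
    return bandwidth_bytes_per_s / cores / 10**6


for name, s in systems.items():
    fill = hours_to_fill(s["capacity"], s["bandwidth"])
    per_core = bandwidth_per_core_mb(s["bandwidth"], s["cores"])
    print(f"{name}: {fill:.1f} h to fill, {per_core:.2f} MB/s per core")
```

Under these assumptions both systems would need roughly 18 hours to write their full capacity once, and each core's share of the filesystem bandwidth is well under 1 MB/s.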

Linux Clusters
LLNL Loves Linux
● First Linux cluster deployed in 2001
● Near-commodity hardware
● Open Source
● Clustered High Availability Operating System (CHAOS)
  – Modified RHEL Kernel
  – New packages: monitoring, power/console, compilers, etc.
  – Lustre Parallel Filesystem
  – ZFS Filesystem

Lustre Filesystem
Existing Linux filesystems do not meet our requirements
● Massively parallel distributed filesystem
● Lustre servers use a modified ext4
  – Stable and fast, but...
  – No scalability
  – No data integrity
  – No online manageability
● Something better was needed
  – Use XFS, BTRFS, etc?
  – Write a filesystem from scratch?
  – Port ZFS to Linux?

ZFS on Linux History
Community involvement was exceptionally helpful
● 2008 – Prototype to determine viability
● 2009 – Initial ZVOL and Lustre support
● 2010 – Development moved to GitHub
● 2011 – POSIX layer added
● 2011 – Community of early adopters
● 2012 – Production usage of ZFS
● 2013 – Stable GA release

Architecture
[Diagram: ZFS on Linux software stack]
● User
  – ZFS CLI
  – libzfs
● Kernel
  – Interface Layer: ZPL, ZVOL, /dev/zfs
  – Lustre: OST, OFD, MDT, MDD, ZFS OSD
  – Transactional Object Layer: DMU, DSL, ZAP, ZIL, Traversal
  – Pooled Storage Layer: ARC, ZIO, VDEV, Configuration
  – SPL

Linux Specific Changes
The core ZFS code required little change
● Core ZFS code is self-contained
  – May be built in user space or kernel space
  – Includes functionality for snapshots, clones, I/O pipeline, etc.
● Solaris Porting Layer (SPL)
  – Adds stable Solaris/Illumos interfaces
    ● Taskqs, lists, condition variables, rwlocks, memory allocators, etc.
  – Layered on top of Linux equivalents if available
  – Solaris specific interfaces were implemented from scratch
● Vdev disk
  – Interfaces with the Linux kernel block layer
  – Had to be rewritten to use native Linux interfaces
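
The porting-layer idea can be illustrated with a toy model: keep the interface the ported code expects and implement it on top of whatever the host platform already provides. The sketch below is a Python analogy only, not the SPL's actual C code; the `Taskq` class and its method names are hypothetical stand-ins for a Solaris-style task queue, and the "native equivalent" here is Python's standard thread pool.

```python
import concurrent.futures as cf


class Taskq:
    """Toy task queue with a Solaris-flavoured interface (hypothetical names),
    layered on the host platform's native thread pool."""

    def __init__(self, name, nthreads):
        self.name = name
        self._pool = cf.ThreadPoolExecutor(
            max_workers=nthreads, thread_name_prefix=name
        )
        self._pending = []

    def dispatch(self, func, arg):
        """Queue func(arg) for asynchronous execution."""
        future = self._pool.submit(func, arg)
        self._pending.append(future)
        return future

    def wait(self):
        """Block until every task dispatched so far has finished."""
        pending, self._pending = self._pending, []
        cf.wait(pending)
        for future in pending:
            future.result()  # re-raise any exception from the task

    def destroy(self):
        """Drain outstanding work, then release the worker threads."""
        self.wait()
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    results = []
    tq = Taskq("demo_taskq", nthreads=2)
    for i in range(5):
        tq.dispatch(results.append, i * i)
    tq.destroy()
    print(sorted(results))
```

Code written against `Taskq` never touches the thread pool directly, which is the property that lets the core code stay unchanged while only the thin layer underneath is platform specific.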

Linux Specific Changes
The majority of changes to ZFS were done in the interface layer
● Interface Layer
  – /dev/zfs
    ● Device node interface for user space zfs and zpool utilities
    ● Minor changes needed for a Linux character device
  – ZVOL: ZFS Volumes
    ● Reimplemented as a Linux block driver which is backed by the DMU
  – ZPL: ZFS POSIX Layer
    ● Most complicated part; there are significant VFS differences
    ● In general new functions were added for the Linux VFS handlers
    ● If possible the Linux handlers use the equivalent Illumos handler
  – Lustre
    ● Support for Lustre was added by Sun/Oracle/Whamcloud/Intel
    ● Lustre directly layers on the DMU and does not use the POSIX Layer
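
The "thin Linux handler that calls the equivalent Illumos handler" pattern might look roughly like the sketch below. This is a Python analogy with made-up data structures and hypothetical function names, not the real ZPL code. The one convention it relies on is that Illumos-style handlers report failure as a positive errno value while Linux VFS handlers return a negative one, so the wrapper translates arguments and flips the sign.

```python
import errno


def zfs_open_core(znode, flags):
    """Stand-in for a shared, Illumos-style handler (hypothetical).
    Returns 0 on success or a positive errno on failure."""
    if znode.get("readonly") and flags["write"]:
        return errno.EROFS
    znode["open_count"] = znode.get("open_count", 0) + 1
    return 0


def zpl_open_wrapper(inode, filp):
    """Stand-in for a Linux-facing handler (hypothetical).
    Translates Linux-shaped arguments, calls the shared handler, and
    converts the result to the Linux convention of negative errno."""
    flags = {"write": filp["mode"] in ("w", "rw")}
    error = zfs_open_core(inode["znode"], flags)
    return -error


if __name__ == "__main__":
    inode = {"znode": {"readonly": True}}
    print(zpl_open_wrapper(inode, {"mode": "r"}))   # success
    print(zpl_open_wrapper(inode, {"mode": "w"}))   # negative errno
```

Keeping the wrapper this small is what allows the bulk of the logic to remain shared with the Illumos code base.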

Porting Issues
● 8k stacks
  – Illumos allows larger stacks
  – Needed to get stack usage down to support stock distribution kernels
  – Code reworked as needed to save stack
● GCC
  – ZFS was written for C99, Linux kernel is C89
  – Fix numerous compiler warnings
● GPL-only symbols
  – ZFS is under an open source license but may not use all exported symbols
  – This includes basic functionality such as work queues
  – ZVOLs can't add entries in /sys/block/
  – .zfs/snapshot directories can't use the automounter
● User space
  – Solaris threads used instead of pthreads
  – Block device naming differences
  – Udev integration
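
One general way to cut stack usage is to replace recursion with an explicit work list kept on the heap. The slide does not say which techniques were applied to ZFS, so the sketch below only illustrates that general idea on a made-up tree of nodes; the function names and the node layout are assumptions.

```python
def total_size_recursive(node):
    """Straightforward version: stack depth grows with the depth of the tree."""
    return node["size"] + sum(total_size_recursive(c) for c in node["children"])


def total_size_iterative(root):
    """Same result with constant stack depth: pending nodes live in a
    heap-allocated list instead of on the call stack."""
    total = 0
    pending = [root]
    while pending:
        node = pending.pop()
        total += node["size"]
        pending.extend(node["children"])
    return total


def make_chain(depth):
    """Build a degenerate tree (a single chain) of the given depth."""
    node = {"size": 1, "children": []}
    for _ in range(depth - 1):
        node = {"size": 1, "children": [node]}
    return node


if __name__ == "__main__":
    small = {"size": 4, "children": [make_chain(3), make_chain(2)]}
    print(total_size_recursive(small), total_size_iterative(small))
    # A chain this deep would exceed Python's default recursion limit in the
    # recursive version; the iterative version is unaffected.
    print(total_size_iterative(make_chain(5000)))
```

The trade-off is the same in kernel code: the iterative form is slightly less direct to read, but its stack footprint no longer depends on the input.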

Porting Issues
● Memory management
  – ZFS relies heavily on virtual memory for its data buffers
    ● By design the Linux kernel discourages the use of virtual memory
    ● To resolve this the SPL provides a virtual memory based slab
  – This allows use of the existing ZFS I/O pipeline without modification
  – Fragmentation results in underutilized memory
  – Stability concerns under low memory conditions
    ● We plan to modify ZFS to use scatter-gather lists of pages under Linux
    ● Allows larger block sizes
    ● Allows support for 32-bit systems
  – The ARC is not integrated with the Linux page cache
    ● Memory used by the ARC is not reported as cached pages
    ● Complicates reclaiming memory from the ARC when needed
    ● Requires an extra copy of the data for mmap'ed I/O
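
The scatter-gather idea is that a large logical buffer does not need one contiguous allocation: it can be a list of independent fixed-size pages, with reads and writes split at page boundaries. The following is a minimal user-space sketch of that data structure, not the planned ZFS implementation; the 4096-byte page size and the `ScatterGatherBuffer` class are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class ScatterGatherBuffer:
    """Logical buffer backed by a list of separately allocated pages."""

    def __init__(self, size):
        self.size = size
        npages = (size + PAGE_SIZE - 1) // PAGE_SIZE  # round up
        self.pages = [bytearray(PAGE_SIZE) for _ in range(npages)]

    def _chunks(self, offset, length):
        """Yield (page, start, count) pieces covering [offset, offset + length)."""
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("access outside buffer")
        while length > 0:
            index, start = divmod(offset, PAGE_SIZE)
            count = min(PAGE_SIZE - start, length)
            yield self.pages[index], start, count
            offset += count
            length -= count

    def write(self, offset, data):
        """Copy data into the buffer, splitting it across pages as needed."""
        pos = 0
        for page, start, count in self._chunks(offset, len(data)):
            page[start:start + count] = data[pos:pos + count]
            pos += count

    def read(self, offset, length):
        """Gather the requested range from the pages into one bytes object."""
        return b"".join(
            bytes(page[start:start + count])
            for page, start, count in self._chunks(offset, length)
        )


if __name__ == "__main__":
    buf = ScatterGatherBuffer(3 * PAGE_SIZE)
    payload = bytes(range(256)) * 20          # 5120 bytes, spans three pages
    buf.write(PAGE_SIZE - 100, payload)
    print(buf.read(PAGE_SIZE - 100, len(payload)) == payload)
```

Because no allocation is larger than one page, the buffer size is no longer limited by the availability of large contiguous regions, which is consistent with the slide's points about larger block sizes and 32-bit systems.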

Future Work
Possibilities for future work
● Features
  – O_DIRECT
  – Asynchronous I/O
  – POSIX ACLs
  – Reflink
  – Filefrag
  – Fallocate
  – TRIM
  – FMA Infrastructure (event daemon)
  – Multi-Modifier Protection (MMP)
  – Large blocks

Source Code
● All source code and the issue tracker are kept at GitHub
  – http://github.com/zfsonlinux/zfs
  – 70 contributors, 171 forks, 816 watchers
● Not in mainline kernel
  – Similar to reiser4, unionfs, lustre, ceph, …
  – Autotools used for compatibility
  – Advantages
    ● One code base used for 2.6.26 - 3.11 kernels
    ● Users can update ZFS independently from the kernel
    ● Simplifies keeping in sync with Illumos and FreeBSD
    ● User space utilities and kernel modules can share code
  – Disadvantages
    ● Support for the latest kernel lags slightly
    ● Higher maintenance and testing burden for developers

Where is ZFS on Linux Today
ZFS is available on Linux today!
● Stable and used in production
● Performance is comparable to existing Linux filesystems
● All major ZFS features are available
  – Simplified administration
  – Online management
  – Snapshots / Clones
  – Special .zfs directory
  – Send/receive of snapshots
  – Virtual block devices (ZVOL)
  – Stripes, Mirrors, and RAIDZ[1,2,3]
  – ZFS Intent Log (ZIL)
  – L2ARC Tiered Caching
  – Transparent compression
  – Transparent deduplication
● Currently used by Supercomputers, Desktops, NAS appliances
● Enthusiastic user community
  – zfs-discuss@zfsonlinux.org

Licensing
ZFS can be used in Linux
● ZFS is Open Source under the CDDL
● ZFS is NOT a derived work of Linux
  – “It would be rather preposterous to call the Andrew FileSystem a 'derived work' of Linux, for example, so I think it's perfectly OK to have a AFS module, for example.”
    ● Linus Torvalds
  – “Our view is that just using structure definitions, typedefs, enumeration constants, macros with simple bodies, etc., is NOT enough to make a derivative work. It would take a substantial amount of code (coming from inline functions or macros with substantial bodies) to do that.”
    ● Richard Stallman


OpenZFS at LinuxCon

  • 1.
  • 2. LLNL-PRES-643675 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC LinuxCon 2013 Brian Behlendorf, Open ZFS on Linux September 17, 2013
  • 3. Lawrence Livermore National Laboratory LLNL-PRES-xxxxxx 2 ● Advanced Simulation – Massive Scale – Data Intensive ● Top 500 – #3 Sequoia ● 20.1 Peak PFLOP/s ● 1,572,864 cores ● 55 PB of storage at 850 GB/s – #8 Vulcan ● 5.0 Peak PFLOPS/s ● 393,216 cores ● 6.7 PB of storage at 106 GB/s ● ● High Performance Computing World class computing resources
  • 4. Lawrence Livermore National Laboratory LLNL-PRES-643675 3 ● First Linux cluster deployed in 2001 ● Near-commodity hardware ● Open Source ● Clustered High Availability Operating System (CHAOS) – Modified RHEL Kernel – New packages – monitoring, power/console, compilers, etc – Lustre Parallel Filesystem – ZFS Filesystem Linux Clusters LLNL Loves Linux
  • 5. Lawrence Livermore National Laboratory LLNL-PRES-643675 4 ● Massively parallel distributed filesystem ● Lustre servers use a modified ext4 – Stable and fast, but... – No scalability – No data integrity – No online manageability ● Something better was needed – Use XFS, BTRFS, etc? – Write a filesystem from scratch? – Port ZFS to Linux? Lustre Filesystem Existing Linux filesystems do not meet our requirements
  • 6. Lawrence Livermore National Laboratory LLNL-PRES-643675 5 ● 2008 – Prototype to determine viability ● 2009 – Initial ZVOL and Lustre support ● 2010 – Development moved to Github ● 2011 – POSIX layer added ● 2011 – Community of early adopters ● 2012 – Production usage of ZFS ● 2013 – Stable GA release ZFS on Linux History Community involvement was exceptionally helpful
  • 7. Lawrence Livermore National Laboratory LLNL-PRES-643675 6 ARC ZIO VDEV Configuration DMU ZAP ZIL DSL Traversal OST OFD ZFS OSD MDT MDD ZVOL /dev/zfsZPL libzfs ZFS CLI Interface Layer Transactional Object Layer Pooled Storage Layer User Kernel Architecture SPL
  • 8. Lawrence Livermore National Laboratory LLNL-PRES-xxxxxx 7 ● Core ZFS code is self contained – May be built in user space or kernel space – Includes functionality for snapshots, clones, I/O pipeline, etc ● Solaris Porting Layer – Adds stable Solaris/Illumos interfaces ● Taskqs, lists, condition variables, rwlocks, memory allocators, etc... – Layered on top of Linux equivalents if available – Solaris specific interfaces were implemented from scratch ● Vdev disk – Interfaces with the Linux kernel block layer – Had to be rewritten to use native Linux interfaces Linux Specific Changes The core ZFS code required little change
  • 9. Linux-Specific Changes: The majority of changes to ZFS were done in the interface layer
      ● Interface Layer
          – /dev/zfs
              ● Device node interface for user space zfs and zpool utilities
              ● Minor changes needed for a Linux character device
          – ZVOL: ZFS Volumes
              ● Reimplemented as a Linux block driver which is backed by the DMU
          – ZPL: ZFS POSIX Layer
              ● Most complicated part; there are significant VFS differences
              ● In general new functions were added for the Linux VFS handlers
              ● If possible the Linux handlers use the equivalent Illumos handler
          – Lustre
              ● Support for Lustre was added by Sun/Oracle/Whamcloud/Intel
              ● Lustre layers directly on the DMU and does not use the POSIX Layer
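The ZPL approach described here, a new Linux handler that delegates to the equivalent Illumos handler where possible, is essentially a thin adapter. The sketch below is a user-space illustration only. Both function names are hypothetical, and the one concrete difference it adapts is error reporting: Illumos-derived code returns positive error numbers, while Linux kernel handlers return negative ones.

```python
import errno


def illumos_style_mkdir(namespace, path):
    """Stand-in for a shared handler: returns 0 or a positive errno."""
    if path in namespace:
        return errno.EEXIST
    namespace.add(path)
    return 0


def linux_style_mkdir(namespace, path):
    """Stand-in for a Linux VFS handler: delegates, returns 0 or a negative errno."""
    error = illumos_style_mkdir(namespace, path)
    # Only the calling convention is translated; the logic stays shared.
    return -error
```

Keeping the wrapper this thin is what lets the shared handler stay close to its upstream version.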
  • 10. Porting Issues
      ● 8k stacks
          – Illumos allows larger stacks
          – Needed to get stack usage down to support stock distribution kernels
          – Code reworked as needed to save stack
      ● GCC
          – ZFS was written for C99, the Linux kernel is C89
          – Fix numerous compiler warnings
      ● GPL-only symbols
          – ZFS is under an open source license other than the GPL, so it may not use symbols exported as GPL-only
          – This includes basic functionality such as work queues
          – ZVOLs can't add entries in /sys/block/
          – .zfs/snapshot directories can't use the automounter
      ● User space
          – Solaris threads used instead of pthreads
          – Block device naming differences
          – Udev integration
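The slide does not say which techniques were used to save stack. One common approach, shown here purely as an assumed example, is to replace recursion with an explicit work list that lives on the heap, so that deep structures no longer translate into deep call stacks. The node layout and function names are hypothetical.

```python
def count_blocks_recursive(node):
    """Each level of nesting consumes another call-stack frame."""
    return 1 + sum(count_blocks_recursive(child) for child in node["children"])


def count_blocks_iterative(root):
    """Same result, but pending work is kept in a heap-allocated list."""
    count = 0
    pending = [root]
    while pending:
        node = pending.pop()
        count += 1
        pending.extend(node["children"])
    return count


def make_chain(depth):
    """Build a chain of nodes nested `depth` levels deep."""
    root = {"children": []}
    node = root
    for _ in range(depth - 1):
        child = {"children": []}
        node["children"].append(child)
        node = child
    return root
```

The iterative version handles a chain many thousands of levels deep with constant call-stack usage, whereas the recursive version's stack usage grows with the depth of the structure.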
  • 11. Porting Issues
      ● Memory management
          – ZFS relies heavily on virtual memory for its data buffers
              ● By design the Linux kernel discourages the use of virtual memory
              ● To resolve this the SPL provides a virtual memory based slab
                  – This allows use of the existing ZFS I/O pipeline without modification
                  – Fragmentation results in underutilized memory
                  – Stability concerns under low memory conditions
              ● We plan to modify ZFS to use scatter-gather lists of pages under Linux
              ● Allows larger block sizes
              ● Allows support for 32-bit systems
          – The ARC is not integrated with the Linux page cache
              ● Memory used by the ARC is not reported as cached pages
              ● Complicates reclaiming memory from the ARC when needed
              ● Requires an extra copy of the data for mmap'ed I/O
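The planned move from virtually contiguous buffers to scatter-gather lists of pages can be sketched in user space. The example below is an illustration under stated assumptions, not the ZFS design: the `PageListBuffer` class, its methods, and the 4096-byte page size are all hypothetical. It shows the core idea that one logical buffer is stored as separately allocated pages, and every access is split at page boundaries.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageListBuffer:
    """A logical buffer stored as a list of separately allocated pages."""

    def __init__(self, size):
        self.size = size
        npages = (size + PAGE_SIZE - 1) // PAGE_SIZE
        # No single contiguous allocation of `size` bytes is ever required.
        self.pages = [bytearray(PAGE_SIZE) for _ in range(npages)]

    def write(self, offset, data):
        """Copy `data` into the buffer, splitting it at page boundaries."""
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside buffer")
        pos = 0
        while pos < len(data):
            page, page_off = divmod(offset + pos, PAGE_SIZE)
            chunk = min(PAGE_SIZE - page_off, len(data) - pos)
            self.pages[page][page_off:page_off + chunk] = data[pos:pos + chunk]
            pos += chunk

    def read(self, offset, length):
        """Gather `length` bytes starting at `offset` into one bytes object."""
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("read outside buffer")
        out = bytearray()
        pos = 0
        while pos < length:
            page, page_off = divmod(offset + pos, PAGE_SIZE)
            chunk = min(PAGE_SIZE - page_off, length - pos)
            out += self.pages[page][page_off:page_off + chunk]
            pos += chunk
        return bytes(out)
```

Because each page is allocated independently, a large block does not need one large contiguous region, which is why this layout sidesteps the fragmentation problem noted on the slide.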
  • 12. Future Work: Possibilities for future work
      ● Features
          – O_DIRECT
          – Asynchronous I/O
          – POSIX ACLs
          – Reflink
          – Filefrag
          – Fallocate
          – TRIM
          – FMA Infrastructure (event daemon)
          – Multi-Modifier Protection (MMP)
          – Large blocks
  • 13. Source Code
      ● All source code and the issue tracker are kept at GitHub
          – http://github.com/zfsonlinux/zfs
          – 70 contributors, 171 forks, 816 watchers
      ● Not in mainline kernel
          – Similar to reiser4, unionfs, lustre, ceph, …
          – Autotools used for compatibility
          – Advantages
              ● One code base used for 2.6.26 - 3.11 kernels
              ● Users can update ZFS independently from the kernel
              ● Simplifies keeping in sync with Illumos and FreeBSD
              ● User space utilities and kernel modules can share code
          – Disadvantages
              ● Support for the latest kernel lags slightly
              ● Higher maintenance and testing burden for developers
  • 14. Where is ZFS on Linux Today: ZFS is available on Linux today!
      ● Stable and used in production
      ● Performance is comparable to existing Linux filesystems
      ● All major ZFS features are available
          – Simplified administration
          – Online management
          – Snapshots / Clones
          – Special .zfs directory
          – Send/receive of snapshots
          – Virtual block devices (ZVOL)
          – Stripes, Mirrors, and RAIDZ[1,2,3]
          – ZFS Intent Log (ZIL)
          – L2ARC Tiered Caching
          – Transparent compression
          – Transparent deduplication
      ● Currently used by Supercomputers, Desktops, NAS appliances
      ● Enthusiastic user community
          – zfs-discuss@zfsonlinux.org
  • 33. Licensing: ZFS can be used in Linux
      ● ZFS is Open Source under the CDDL
      ● ZFS is NOT a derived work of Linux
          – “It would be rather preposterous to call the Andrew FileSystem a 'derived work' of Linux, for example, so I think it's perfectly OK to have a AFS module, for example.”
              ● Linus Torvalds
          – “Our view is that just using structure definitions, typedefs, enumeration constants, macros with simple bodies, etc., is NOT enough to make a derivative work. It would take a substantial amount of code (coming from inline functions or macros with substantial bodies) to do that.”
              ● Richard Stallman