Linux File Systems
Enabling Cutting Edge Features in
RHEL 6 & 7
Ric Wheeler, Senior Manager
Steve Dickson, Consulting Engineer
File and Storage Team
Red Hat, Inc.
June 12, 2013
Red Hat Enterprise Linux 6
RHEL6 File and Storage Themes
● LVM Support for Scalable Snapshots and Thin
Provisioned Storage
● Expanded options for file systems
● General performance enhancements
● Industry leading support for new pNFS protocol
RHEL6 LVM Redesign
● Shared exception store
● Eliminates the need to have a dedicated partition for
each file system that uses snapshots
● Mechanism is much more scalable
● LVM thin provisioned target
● dm-thinp is part of the device mapper stack
● Implements a high end, enterprise storage array feature
● Makes re-sizing a file system almost obsolete!
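As a rough sketch of the dm-thinp workflow with LVM (the volume group vg00, pool and volume names, and sizes are illustrative assumptions, not from these slides):
# Create a 100GB thin pool named pool0 in an existing volume group vg00
lvcreate -L 100G -T vg00/pool0
# Create a thin volume that presents 1TB virtually but allocates blocks only on write
lvcreate -V 1T -T vg00/pool0 -n thinvol0
mkfs.xfs /dev/vg00/thinvol0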
RHEL Storage Provisioning
Improved Resource Utilization & Ease of Use
[Diagram: Traditional Provisioning vs RHEL Thin Provisioning]
● Traditional provisioning: each volume pre-allocates storage, much of it unused – inefficient resource utilization, unused storage must be managed manually
● RHEL thin provisioning: volumes draw from a shared allocation pool backed by free space – efficient resource utilization, automatically managed, avoids wasting space, less administration, lower costs
Thin Provisioned Targets Leverage Discard
● Red Hat Enterprise Linux 6 introduced support to notify
storage devices when blocks become free in a file
system
● Useful for wear leveling in SSDs or space reclamation
in thin provisioned storage
● Discard maps to the appropriate low level command
● TRIM for SSD devices
● UNMAP or WRITE_SAME for SCSI devices
● dm-thinp handles the discard directly
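As a hedged example of driving discard from a file system (device, mount point and option choice are illustrative):
# Send discards automatically as the file system frees blocks
mount -o discard /dev/vg00/thinvol0 /mnt/data
# Or reclaim free space on demand, which is often preferred over the mount option
fstrim -v /mnt/data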
RHEL Storage Provisioning
Improved Resource Utilization
[Diagram: Space Reclamation with Thin Provisioned Volumes – a 100GB thin provisioned volume drawing from a shared allocation pool]
● Step #1: 10GB of data written – 10GB in use, 90GB of storage available in the pool
● Step #2: another 10GB of data written – 20GB in use, 80GB available
● Step #3: 10GB of data erased and space reclaimed after the discard command – back to 10GB in use, 90GB available
Even More LVM Features
● Red Hat Enterprise Linux LVM commands can control
software RAID (MD) devices
● 6.4 added LVM Support for RAID10
● Introduction of a new user space daemon
● The daemon caches device information after a scan
● Major performance gains on systems with a large
number of disks
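A minimal sketch of creating an LVM RAID10 logical volume, assuming a volume group vg00 with at least four physical volumes (names and sizes are illustrative):
# RAID10: 2 stripes, each mirrored once
lvcreate --type raid10 -i 2 -m 1 -L 100G -n lv_r10 vg00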
Expanding Choices
● Early in RHEL5 there were limited choices in the file
system space
● Ext3 was the only local file system
● GFS1 was your clustered file system
● RHEL5 updates brought in support for
● Ext4 added to the core Red Hat Linux platform
● Scalable File System (XFS) as a layered product
● GFS2 as the next generation of clustered file system
● Support for user space file systems via FUSE
RHEL6 File System Highlights
● Scalable File System (XFS) enhancements support the
most challenging workloads
● Performance enhancements for meta-data workloads
● Selected as the base file system for Red Hat Storage
● GFS2 reached new levels of performance
● Base file system for clustered Samba servers
● XFS performance enhancements for synchronous
workloads
● Support for Parallel NFS file layout client
RHEL6.4 File System Updates
● Ext4 enhanced for virtual guest storage in RHEL6.4
● “hole punch” deallocates data in the middle of files
● Refresh of the btrfs file system technology preview to
3.5 upstream version
● Scalable File System (aka XFS) is a layered product
for the largest and most demanding workloads
● Series of performance enhancements learned from
Red Hat Storage and partners
● Refresh of key updates from the upstream Linux kernel
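To illustrate hole punching, a sketch using the fallocate utility on a hypothetical guest image path; the --punch-hole option depends on the util-linux version and on file system support:
# Deallocate 512MB starting 1GB into a sparse guest image (path is hypothetical)
fallocate --punch-hole --offset 1G --length 512M /var/lib/libvirt/images/guest.img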
How to Choose a Local File System?
● The best way is to test each file system with your
specific workload on a well tuned system
● Make sure to use the RHEL tuned profile for your
storage
● The default file system will just work for most
applications and servers
● Many applications are not file or storage constrained
● If you are not waiting on the file system, changing it
probably will not make your application faster!
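A quick sketch of applying a tuned profile before testing; the enterprise-storage profile shown here is one of the profiles shipped with RHEL6 tuned, so verify what is available on your system:
tuned-adm list                        # show available profiles
tuned-adm profile enterprise-storage  # throughput oriented tuning for server storage
tuned-adm active                      # confirm the active profile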
SAS on Standalone Systems
Picking a RHEL File System
[Chart: SAS Mixed Analytics 9.3 running RHEL6.3 – total time and system CPU time in seconds (lower is better) for ext3, ext4 and xfs]
● xfs – most recommended
● Max file system size 100TB
● Max file size 100TB
● Best performing
● ext4 – recommended
● Max file system size 16TB
● Max file size 16TB
● ext3 – not recommended
● Max file system size 16TB
● Max file size 2TB
Red Hat Enterprise Linux 6
GFS2 New Features
● Performance improvements
● Sorted ordered write list for improved log flush speed &
block reservation in the allocator for better on-disk
layout with complex workloads (6.4+)
● Orlov allocator & better scalability with very large
number of cached inodes (6.5+)
● Faster glock dump for debugging (6.4+)
● Support for 4k sector sized devices with TRIM (6.5+)
Performance Enhancement for User Space File
Systems
● User space file systems are increasingly popular
● Cloud file systems and “scale out” file systems like
HDFS or gluster
● FUSE is a common kernel mechanism for this class of
file system
● Work to enhance the performance of FUSE includes
● Support for FUSE readdirplus()
● Support for scatter-gather IO
● Reduces trips from user space to the kernel
RHEL6 NFS Updates
Traditional NFS
[Diagram: Traditional NFS – multiple NFS clients all go through one Linux NFS server to reach storage; one server for multiple clients = limited scalability]
Parallel NFS (pNFS)
● Architecture
● Metadata Server (MDS) – Handles all non-Data Traffic
● Data Server (DS) – Direct I/O access to clients
● Shared Storage Between Servers
● Layouts define the server architecture
● File Layout (NAS Env) - NetApp
● Block Layout (SAN Env) - EMC
● Object Layout (High Perf Env) - Panasas & Tonian
Parallel NFS = Scalability
[Diagram: Parallel data paths to storage – pNFS clients get layouts from a pNFS metadata server and perform I/O directly against multiple pNFS data servers; File Layout ==> NAS, Object Layout ==> SAN, direct path from client to storage]
Parallel NFS (pNFS) - RHEL 6.4
● First to market with Client support (file layout)
● Thank you very much Upstream and Partners!!!
● Enabling pNFS:
● mount -o v4.1 server:/export /mnt/export
● RHEL-Next
● Block and Object layout support
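Building on the mount command above, a hedged sketch for checking that the pNFS file layout client is in use (module name and output details can vary by kernel version):
mount -t nfs -o v4.1 server:/export /mnt/export
# The file layout driver loads when the server advertises pNFS support
lsmod | grep nfs_layout_nfsv41_files
# Confirm the mount negotiated NFS version 4.1
nfsstat -m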
[Chart: RHEL 6.4 pNFS vs NFSv4 running an Oracle 11gR2 OLTP workload – transactions per minute (TPM) versus number of users (10 to 100)]
Parallel NFS = High Performance and Scalability
Source: Tonian Systems
Red Hat Enterprise Linux 7
RHEL7 Areas of Focus
● Enhanced performance
● Focus on very high performance, low latency devices
● Support for new types of hardware
● Working with our storage partners to enable their latest
devices
● Support for higher capacities across the range of file
and storage options
● Ease of use and management
RHEL7 Storage Updates
Storage Devices are too Fast for the Kernel!
● We are too slow for modern SSD devices
● The Linux kernel did pretty well with just SATA SSDs
● PCI-e SSD cards can sustain 500,000 or more IOPS
and our stack is the bottleneck in performance
● A new generation of persistent memory is coming that
will be
● Roughly the same capacity, cost and performance as
DRAM
● The IO stack needs to go to millions of IOPS
● http://lwn.net/Articles/547903
Dueling Block Layer Caching Schemes
● With all classes of SSDs, the cost makes it difficult to
have a purely SSD system at large capacity
● Obvious extension is to have a block layer cache
● Two major upstream choices:
● Device mapper dm-cache target will be in RHEL7
● BCACHE queued up for 3.10 kernel, might make
RHEL7
● Performance testing underway
● BCACHE is a finer grained cache
● dm-cache has a pluggable policy (similar to dm MPIO)
● https://lwn.net/Articles/548348
Thinly Provisioned Storage & Alerts
● Thinly provisioned storage lies to users
● Similar to DRAM versus virtual address space
● A sys admin can give all users a virtual TB and only back
it with 100GB of real storage for each user
● Supported in arrays & by device mapper dm-thinp
● Trouble comes when physical storage nears its limit
● Watermarks are set to trigger an alert
● Debate is over where & how to log that
● How much is done in kernel versus user space?
● User space policy agent was slightly more popular
● http://lwn.net/Articles/548295
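To make the watermark problem concrete, a sketch of watching thin pool usage from user space with standard LVM report fields (the volume group name vg00 is an assumption):
# Data% and Meta% show how full the pool really is, regardless of the virtual sizes handed out
lvs -o lv_name,lv_size,data_percent,metadata_percent vg00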
Continued LVM Enhancements for Software RAID
● LVM will support more software RAID features
● Scrubbing proactively detects latent disk errors in
software RAID devices
● Reshape moves a device from one RAID level to
another
● Support for write-mostly, write-behind, resync throttling
all help performance
● BTRFS will provide very basic software RAID
capabilities
● Still very new code
Storage Management APIs
● Unification of the code that does storage management
● Low level libraries with C & Python bindings
● libstoragemgt manages SAN and NAS
● liblvm is the API equivalent of LVM user commands
● APIs in the works include HBA, SCSI, iSCSI, FCoE,
multipath
● Storage system manager provides an easy to use
command line interface
● Blivet is a new high level storage and file system
library that will be used by anaconda and openlmi
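As an illustration of the command line interface, a hedged ssm sketch; the pool name and mount point are assumptions and the exact options may differ between versions:
ssm list                                           # show devices, pools, volumes and mounts
ssm create -s 10G --fstype xfs -p vg00 /mnt/data   # create, format and mount a 10G XFS volume from pool vg00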
Future Red Hat Stack Overview
[Diagram: future Red Hat storage stack – kernel and storage target; low level tools (LVM, Device Mapper, FS utilities); vendor specific tools (hardware RAID, array specific); libraries LIBLVM, LIBSTORAGEMGT, BLIVET; consumers Anaconda, Storage System Manager (SSM), OVIRT, OpenLMI]
RHEL7 Local File System
Updates
RHEL7 Will Bring in More Choices
● RHEL 7 is looking to support ext4, XFS and btrfs
● All can be used for boot, system & data partitions
● Btrfs going through intense testing and qualification
● Ext2/Ext3 will be fully supported
● Use the ext4 driver which is mostly invisible to users
● Full support for all pNFS client layout types
● Add in support for vendors' NAS boxes that support
the pNFS object and block layouts
RHEL7 Default File System
● In RHEL7, Red Hat is looking to make XFS the new
default
● XFS will be the default for boot, root and user data
partitions on all supported architectures
● Red Hat is working with partners and customers during
this selection process to test and validate XFS
● Final decision will be made pending successful testing
● Evaluating maximum file system sizes for RHEL7
XFS Strengths
● XFS is the reigning champion of larger servers and
high end storage devices
● Tends to extract the most from the hardware
● Well tuned to multi-socket and multi-core servers
● XFS has a proven track record with the largest
systems
● Carefully selected RHEL partners and customers run
XFS up to 300TB in RHEL6
● Popular base for enterprise NAS servers
● Including Red Hat Storage
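A minimal sketch of creating and growing XFS on LVM (device names are placeholders); note that XFS grows online but cannot be shrunk:
mkfs.xfs /dev/vg00/lv_data           # defaults are usually sensible
mount /dev/vg00/lv_data /mnt/data
lvextend -L +100G /dev/vg00/lv_data
xfs_growfs /mnt/data                 # grow the mounted file system into the new space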
EXT4 Strengths
● Ext4 is very well known to system administrators and
users
● Default file system in RHEL6
● Closely related to ext3, our RHEL5 default
● Can outperform XFS in some specific workloads
● Single threaded, single disk workload with synchronous
updates
● Avoid ext4 for larger storage
● Base file system for Android and Google File System
BTRFS – The New File System Choice
● Integrates many block layer functions into the file
system layer
● Logical volume management functions like snapshots
● Can do several versions of RAID
● Designed around ease of use and has sophisticated
metadata
● Back references help map IO errors to file system
objects
● Great choice for systems with lots of independent disks
and no hardware RAID
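For the many-disks, no-hardware-RAID case, a hedged sketch of a multi-device btrfs file system (device names are placeholders):
# Mirror both data and metadata across two whole disks
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
mount /dev/sdb /mnt/btrfs
btrfs filesystem show /mnt/btrfs     # report member devices and usage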
RHEL7 NFS Updates
RHEL7 NFS Server Updates
● Red Hat Enterprise Linux 7.0 completes the server
side support for NFS 4.1
● Support for exactly-once semantics
● Callbacks use port 2049
● No server side support for parallel NFS ... yet!
Parallel NFS Updates
● Parallel NFS has three layout types
● Block layouts allow direct client access to SAN data
● Object layouts for direct access to the object backend
● File layout
● RHEL7.0 will add support for block and object layout
types
● Will provide support for all enterprise pNFS servers!
Support for SELinux over NFS
● Labeled NFS enables fine grained SELinux contexts
● Part of the NFS4.2 specification
● Use cases include
● Secure virtual machines stored on NFS server
● Restricted home directory access
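A hedged sketch of what labeled NFS usage is expected to look like; the security_label export option and the vers=4.2 mount option come from later nfs-utils and kernel releases, so treat this as illustrative rather than a description of the RHEL7.0 bits:
# Server side: /etc/exports entry allowing SELinux labels to pass to clients
/export  client.example.com(rw,sync,security_label)
# Client side: request an NFSv4.2 mount
mount -o vers=4.2 server:/export /mnt/export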
Red Hat Enterprise Linux 7
GFS2 New Features
● All of the previously mentioned RHEL6 features
● Streamlined journaling code
● Less memory overhead
● Less prone to pauses during low memory conditions
● New cluster stack interface (no gfs_controld)
● Performance co-pilot (PCP) support for glock statistics
● Faster fsck
● RAID stripe aware mkfs
Pulling it all Together....
● Ease of Use
● Tuning & automation of local file systems on top of the new LVM features
● Thin provisioned storage
● Upgrade rollback
● Scalable snapshots
● Major focus on stability testing of btrfs
● Looking to see what use cases it fits best
● Harden XFS metadata
● Detect errors to confidently support 500TB single FS
Upstream Projects
Performance Work: SBC-4 Commands
● SBC-4 is a reduced & optimized SCSI disk command
set proposed in T10
● Current command set is too large and supports too
many odd devices (tapes, USB, etc)
● SBC-4 will support just disks and be optimized for low
latency
● Probably needs a new SCSI disk drive
● New atomic write command from T10
● All of the change is applied or nothing happens
● Supported by some new devices like FusionIO already
● http://lwn.net/Articles/548116/
Copy Offload System Calls
● Upstream kernel community has debated “copy
offload” for several years
● Popular use case is VM guest image copy
● Proposal is to have one new system call
● int copy_range(int fd_in, loff_t upos_in, int fd_out, loff_t
upos_out, int count)
● Offload copy to SCSI devices, NFS or copy enabled file
systems (like reflink in OCFS2 or btrfs)
● Patches for copy_range() posted by Zach Brown
● https://lkml.org/lkml/2013/5/14/622
● http://lwn.net/Articles/548347
IO Hints
● Storage device manufacturers want help from
applications and the kernel
● Tag data with hints about streaming vs random, boot
versus run time, critical data
● SCSI versions proposed in the T10 standards body were
voted down
● Suggestion raised to allow hints to be passed down via
struct bio from file system to block layer
● Support for expanding fadvise() hints for applications
● No consensus on what hints to issue from the file or
storage stack internally
● http://lwn.net/Articles/548296/
Improving Error Return Codes?
● The interface from the IO subsystem up to the file
system is pretty basic
● Low level device errors almost always propagate as EIO
● Causes file system to go offline or read only
● Makes it hard to do intelligent error handling at FS level
● Suggestion was to re-use select POSIX error codes to
differentiate temporary from permanent errors
● File system might retry on temporary errors
● Will know to give up immediately on others
● Challenge is that IO layer itself cannot always tell!
● http://lwn.net/Articles/548353
Learning More
Learn more about File Systems & Storage
● Attend related Summit Sessions
● Linux File Systems: Enabling Cutting-edge Features in
Red Hat Enterprise Linux 6 & 7 (Wed 4:50)
● Kernel Storage & File System Demo Pod (Wed 5:30)
● Evolving & Improving RHEL NFS (Thurs 2:30)
● Parallel NFS: Storage Leaders & NFS Architects Panel
(Thurs 3:40)
● Engage the community
● http://lwn.net
● Mailing lists: linux-ext4, linux-btrfs, linux-nfs,
xfs@oss.sgi.com
  • 50. Learn more about File Systems & Storage ● Attend related Summit Sessions ● Linux File Systems: Enabling Cutting-edge Features in Red Hat Enterprise Linux 6 & 7 (Wed 4:50) ● Kernel Storage & File System Demo Pod (Wed 5:30) ● Evolving & Improving RHEL NFS (Thurs 2:30) ● Parallel NFS: Storage Leaders & NFS Architects Panel (Thurs 3:40) ● Engage the community ● http://lwn.net ● Mailing lists: linux-ext4, linux-btrfs, linux-nfs, xfs@oss.sgi.com