Mario Held mario.held@de.ibm.com
Martin Kammerer martin.kammerer@de.ibm.com
07/07/2010




 Linux on System z disk I/O performance




         visit us at http://www.ibm.com/developerworks/linux/linux390/perf/index.html

                                                                                    © 2010 IBM Corporation




Agenda

 Linux
     –   file system
     –   special functions
     –   logical volumes
     –   monitoring
     –   High performance FICON
 Storage server
     – internal structure
     – recommended usage and configuration
 Host
     –   Linux I/O stack
     –   sample I/O attachment
     –   FICON/ECKD and FCP/SCSI I/O setups
     –   possible bottlenecks
     –   recommendations







Linux file system

 Use ext3 instead of reiserfs
 Tune your ext3 file system
     – Select the appropriate journaling mode (journal, ordered, writeback)
      – Consider turning off atime (e.g. via the noatime mount option; see the mount sketch after this list)
     – Consider upcoming ext4
 Temporary files
     – Don't place them on journaling file systems
     – Consider a ram disk instead of a disk device
     – Consider tmpfs for temporary files
 If possible, use direct I/O to avoid double buffering of files in memory
 If possible, use async I/O to continue program execution while data is fetched
     – Important for read operations
      – Applications usually do not depend on write completions
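
A minimal sketch of the mount options mentioned above; the device name, mount point and tmpfs size are placeholders, not taken from the slides:

    # ext3 with ordered journaling and without atime updates
    mount -t ext3 -o data=ordered,noatime /dev/dasdb1 /mnt/data

    # RAM-backed tmpfs for temporary files, capped at 512 MB
    mount -t tmpfs -o size=512m tmpfs /tmp

The same options can be made persistent in /etc/fstab.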








I/O options

 Direct I/O (DIO)
      – Transfers the data directly from the application buffers to the device driver, avoiding a copy of
        the data into the page cache
     – Advantages:
         • Saves page cache memory and avoids caching the same data twice
         • Allows larger buffer pools for databases
      – Disadvantages:
          • Make sure that no utility accesses the same files through the file system (page cache) --> danger of
            data corruption
          • The size of the I/O requests may be smaller
 Asynchronous I/O (AIO)
     – The application is not blocked for the time of the I/O operation
     – It resumes its processing and gets notified when the I/O is completed.
      – Advantages:
           • The issuer of a read/write operation no longer waits until the request finishes
           • Reduces the number of I/O processes (saves memory and CPU)
 Recommendation is to use both
     – Database benchmark workloads improved throughput by 50%
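
Applications normally request direct and asynchronous I/O through their own interfaces (O_DIRECT, libaio). As a quick illustration of the page-cache bypass only, a hedged shell example with a placeholder file name:

    # Write 1 GB with direct I/O, bypassing the page cache (GNU dd)
    dd if=/dev/zero of=/mnt/data/testfile bs=1M count=1024 oflag=direct

    # Read it back with direct I/O
    dd if=/mnt/data/testfile of=/dev/null bs=1M iflag=direct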




I/O schedulers (1)

 Four different I/O schedulers are available
     – noop scheduler
       only request merging
     – deadline scheduler
       avoids read request starvation
      – anticipatory scheduler (as scheduler)
        designed for use with physical disks, not intended for storage subsystems
      – completely fair queuing scheduler (cfq scheduler)
        all users of a particular drive are able to execute about the same number of I/O requests
        over a given time
 The default in current distributions is deadline
      – Don't change the setting to as on Linux on System z; the throughput would be impacted
        significantly








HowTo

 How to identify which I/O scheduler is used
      – Search (grep) the file /var/log/boot.msg for the phrase 'io scheduler'; the matching line names
        the scheduler in use, for example:
         • <4>Using deadline io scheduler
 How to select the scheduler
     – The I/O scheduler is selected with the boot parameter elevator in zipl.conf
              [ipl]
              target = /boot/zipl
              image = /boot/image
              ramdisk = /boot/initrd
              parameters = "maxcpus=8 dasd=5000 root=/dev/dasda1 elevator=deadline"
          where elevator ::= as | deadline | cfq | noop
 For more details see /usr/src/linux/Documentation/kernel-parameters.txt.
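
The scheduler can also be checked and changed per block device at runtime through sysfs. A small sketch, assuming a dasd named dasda (and remember to re-run zipl after editing zipl.conf so new boot parameters take effect):

    # The active scheduler is shown in square brackets
    cat /sys/block/dasda/queue/scheduler

    # Switch this device to the deadline scheduler until the next reboot
    echo deadline > /sys/block/dasda/queue/scheduler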








Logical volumes

 Linear logical volumes allow an easy extension of the file system
 Striped logical volumes
     – Provide the capability to perform simultaneous I/O on different stripes
     – Allow load balancing
     – Are also extendable
 Don't use logical volumes for “/” or “/usr”
     – If the logical volume gets corrupted your system is gone
 Logical volumes require more CPU cycles than physical disks
     – Consumption increases with the number of physical disks used for the logical volume
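
A minimal sketch of creating a striped logical volume with LVM; the dasd partitions, volume group name, stripe count and sizes are placeholders and should come from different ranks, as discussed later in this presentation:

    # Prepare four disks (ideally from four different ranks) and build a volume group
    pvcreate /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1
    vgcreate vgdata /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1

    # Striped logical volume over all four disks with a 64 KiB stripe size
    lvcreate -i 4 -I 64 -L 20G -n lvstriped vgdata
    mkfs.ext3 /dev/vgdata/lvstriped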








Monitoring disk I/O
     iostat reports, per device: merged requests, I/Os per second, throughput, request size, queue length,
     service times and utilization.

     Output from iostat -dmx 2 /dev/sda

     Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
     sda        0.00 10800.50    6.00 1370.00    0.02  46.70    69.54     7.03   5.11   0.41  56.50   (sequential write)
     sda        0.00 19089.50   21.50  495.00    0.08  76.59   304.02     6.35  12.31   1.85  95.50   (sequential write)
     sda        0.00 15610.00  259.50 1805.50    1.01  68.14    68.58    26.07  12.23   0.48 100.00   (random write)
     sda     1239.50     0.00  538.00    0.00  115.58   0.00   439.99     4.96   9.23   1.77  95.00   (sequential read)
     sda      227.00     0.00 2452.00    0.00   94.26   0.00    78.73     1.76   0.72   0.32  78.50   (random read)





High Performance FICON

 Advantages:
     – Provides a simpler link protocol than FICON
     – Is capable of sending multiple channel commands to the control unit in a single entity
 Performance expectation:
     – Should improve performance of database OLTP workloads
 Prerequisites:
     –   z10 GA2 or newer
     –   FICON Express2 or newer
     –   DS8000 R4.1 or newer
     –   DS8000 High Performance FICON feature
     –   A Linux dasd driver that supports read/write track data and HPF
            • Included in SLES11 and in future Linux distributions, service packs or updates








High Performance FICON results (1)

 For sequential write, throughput improves with read/write track data by up to 1.2x

     (Bar chart: throughput for sequential write [MB/s], scale 0 to 1200, versus number of processes
      (1, 2, 4, 8, 16, 32, 64), comparing old dasd I/O, new read/write track data, and new HPF
      (includes read/write track data).)




High Performance FICON results (2)

 For sequential read, throughput improves with read/write track data by up to 1.6x

     (Bar chart: throughput for sequential read [MB/s], scale 0 to 2500, versus number of processes
      (1, 2, 4, 8, 16, 32, 64), comparing old dasd I/O, new read/write track data, and new HPF
      (includes read/write track data).)




High Performance FICON results (3)

 With small record sizes, as used e.g. by databases, HPF improves throughput by up to 2.2x
   versus old Linux channel programs
     (Bar chart: throughput for random reader [MB/s], scale 0 to 1000, versus number of processes
      (1, 2, 4, 8, 16, 32, 64), comparing old dasd I/O, new read/write track data, and new HPF
      (includes read/write track data).)




DS8000 Disk setup

 Don't treat a storage server as a black box, understand its structure
 Principles apply to other storage vendor products as well
 Several carefully selected disks, instead of one single disk, can more than triple the sequential
   read/write performance. Use the logical volume manager to set up the disks.
 Avoid using consecutive disk addresses in a storage server (e.g. the addresses 5100, 5101,
   5102, ... in an IBM Storage Server), because
     – they use the same rank
     – they use the same device adapter.
 If you ask for 16 disks and your system administrator gives you addresses 5100-510F
     – From a performance perspective this is close to the worst case








DS8000 Architecture
 The structure is complex
     – disks are connected via two internal FCP switches for higher bandwidth
 The DS8000 is still divided into two parts, named processor complex or just server
     – caches are organized per server
 One device adapter pair addresses 4 array sites
 One array site is built from 8 disks
     – disks are distributed over the front and rear storage enclosures
     – disks of one array site are shown in the same color in the original chart
 One RAID array is defined using one array site
 One rank is built using one RAID array
 Ranks are assigned to an extent pool
 Extent pools are assigned to one of the servers
     – this also assigns the caches
 The rules are the same as for ESS
     – one disk range resides in one extent pool








Rules for selecting disks

 This makes it fast
     – Use as many paths as possible (CHPID -> host adapter)
     – Spread the host adapters used across all host adapter bays
         • For ECKD, switching of the paths is done automatically
         • FCP needs a fixed relation between disk and path – use Linux multipathing for load balancing
     –   Select disks from as many ranks as possible!
     –   Switch the rank for each new disk in a logical volume
     –   Switch the ranks used between servers and device adapters
     –   Avoid reusing the same resource (path, server, device adapter, and disk) as long as possible
 Goal is to get a balanced load on all paths and physical disks
 In addition, striped Linux logical volumes and/or storage pool striping may help to increase
   the overall throughput.








Linux kernel components involved in disk I/O
 Application program – issues reads and writes
 Virtual File System (VFS) – dispatches requests to the different devices and translates to sector addressing
 Block device layer
      – Logical Volume Manager (LVM): defines the physical to logical device relation
      – Multipath: sets the multipath policies
      – Device mapper (dm): holds the generic mapping of all block devices and performs the 1:n mapping
        for logical volumes and/or multipath
 Page cache – contains all file I/O data; direct I/O bypasses the page cache
 I/O scheduler – merges, orders and queues requests, starts the device drivers via (un)plug device
 Device drivers – handle the data transfer
      – Direct Access Storage Device (dasd) driver
      – Small Computer System Interface (SCSI) driver
      – z Fiber Channel Protocol (zFCP) driver
      – Queued Direct I/O (qdio) driver




Disk I/O attachment and storage subsystem

 Each disk is a configured volume in an extent pool (here extent pool = rank)
 An extent pool can be configured for either ECKD dasds or SCSI LUNs
 Host bus adapters connect internally to both servers
   (Figure: channel paths 1–4 on FICON Express cards connect through a switch to host bus adapters 1–4;
    each host bus adapter connects to both Server 0 and Server 1, which attach through device adapters to
    ranks 1–8, alternating between ECKD and SCSI.)




I/O processing characteristics

 FICON/ECKD:
     –   1:1 mapping host subchannel:dasd
     –   Serialization of I/Os per subchannel
     –   I/O request queue in Linux
     –   Disk blocks are 4KB
     –   High availability by FICON path groups
     –   Load balancing by FICON path groups and Parallel Access Volumes
 FCP/SCSI
     –   Several I/Os can be issued against a LUN immediately
     –   Queuing in the FICON Express card and/or in the storage server
     –   Additional I/O request queue in Linux
     –   Disk blocks are 512 bytes
     –   High availability by Linux multipathing, type failover or multibus
     –   Load balancing by Linux multipathing, type multibus
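
The number of outstanding SCSI commands per LUN is bounded by the device queue depth (a later slide quotes 32 as the FCP default). A hedged sysfs sketch with a placeholder host:channel:target:LUN address; writing the attribute works only if the driver supports changing it:

    # Show the current queue depth of one SCSI LUN
    cat /sys/bus/scsi/devices/0:0:1:1/queue_depth

    # Lower it at runtime, e.g. to 16
    echo 16 > /sys/bus/scsi/devices/0:0:1:1/queue_depth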








General layout for FICON/ECKD

 The dasd driver starts the I/O on a subchannel
 Each subchannel connects to all channel paths in the path group
 Each channel connects via a switch to a host bus adapter
 A host bus adapter connects to both servers
 Each server connects to its ranks
   (Figure: the Linux stack (application program, VFS, LVM, multipath, dm, block device layer, page cache,
    I/O scheduler, dasd driver) drives subchannels a–d over chpids 1–4 through a switch to HBAs 1–4,
    which reach Server 0 and Server 1 and their ECKD ranks 1, 3, 5, 7.)





FICON/ECKD dasd I/O to a single disk

 Assume that subchannel a corresponds to disk 2 in rank 1
 The full choice of host adapters can be used
 Only one I/O can be issued at a time through subchannel a
 All other I/Os need to be queued in the dasd driver and in the block device layer until the subchannel
   is no longer busy with the preceding I/O
   (Figure: same stack as on the previous slide; the single subchannel a at the dasd driver is marked "!"
    as the bottleneck.)





FICON/ECKD dasd I/O to a single disk with PAV (SLES10 / RHEL5)

 VFS sees one device
 The device mapper sees the real device and all alias devices
 Parallel Access Volumes solve the I/O queuing in front of the subchannel
 Each alias device uses its own subchannel
 Additional processor cycles are spent to do the load balancing in the device mapper
 The next slowdown is the fact that only one disk is used in the storage server. This implies the use of
   only one rank, one device adapter, one server
   (Figure: same stack as before; the multipath/dm layer and the single rank are marked "!" as the
    remaining bottlenecks.)





FICON/ECKD dasd I/O to a single disk with HyperPAV (SLES11)

 VFS sees one device
 The dasd driver sees the real device and all alias devices
 Load balancing with HyperPAV and static PAV is done in the dasd driver. The aliases only need to be
   added to Linux. The load balancing works better than on the device mapper layer.
 Fewer additional processor cycles are needed than with Linux multipath.
 The next slowdown is the fact that only one disk is used in the storage server. This implies the use of
   only one rank, one device adapter, one server
   (Figure: same stack as before; the single rank remains marked "!" as the bottleneck.)
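
A hedged sketch of bringing HyperPAV devices online; the device bus-IDs are placeholders, and the alias devices themselves must already be defined on the storage server:

    # Set the base device and its HyperPAV aliases online
    chccwdev -e 0.0.7000      # base device
    chccwdev -e 0.0.70fe      # alias device
    chccwdev -e 0.0.70ff      # another alias device

    # lsdasd then lists the base device together with its aliases
    lsdasd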





FICON/ECKD dasd I/O to a linear or striped logical volume

 VFS sees one device (logical volume)
 The device mapper sees the logical volume and the physical volumes
 Additional processor cycles are spent to map the I/Os to the physical volumes.
 Striped logical volumes require more additional processor cycles than linear logical volumes
 With a striped logical volume the I/Os can be well balanced over the entire storage server and overcome
   limitations from a single rank, a single device adapter or a single server
 To ensure that I/O to one physical disk is not limited by one subchannel, PAV or HyperPAV should be used
   in combination with logical volumes
   (Figure: same stack as before; the LVM/dm layer is marked "!" for the additional mapping work, while the
    I/O now spreads across several ranks.)





FICON/ECKD dasd I/O to a storage pool striped volume with
HyperPAV (SLES11)

 A storage pool striped volume only makes sense in combination with PAV or HyperPAV
 VFS sees one device
 The dasd driver sees the real device and all alias devices
 The storage pool striped volume spans over several ranks of one server and overcomes the limitations of
   a single rank and/or a single device adapter
 Storage pool striped volumes can also be used as physical disks for a logical volume to use both
   server sides.
 To ensure that I/O to one dasd is not limited by one subchannel, PAV or HyperPAV should be used
   (Figure: same stack as before; the device adapter is marked "!", and the extent pool stripes the volume
    across several ranks of Server 0.)





General layout for FCP/SCSI

 The SCSI driver finalizes the I/O requests
 The zFCP driver adds the FCP protocol to the requests
 The qdio driver transfers the I/O to the channel
 A host bus adapter connects to both servers
 Each server connects to its ranks
   (Figure: the FCP stack (application program, VFS, LVM, multipath, dm, block device layer, page cache,
    I/O scheduler, SCSI driver, zFCP driver, qdio driver) drives chpids 1–4 through a switch to HBAs 1–4,
    which reach Server 0 and Server 1 and their SCSI ranks 2, 4, 6, 8.)




FCP/SCSI LUN I/O to a single disk

 Assume that disk 3 in rank 8 is reachable via channel 2 and host bus adapter 2
 Up to 32 (default value) I/O requests can be sent out to disk 3 before the first completion is required
 The throughput will be limited by the rank and/or the device adapter
 There is no high availability provided for the connection between the host and the storage server
   (Figure: same FCP stack as before; the single path (chpid 2, HBA 2) and the single rank 8 are marked "!"
    as bottlenecks.)
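
For completeness, a hedged sketch of how a single FCP LUN was typically configured on the distributions of that era (SLES10/SLES11, RHEL5); the device bus-ID, WWPN and LUN are placeholders, and newer kernels discover the remote ports automatically:

    # Set the FCP subchannel online
    chccwdev -e 0.0.fc00

    # Register the remote port (only needed on older kernels without automatic port scan)
    echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.fc00/port_add

    # Attach the LUN behind that port
    echo 0x4010400000000000 > /sys/bus/ccw/drivers/zfcp/0.0.fc00/0x500507630300c562/unit_add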




FCP/SCSI LUN I/O to a single disk with multipathing

 VFS sees one device
 The device mapper sees the multibus or failover alternatives to the same disk
 Administrative effort is required to define all paths to one disk
 Additional processor cycles are spent in the device mapper to do the mapping to the desired path for
   the disk
 Multibus requires more additional processor cycles than failover
   (Figure: same FCP stack as before; the multipath/dm layer and the single rank are marked "!".)
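
A minimal /etc/multipath.conf sketch for the two policies named above; treat it as an illustration only, since the defaults shipped with the distribution and any storage-server specific device sections still apply:

    defaults {
            path_grouping_policy    multibus    # all paths in one group (load balancing)
            # path_grouping_policy  failover    # one active path at a time (high availability only)
    }

After reloading the configuration, multipath -ll shows the resulting multipath devices with their path groups and path states.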




FCP/SCSI LUN I/O to a linear or striped logical volume

 VFS sees one device (logical volume)
 The device mapper sees the logical volume and the physical volumes
 Additional processor cycles are spent to map the I/Os to the physical volumes.
 Striped logical volumes require more additional processor cycles than linear logical volumes
 With a striped logical volume the I/Os can be well balanced over the entire storage server and
   overcome limitations from a single rank, a single device adapter or a single server
 To ensure high availability the logical volume should be used in combination with multipathing
   (Figure: same FCP stack as before; the LVM/dm layer is marked "!", while the I/O now spreads across
    several SCSI ranks.)




FCP/SCSI LUN I/O to a storage pool striped volume with
multipathing

 Storage pool striped volumes make no sense without high availability
 VFS sees one device
 The device mapper sees the multibus or failover alternatives to the same disk
 The storage pool striped volume spans over several ranks of one server and overcomes the limitations
   of a single rank and/or a single device adapter
 Storage pool striped volumes can also be used as physical disks for a logical volume to make use of
   both server sides.
   (Figure: same FCP stack as before; the multipath/dm layer is marked "!", and the extent pool stripes
    the volume across several ranks of Server 1.)




Summary, recommendations and outlook

 FICON/ECKD
     –   Storage pool striped disks (no disk placement)
     –   HyperPAV (SLES11)
     –   Large volume (RHEL5, SLES11)
     –   High Performance FICON (SLES11)
 FCP/SCSI
     – Linux LV with striping (disk placement)
     – Multipath with failover







Trademarks



IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United
States, other countries, or both.

Other product and service names might be trademarks of IBM or other companies.






Leveraging Open Source to Manage SAN PerformanceLeveraging Open Source to Manage SAN Performance
Leveraging Open Source to Manage SAN Performancebrettallison
 
Operating Systems (slides)
Operating Systems (slides)Operating Systems (slides)
Operating Systems (slides)wx672
 
16aug06.ppt
16aug06.ppt16aug06.ppt
16aug06.pptzagreb2
 

Semelhante a Linux on System z disk I/O performance (20)

Virtualisation overview
Virtualisation overviewVirtualisation overview
Virtualisation overview
 
Problem Reporting and Analysis Linux on System z -How to survive a Linux Crit...
Problem Reporting and Analysis Linux on System z -How to survive a Linux Crit...Problem Reporting and Analysis Linux on System z -How to survive a Linux Crit...
Problem Reporting and Analysis Linux on System z -How to survive a Linux Crit...
 
Visão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Visão geral do hardware do servidor System z e Linux on z - Concurso MainframeVisão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
Visão geral do hardware do servidor System z e Linux on z - Concurso Mainframe
 
Shak larry-jeder-perf-and-tuning-summit14-part2-final
Shak larry-jeder-perf-and-tuning-summit14-part2-finalShak larry-jeder-perf-and-tuning-summit14-part2-final
Shak larry-jeder-perf-and-tuning-summit14-part2-final
 
We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster
 
Z109889 z4 r-storage-dfsms-vegas-v1910b
Z109889 z4 r-storage-dfsms-vegas-v1910bZ109889 z4 r-storage-dfsms-vegas-v1910b
Z109889 z4 r-storage-dfsms-vegas-v1910b
 
IO Dubi Lebel
IO Dubi LebelIO Dubi Lebel
IO Dubi Lebel
 
제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustre제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustre
 
Db2 for z/OS and FlashCopy - Practical use cases (June 2019 Edition)
Db2 for z/OS and FlashCopy - Practical use cases (June 2019 Edition)Db2 for z/OS and FlashCopy - Practical use cases (June 2019 Edition)
Db2 for z/OS and FlashCopy - Practical use cases (June 2019 Edition)
 
2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance Considerations2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance Considerations
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 
Presentation aix performance updates &amp; issues
Presentation   aix performance updates &amp; issuesPresentation   aix performance updates &amp; issues
Presentation aix performance updates &amp; issues
 
Refining Linux
Refining LinuxRefining Linux
Refining Linux
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for Hadoop
 
Z4R: Intro to Storage and DFSMS for z/OS
Z4R: Intro to Storage and DFSMS for z/OSZ4R: Intro to Storage and DFSMS for z/OS
Z4R: Intro to Storage and DFSMS for z/OS
 
Z109889 z4 r-storage-dfsms-jburg-v1909d
Z109889 z4 r-storage-dfsms-jburg-v1909dZ109889 z4 r-storage-dfsms-jburg-v1909d
Z109889 z4 r-storage-dfsms-jburg-v1909d
 
Understanding and Measuring I/O Performance
Understanding and Measuring I/O PerformanceUnderstanding and Measuring I/O Performance
Understanding and Measuring I/O Performance
 
Leveraging Open Source to Manage SAN Performance
Leveraging Open Source to Manage SAN PerformanceLeveraging Open Source to Manage SAN Performance
Leveraging Open Source to Manage SAN Performance
 
Operating Systems (slides)
Operating Systems (slides)Operating Systems (slides)
Operating Systems (slides)
 
16aug06.ppt
16aug06.ppt16aug06.ppt
16aug06.ppt
 

Mais de IBM India Smarter Computing

Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments IBM India Smarter Computing
 
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...IBM India Smarter Computing
 
A Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceA Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceIBM India Smarter Computing
 
IBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM India Smarter Computing
 

Mais de IBM India Smarter Computing (20)

Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments Using the IBM XIV Storage System in OpenStack Cloud Environments
Using the IBM XIV Storage System in OpenStack Cloud Environments
 
All-flash Needs End to End Storage Efficiency
All-flash Needs End to End Storage EfficiencyAll-flash Needs End to End Storage Efficiency
All-flash Needs End to End Storage Efficiency
 
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
 
IBM FlashSystem 840 Product Guide
IBM FlashSystem 840 Product GuideIBM FlashSystem 840 Product Guide
IBM FlashSystem 840 Product Guide
 
IBM System x3250 M5
IBM System x3250 M5IBM System x3250 M5
IBM System x3250 M5
 
IBM NeXtScale nx360 M4
IBM NeXtScale nx360 M4IBM NeXtScale nx360 M4
IBM NeXtScale nx360 M4
 
IBM System x3650 M4 HD
IBM System x3650 M4 HDIBM System x3650 M4 HD
IBM System x3650 M4 HD
 
IBM System x3300 M4
IBM System x3300 M4IBM System x3300 M4
IBM System x3300 M4
 
IBM System x iDataPlex dx360 M4
IBM System x iDataPlex dx360 M4IBM System x iDataPlex dx360 M4
IBM System x iDataPlex dx360 M4
 
IBM System x3500 M4
IBM System x3500 M4IBM System x3500 M4
IBM System x3500 M4
 
IBM System x3550 M4
IBM System x3550 M4IBM System x3550 M4
IBM System x3550 M4
 
IBM System x3650 M4
IBM System x3650 M4IBM System x3650 M4
IBM System x3650 M4
 
IBM System x3500 M3
IBM System x3500 M3IBM System x3500 M3
IBM System x3500 M3
 
IBM System x3400 M3
IBM System x3400 M3IBM System x3400 M3
IBM System x3400 M3
 
IBM System x3250 M3
IBM System x3250 M3IBM System x3250 M3
IBM System x3250 M3
 
IBM System x3200 M3
IBM System x3200 M3IBM System x3200 M3
IBM System x3200 M3
 
IBM PowerVC Introduction and Configuration
IBM PowerVC Introduction and ConfigurationIBM PowerVC Introduction and Configuration
IBM PowerVC Introduction and Configuration
 
A Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization PerformanceA Comparison of PowerVM and Vmware Virtualization Performance
A Comparison of PowerVM and Vmware Virtualization Performance
 
IBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architectureIBM pureflex system and vmware vcloud enterprise suite reference architecture
IBM pureflex system and vmware vcloud enterprise suite reference architecture
 
X6: The sixth generation of EXA Technology
X6: The sixth generation of EXA TechnologyX6: The sixth generation of EXA Technology
X6: The sixth generation of EXA Technology
 

Linux on System z disk I/O performance

6. HowTo
 How to identify which I/O scheduler is used
    – Searching (grep) in the file /var/log/boot.msg for the phrase 'io scheduler' finds a line like
        • <4>Using deadline io scheduler
    – which shows the name of the scheduler in use (a runtime check via sysfs is sketched below)
 How to select the scheduler
    – The I/O scheduler is selected with the boot parameter elevator in zipl.conf
        [ipl]
        target = /boot/zipl
        image = /boot/image
        ramdisk = /boot/initrd
        parameters = "maxcpus=8 dasd=5000 root=/dev/dasda1 elevator=deadline"
    – where elevator ::= as | deadline | cfq | noop
 For more details see /usr/src/linux/Documentation/kernel-parameters.txt

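In addition to the boot parameter, the scheduler of a running system can be inspected and changed per block device through sysfs; a minimal sketch, assuming a DASD named dasda:

  # show the available schedulers; the active one is printed in brackets
  cat /sys/block/dasda/queue/scheduler

  # switch this device to the deadline scheduler until the next reboot
  echo deadline > /sys/block/dasda/queue/scheduler
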
7. Logical volumes
 Linear logical volumes allow an easy extension of the file system
 Striped logical volumes (creation sketched below)
    – Provide the capability to perform simultaneous I/O on different stripes
    – Allow load balancing
    – Are also extendable
 Don't use logical volumes for “/” or “/usr”
    – If the logical volume gets corrupted your system is gone
 Logical volumes require more CPU cycles than physical disks
    – Consumption increases with the number of physical disks used for the logical volume

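A minimal sketch of creating a striped logical volume with the LVM command line; the device names, volume group and LV names, stripe size and capacity are hypothetical examples:

  # prepare the disks as physical volumes and group them
  pvcreate /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1
  vgcreate datavg /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1

  # -i 4: stripe across all four physical volumes, -I 64: 64 KiB stripe size
  lvcreate -i 4 -I 64 -L 20G -n datalv datavg
  mkfs.ext3 /dev/datavg/datalv
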
8. Monitoring disk I/O
 What to look at: merged requests (rrqm/s, wrqm/s), I/Os per second (r/s, w/s), throughput (rMB/s, wMB/s), request size (avgrq-sz), queue length (avgqu-sz), service times (await, svctm), utilization (%util)
 Output from iostat -dmx 2 /dev/sda

  Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
  sda        0.00 10800.50    6.00 1370.00    0.02  46.70     69.54      7.03   5.11   0.41  56.50   sequential write
  sda        0.00 19089.50   21.50  495.00    0.08  76.59    304.02      6.35  12.31   1.85  95.50   sequential write
  sda        0.00 15610.00  259.50 1805.50    1.01  68.14     68.58     26.07  12.23   0.48 100.00   random write
  sda     1239.50     0.00  538.00    0.00  115.58   0.00    439.99      4.96   9.23   1.77  95.00   sequential read
  sda      227.00     0.00 2452.00    0.00   94.26   0.00     78.73      1.76   0.72   0.32  78.50   random read

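To capture such data over a whole measurement run, iostat can be given an interval and a sample count and redirected to a file; a small sketch, with the device and file names as examples:

  # 30 samples at a 2-second interval for one device, kept for later analysis
  iostat -dmx /dev/dasda 2 30 > iostat_dasda.log

  # note: the first sample reports averages since boot; judge the workload from
  # the later samples, watching %util, await and avgqu-sz in particular
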
9. High Performance FICON
 Advantages:
    – Provides a simpler link protocol than FICON
    – Is capable of sending multiple channel commands to the control unit in a single entity
 Performance expectation:
    – Should improve performance of database OLTP workloads
 Prerequisites:
    – z10 GA2 or newer
    – FICON Express2 or newer
    – DS8000 R4.1 or newer
    – DS8000 High Performance FICON feature
    – A Linux dasd driver that supports read/write track data and HPF
        • Included in SLES11 and future Linux distributions, service packs or updates

10. High Performance FICON results (1)
 For sequential write, throughput improves with read/write track data by up to 1.2x
 [Chart: throughput for sequential write in MB/s for 1–64 processes, comparing old dasd I/O, new read/write track data, and new HPF (includes read/write track data)]

11. High Performance FICON results (2)
 For sequential read, throughput improves with read/write track data by up to 1.6x
 [Chart: throughput for sequential read in MB/s for 1–64 processes, same three variants]

12. High Performance FICON results (3)
 With small record sizes, as e.g. used by databases, HPF throughput improves up to 2.2x versus old Linux channel programs
 [Chart: throughput for random reader in MB/s for 1–64 processes, same three variants]

13. DS8000 Disk setup
 Don't treat a storage server as a black box; understand its structure
 The principles apply to other storage vendors' products as well
 Several conveniently selected disks, instead of one single disk, can more than triple the sequential read/write performance. Use the logical volume manager to set up the disks (a preparation sketch follows below).
 Avoid using consecutive disk addresses in a storage server (e.g. the addresses 5100, 5101, 5102, ... in an IBM storage server), because
    – they use the same rank
    – they use the same device adapter
 If you ask for 16 disks and your system administrator gives you addresses 5100-510F
    – From a performance perspective this is close to the worst case

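A minimal sketch of preparing several ECKD volumes for use with the logical volume manager; the device numbers 0.0.5100, 0.0.5200, 0.0.5300 and 0.0.5400 are hypothetical and are assumed to sit in different ranks:

  # set the DASDs online
  for dev in 0.0.5100 0.0.5200 0.0.5300 0.0.5400; do
      chccwdev -e $dev
  done

  # low-level format with 4 KB blocks and create one partition per disk
  # (dasdfmt destroys all data on the device)
  for d in dasdb dasdc dasdd dasde; do
      dasdfmt -b 4096 -y -f /dev/$d
      fdasd -a /dev/$d
  done
  # the resulting partitions /dev/dasdb1 ... /dev/dasde1 can then be fed to
  # pvcreate/vgcreate/lvcreate as shown in the striping sketch further up
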
14. DS8000 Architecture
 ● the structure is complex
    - disks are connected via two internal FCP switches for higher bandwidth
 ● the DS8000 is still divided into two parts, named processor complex or just server
    - caches are organized per server
 ● one device adapter pair addresses 4 array sites
 ● one array site is built from 8 disks
    - disks are distributed over the front and rear storage enclosures
    - they have the same color in the chart
 ● one RAID array is defined using one array site
 ● one rank is built using one RAID array
 ● ranks are assigned to an extent pool
 ● extent pools are assigned to one of the servers
    - this also assigns the caches
 ● the rules are the same as for ESS
    - one disk range resides in one extent pool

15. Rules for selecting disks
 This makes it fast
    – Use as many paths as possible (CHPID -> host adapter)
    – Spread the host adapters used across all host adapter bays
        • For ECKD, switching of the paths is done automatically
        • FCP needs a fixed relation between disk and path
    – Use Linux multipathing for load balancing
    – Select disks from as many ranks as possible!
    – Switch the rank for each new disk in a logical volume
    – Switch the ranks used between servers and device adapters
    – Avoid reusing the same resource (path, server, device adapter, and disk) as long as possible
 The goal is to get a balanced load on all paths and physical disks
 In addition, striped Linux logical volumes and/or storage pool striping may help to increase the overall throughput

16. Linux kernel components involved in disk I/O
 Application program – issues reads and writes
 Virtual File System (VFS) – dispatches requests to the different devices and translates to sector addressing
 Block device layer:
    – Logical Volume Manager (LVM) – defines the physical to logical device relation
    – Multipath – sets the multipath policies
    – Device mapper (dm) – holds the generic mapping of all block devices and performs 1:n mapping for logical volumes and/or multipath
    – Page cache – contains all file I/O data; direct I/O bypasses the page cache
    – I/O scheduler – merges, orders and queues requests, starts device drivers via (un)plug device
 Device drivers (data transfer handling):
    – Direct Access Storage Device driver (dasd)
    – Small Computer System Interface driver (SCSI)
    – z Fibre Channel Protocol driver (zFCP)
    – Queued Direct I/O driver (qdio)

17. Disk I/O attachment and storage subsystem
 Each disk is a configured volume in an extent pool (here extent pool = rank)
 An extent pool can be configured for either ECKD dasds or SCSI LUNs
 Host bus adapters connect internally to both servers
 [Diagram: four channel paths (FICON Express) lead via a switch to host bus adapters 1–4, which connect to server 0 and server 1; the servers attach via device adapters to ranks 1–8, alternating ECKD and SCSI]

18. I/O processing characteristics
 FICON/ECKD:
    – 1:1 mapping host subchannel:dasd
    – Serialization of I/Os per subchannel
    – I/O request queue in Linux
    – Disk blocks are 4KB
    – High availability by FICON path groups
    – Load balancing by FICON path groups and Parallel Access Volumes
 FCP/SCSI:
    – Several I/Os can be issued against a LUN immediately (the queue depth can be inspected as sketched below)
    – Queuing in the FICON Express card and/or in the storage server
    – Additional I/O request queue in Linux
    – Disk blocks are 512 bytes
    – High availability by Linux multipathing, type failover or multibus
    – Load balancing by Linux multipathing, type multibus

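The number of outstanding requests per SCSI LUN is controlled by the device queue depth; a minimal sketch of inspecting and, where the driver allows it, adjusting it through sysfs (the SCSI address 0:0:1:1 and the value 64 are examples):

  # show the current queue depth of one SCSI device
  cat /sys/bus/scsi/devices/0:0:1:1/queue_depth

  # raise it for this device until the next reboot (requires root)
  echo 64 > /sys/bus/scsi/devices/0:0:1:1/queue_depth
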
19. General layout for FICON/ECKD
 The dasd driver starts the I/O on a subchannel
 Each subchannel connects to all channel paths in the path group
 Each channel connects via a switch to a host bus adapter
 A host bus adapter connects to both servers
 Each server connects to its ranks
 [Diagram: Linux I/O stack (application program, VFS, LVM, multipath, dm, block device layer, page cache, I/O scheduler, dasd driver) issues I/O through the channel subsystem; subchannels a–d use chpids 1–4 via a switch to HBAs 1–4, which connect to server 0 and server 1 and their ECKD ranks 1, 3, 5, 7]

20. FICON/ECKD dasd I/O to a single disk
 Assume that subchannel a corresponds to disk 2 in rank 1
 The full choice of host adapters can be used
 Only one I/O can be issued at a time through subchannel a
 All other I/Os need to be queued in the dasd driver and in the block device layer until the subchannel is no longer busy with the preceding I/O
 [Diagram: same I/O stack as before; the single subchannel a is the bottleneck on the path to rank 1]

21. FICON/ECKD dasd I/O to a single disk with PAV (SLES10 / RHEL5)
 VFS sees one device
 The device mapper sees the real device and all alias devices
 Parallel Access Volumes solve the I/O queuing in front of the subchannel
 Each alias device uses its own subchannel
 Additional processor cycles are spent to do the load balancing in the device mapper
 The next slowdown is the fact that only one disk is used in the storage server. This implies the use of only one rank, one device adapter, one server
 [Diagram: device mapper multipath distributes I/O over the real device and its aliases; all I/O still targets a single rank behind one device adapter and one server]

22. FICON/ECKD dasd I/O to a single disk with HyperPAV (SLES11)
 VFS sees one device
 The dasd driver sees the real device and all alias devices
 Load balancing with HyperPAV and static PAV is done in the dasd driver. The aliases only need to be added to Linux (see the sketch below). The load balancing works better than on the device mapper layer.
 Fewer additional processor cycles are needed than with Linux multipath
 The next slowdown is the fact that only one disk is used in the storage server. This implies the use of only one rank, one device adapter, one server
 [Diagram: the dasd driver spreads I/O over the base subchannel and the alias subchannels; the target is still a single rank behind one device adapter and one server]

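A minimal sketch of making HyperPAV aliases usable in Linux; the base device 0.0.5100 and the aliases 0.0.51f0 to 0.0.51f2 are hypothetical and must match the alias definitions on the storage server:

  # set the base device and its alias devices online
  for dev in 0.0.5100 0.0.51f0 0.0.51f1 0.0.51f2; do
      chccwdev -e $dev
  done

  # lsdasd then lists the base device together with its aliases;
  # the dasd driver balances the I/O over them automatically
  lsdasd
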
23. FICON/ECKD dasd I/O to a linear or striped logical volume
 VFS sees one device (logical volume)
 The device mapper sees the logical volume and the physical volumes
 Additional processor cycles are spent to map the I/Os to the physical volumes
 Striped logical volumes require more additional processor cycles than linear logical volumes
 With a striped logical volume the I/Os can be well balanced over the entire storage server and overcome limitations from a single rank, a single device adapter or a single server
 To ensure that I/O to one physical disk is not limited by one subchannel, PAV or HyperPAV should be used in combination with logical volumes
 [Diagram: LVM and dm map the logical volume to physical volumes spread over several ranks behind both servers]

24. FICON/ECKD dasd I/O to a storage pool striped volume with HyperPAV (SLES11)
 A storage pool striped volume only makes sense in combination with PAV or HyperPAV
 VFS sees one device
 The dasd driver sees the real device and all alias devices
 The storage pool striped volume spans several ranks of one server and overcomes the limitations of a single rank and/or a single device adapter
 Storage pool striped volumes can also be used as physical disks for a logical volume, to use both server sides
 To ensure that I/O to one dasd is not limited by one subchannel, PAV or HyperPAV should be used
 [Diagram: the extent pool spans several ranks behind one server; the dasd driver spreads I/O over the base and alias subchannels]

25. General layout for FCP/SCSI
 The SCSI driver finalizes the I/O requests
 The zFCP driver adds the FCP protocol to the requests
 The qdio driver transfers the I/O to the channel
 A host bus adapter connects to both servers
 Each server connects to its ranks
 [Diagram: Linux I/O stack (application program, VFS, LVM, multipath, dm, block device layer, page cache, I/O scheduler, SCSI/zFCP/qdio drivers) connects via chpids 1–4 through a switch to HBAs 1–4, which attach to server 0 and server 1 and their SCSI ranks 2, 4, 6, 8]

26. FCP/SCSI LUN I/O to a single disk
 Assume that disk 3 in rank 8 is reachable via channel 2 and host bus adapter 2 (a single-path attachment sketch follows below)
 Up to 32 (default value) I/O requests can be sent out to disk 3 before the first completion is required
 The throughput will be limited by the rank and/or the device adapter
 There is no high availability provided for the connection between the host and the storage server
 [Diagram: a single chpid/HBA path leads to server 1 and rank 8; both the single path and the single rank are potential bottlenecks]

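For reference, a minimal sketch of attaching one LUN over one FCP path via the zfcp sysfs interface of that distribution level; the device number 0.0.fc00, the WWPN and the LUN are hypothetical, and newer distributions may discover ports automatically or provide helper scripts instead:

  # set the FCP subchannel online
  chccwdev -e 0.0.fc00

  # register the remote port and the LUN (values are examples)
  echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.fc00/port_add
  echo 0x4010400000000000 > /sys/bus/ccw/drivers/zfcp/0.0.fc00/0x500507630300c562/unit_add

  # the new SCSI disk appears as /dev/sdX; check with
  cat /proc/scsi/scsi
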
27. FCP/SCSI LUN I/O to a single disk with multipathing
 VFS sees one device
 The device mapper sees the multibus or failover alternatives to the same disk
 Administrative effort is required to define all paths to one disk
 Additional processor cycles are spent in the device mapper to do the mapping to the desired path for the disk
 Multibus requires more additional processor cycles than failover (see the configuration sketch below)
 [Diagram: the same LUN is reachable over several chpid/HBA paths; the device mapper selects among them]

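A minimal sketch of the corresponding device-mapper multipath configuration; the choice between failover and multibus depends on the workload, and blacklist or device-specific sections of a real configuration are omitted:

  # excerpt from /etc/multipath.conf
  defaults {
      # "multibus" spreads I/O over all paths (load balancing),
      # "failover" uses one path and switches only when it fails
      path_grouping_policy  failover
      failback              immediate
  }

  # reload the maps and list the resulting multipath devices
  #   multipath -r
  #   multipath -ll
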
28. FCP/SCSI LUN I/O to a linear or striped logical volume
 VFS sees one device (logical volume)
 The device mapper sees the logical volume and the physical volumes
 Additional processor cycles are spent to map the I/Os to the physical volumes
 Striped logical volumes require more additional processor cycles than linear logical volumes
 With a striped logical volume the I/Os can be well balanced over the entire storage server and overcome limitations from a single rank, a single device adapter or a single server
 To ensure high availability the logical volume should be used in combination with multipathing
 [Diagram: LVM and dm map the logical volume to physical volumes spread over ranks behind both servers]

29. FCP/SCSI LUN I/O to a storage pool striped volume with multipathing
 Storage pool striped volumes make no sense without high availability
 VFS sees one device
 The device mapper sees the multibus or failover alternatives to the same disk
 The storage pool striped volume spans several ranks of one server and overcomes the limitations of a single rank and/or a single device adapter
 Storage pool striped volumes can also be used as physical disks for a logical volume to make use of both server sides
 [Diagram: the extent pool spans several ranks behind one server; the device mapper provides the multipath alternatives]

30. Summary, recommendations and outlook
 FICON/ECKD
    – Storage pool striped disks (no disk placement)
    – HyperPAV (SLES11)
    – Large volume (RHEL5, SLES11)
    – High Performance FICON (SLES11)
 FCP/SCSI
    – Linux LV with striping (disk placement)
    – Multipath with failover

31. Trademarks
 IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
 Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
 Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
 Other product and service names might be trademarks of IBM or other companies.