PostgreSQL Portland
Performance Practice Project
    Database Test 2 (DBT-2)
    Characterizing Filesystems


         Mark Wong
    markwkm@postgresql.org

       Portland State University


           April 9, 2009
Review


  From last time:
    ◮ DBT-2 live demo.


            __      __
           / ~~~/  . o O ( Questions? )
     ,----(      oo    )
    /      __      __/
   /|          ( |(
  ^    /___ / |
      |__|   |__|-"
This is:
    ◮ A collection of reasons why you should test your own system.
    ◮ A guide for how to understand the i/o behavior of your own system.
    ◮ A model of simple i/o behavior based on PostgreSQL.

This is not:
    ◮ A comprehensive Linux filesystem evaluation.
    ◮ A filesystem tuning guide (mkfs and mount options).
    ◮ Representative of your system (unless you have the exact same
      hardware).
    ◮ A test of reliability or durability.
Contents




    ◮ Criteria
    ◮ Preconceptions
    ◮ fio Parameters
    ◮ Testing
    ◮ Test Results
Some hardware details

    ◮ HP DL380 G5
    ◮ 2 x Quad Core Intel(R) Xeon(R) E5405
    ◮ HP 32GB Fully Buffered DIMM PC2-5300 8x4GB DR LP Memory
    ◮ HP Smart Array P800 Controller
    ◮ MSA70 with 25 x HP 72GB Hot Plug 2.5" SAS 15,000 rpm Hard Drives

              __     __
             / ~~~/  . o O ( Thank you HP! )
       ,----(     oo    )
     /       __     __/
    /|          ( |(
   ^     /___ / |
       |__|   |__|-"
Criteria


   PostgreSQL dictates the following (even though some of them are
   actually configurable):
    ◮ 8KB block size
    ◮ No use of posix_fadvise
    ◮ No use of direct i/o
    ◮ No use of async i/o
    ◮ Uses processes, not threads.

   Based on the hardware we determine these criteria:
    ◮ 64GB data set - to minimize any caching in the 32GB of
      system memory.
    ◮ 8 processes performing i/o - one fio process per core.
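The 64GB figure follows directly from the fio settings shown later (size=8g per job, numjobs=8). A quick sanity check of the arithmetic, not from the slides:

```shell
# Sanity check (not from the slides): 8 fio jobs x 8 GB per job
# yields the 64 GB data set, twice the 32 GB of system memory.
jobs=8
gb_per_job=8
ram_gb=32
total_gb=$((jobs * gb_per_job))
echo "data set: ${total_gb}GB ($((total_gb / ram_gb))x RAM)"
```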
Linux Filesystems Tested
    ◮ ext2
    ◮ ext3
    ◮ ext4
    ◮ jfs
    ◮ reiserfs
    ◮ xfs

               a8888b. . o O ( What is your favorite filesystem? )
              d888888b.
              8P"YP"Y88
              8|o||o|88
              8'     .88
              8'._.' Y8.
            d/        '8b.
          .dP      .     Y8b.
        d8:'       "  '::88b.
       d8"               'Y88b
      :8P        '        :888
       8a.       :       _a88P
    ._/"Yaa_ :        .| 88P|
          YP"        '| 8P '.
    /       ._____.d|        .'
    '--..__)888888P'._.'
Hardware RAID Configurations

    ◮ RAID 0
    ◮ RAID 1
    ◮ RAID 1+0
    ◮ RAID 5
    ◮ RAID 6

              __     __
             / ~~~/  . o O ( Which RAID would you use? )
       ,----(     oo    )
     /       __     __/
    /|          ( |(
   ^     /___ / |
       |__|   |__|-"
Preconceptions




              __     __
             / ~~~/  . o O ( Everyone gets to participate. )
       ,----(     oo    )
     /       __     __/
    /|          ( |(
   ^     /___ / |
        |__|   |__|-"
__     __
          / ~~~/  . o O ( atime slows us down a lot. )
    ,----(     oo    )
  /       __     __/
 /|          ( |(
^     /___ / |
     |__|   |__|-"
__     __
          / ~~~/  . o O ( Partition alignment is huge. )
    ,----(     oo    )
  /       __     __/
 /|          ( |(
^     /___ / |
     |__|   |__|-"
__     __      /                            
          / ~~~/  . o O | Doubling your disks doubles |
    ,----(     oo    )    | your performance.           |
  /       __     __/                                 /
 /|          ( |(
^     /___ / |
     |__|   |__|-"
__     __
          / ~~~/  . o O ( RAID-5 is awful at writes. )
    ,----(     oo    )
  /       __     __/
 /|          ( |(
^     /___ / |
     |__|   |__|-"
__     __
          / ~~~/  . o O ( RAID-6 improves on RAID-5. )
    ,----(     oo    )
  /       __     __/
 /|          ( |(
^     /___ / |
     |__|   |__|-"
__     __      /                           
          / ~~~/  . o O | Journaling filesystems are |
    ,----(     oo    )    | the slowest.                |
  /       __     __/                                /
 /|          ( |(
^     /___ / |
     |__|   |__|-"
fio1




      Flexible I/O by Jens Axboe.
        ◮ Sequential Read
        ◮ Sequential Write
        ◮ Random Read
        ◮ Random Write
        ◮ Random Read/Write

        1
            http://brick.kernel.dk/snaps/
Sequential Read


   [seq-read]
   rw=read
   size=8g
   directory=/test
   fadvise_hint=0
   blocksize=8k
   direct=0
   numjobs=8
   nrfiles=1
   runtime=1h
   exec_prerun=echo 3 > /proc/sys/vm/drop_caches
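Each of these job files is run by handing it to fio. A minimal sketch (the file path is hypothetical; note that exec_prerun writes to /proc/sys/vm/drop_caches, so fio must run as root for the cache-dropping step to work):

```shell
# Save the job file somewhere convenient, then hand it to fio.
cat > /tmp/seq-read.fio <<'EOF'
[seq-read]
rw=read
size=8g
directory=/test
fadvise_hint=0
blocksize=8k
direct=0
numjobs=8
nrfiles=1
runtime=1h
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
EOF

# Invocation (as root, so exec_prerun can drop the page cache):
#   fio /tmp/seq-read.fio
```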
Sequential Write


   [seq-write]
   rw=write
   size=8g
   directory=/test
   fadvise_hint=0
   blocksize=8k
   direct=0
   numjobs=8
   nrfiles=1
   runtime=1h
   exec_prerun=echo 3 > /proc/sys/vm/drop_caches
Random Read


  [random-read]
  rw=randread
  size=8g
  directory=/test
  fadvise_hint=0
  blocksize=8k
  direct=0
  numjobs=8
  nrfiles=1
  runtime=1h
  exec_prerun=echo 3 > /proc/sys/vm/drop_caches
Random Write


  [random-write]
  rw=randwrite
  size=8g
  directory=/test
  fadvise_hint=0
  blocksize=8k
  direct=0
  numjobs=8
  nrfiles=1
  runtime=1h
  exec_prerun=echo 3 > /proc/sys/vm/drop_caches
Random Read/Write


  [read-write]
  rw=rw
  rwmixread=50
  size=8g
  directory=/test
  fadvise_hint=0
  blocksize=8k
  direct=0
  numjobs=8
  nrfiles=1
  runtime=1h
  exec_prerun=echo 3 > /proc/sys/vm/drop_caches
Versions Tested




   Linux 2.6.25-gentoo-r6
   fio v1.22
   and
   Linux 2.6.28-gentoo
   fio v1.23
__     __
          / ~~~/  . o O ( Results! )
    ,----(     oo    )
  /       __     __/
 /|          ( |(
^     /___ / |
     |__|   |__|-"


There is more information than what is in this presentation; raw data is
at: http://207.173.203.223/~markwkm/community6/fio/2.6.28/
For example, other things of interest may be:
    ◮ Number of i/o operations.
    ◮ Processor utilization.
A few words about statistics...




   Not enough data to calculate good statistics.
   Examine the following charts for a feel of how reliable the results
   may be.
Random Read with Linux 2.6.28-gentoo ext2 on 1 Disk
Random Write with Linux 2.6.28-gentoo ext2 on 1 Disk
Read/Write Mix with Linux 2.6.28-gentoo ext2 on 1 Disk
Sequential Read with Linux 2.6.28-gentoo ext2 on 1 Disk
Sequential Write with Linux 2.6.28-gentoo ext2 on 1 Disk
Results for all the filesystems when using only a single disk...
(If you see missing data, it means the test didn’t complete.)
Results for all the filesystems when striping across four disks...
Let’s look at that step by step for RAID 0.
Note trends might not be the same when adding disks to other
RAID configurations.
What benefits are there in mirroring?
    ◮ Should have same write performance as a single disk.
    ◮ May be able to take advantage of using two disks for reading.
What is the best use of 4 disks?
Is RAID-6 an improvement over RAID-5?2
    ◮ RAID 5 (striped disks with parity) combines three or more
      disks in a way that protects data against loss of any one disk;
      the storage capacity of the array is reduced by one disk.
    ◮ RAID 6 (striped disks with dual parity) (less common) can
      recover from the loss of two disks.
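To make the capacity cost concrete, here is the arithmetic those definitions imply, sketched for a hypothetical array of four of the 72GB drives from the MSA70 (not a result from the slides):

```shell
# Usable capacity implied by the RAID definitions above:
# RAID 5 gives up one disk of capacity, RAID 6 gives up two.
n=4                                   # hypothetical array size
disk_gb=72
raid5_gb=$(( (n - 1) * disk_gb ))
raid6_gb=$(( (n - 2) * disk_gb ))
echo "RAID 5: ${raid5_gb}GB usable"
echo "RAID 6: ${raid6_gb}GB usable"
```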
Do the answers depend on what filesystem is used?




  2
      http://en.wikipedia.org/wiki/RAID
Filesystem Readahead



   Values are in number of 512-byte segments, set per device.
   To get the current readahead size:

   $ blockdev --getra /dev/sda
   256


   To set the readahead size to 8 MB:

   $ blockdev --setra 16384 /dev/sda
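Since blockdev counts 512-byte sectors, converting between sectors and bytes is plain arithmetic. A small check of the numbers above (no device is touched here):

```shell
# 16384 sectors x 512 bytes = 8 MB, the value set above;
# the default of 256 sectors works out to 128 KB.
sectors=16384
bytes=$((sectors * 512))
echo "readahead: $((bytes / 1024 / 1024)) MB"
default_sectors=256
echo "default:   $((default_sectors * 512 / 1024)) KB"
```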
What does this all mean for my database system?




   It depends... Let's look at one sample of just using different
   filesystems.
Raw data for comparing filesystems




   Initial inspection of results suggests we are close to being bound by
   processor and storage performance:
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext2.1500.2/
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext3.1500.2/
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext4.1500.2/
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-fs/jfs.1500.2/
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-fs/reiserfs.1500.2/
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-fs/xfs.1500.2/
Linux I/O Elevator Algorithm



   To get the current algorithm (shown in brackets):

   $ cat /sys/block/sda/queue/scheduler
   noop anticipatory [deadline] cfq


   To set a different algorithm:

   $ echo cfq > /sys/block/sda/queue/scheduler
   $ cat /sys/block/sda/queue/scheduler
   noop anticipatory deadline [cfq]
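A guarded sketch of switching a device to deadline (the device name sda is an assumption; on a real system this needs root, and the write is skipped when the sysfs node is absent or the elevator is not offered):

```shell
# Switch the elevator for one device, only if the sysfs node is
# writable and the kernel actually lists a "deadline" scheduler.
dev=sda                                   # hypothetical device name
sched=/sys/block/$dev/queue/scheduler
if [ -w "$sched" ] && grep -qw deadline "$sched"; then
    echo deadline > "$sched" 2>/dev/null || true
    cat "$sched"                          # chosen elevator in brackets
else
    echo "cannot set elevator for $dev (need root and a real disk)"
fi
```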
Raw data for comparing I/O elevator algorithms




   Initial inspection of results suggests we are close to being bound by
   processor and storage performance:
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.anticipatory.1/
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.cfs.1/
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.deadline.1/
     ◮   http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.noop.1/
Why use deadline?3



    ◮ Attention! Database servers, especially those using "TCQ"
      disks should investigate performance with the 'deadline' IO
      scheduler. Any system with high disk performance
      requirements should do so, in fact.
    ◮ Also, users with hardware RAID controllers, doing striping,
      may find highly variable performance results with using the
      as-iosched. The as-iosched anticipatory implementation is
      based on the notion that a disk device has only one physical
      seeking head. A striped RAID controller actually has a head
      for each physical device in the logical RAID device.




     3
         Linux source code: Documentation/block/as-iosched.txt
Bonus Data - Preview of FreeBSD
                ,         ,
               /(         )‘
                ___   /|
               /- _ ‘-/ ’
              (//      /
              //    |‘       
              OO    )/       |
              ‘-^--’‘<       ’
             (_.) _ )      /
              ‘.___/‘     /
                ‘-----’ /
   <----.     __ / __   
   <----|====O)))==) ) /====
   <----’    ‘--’ ‘.__,’ 
                |         |
                         /     /
            ______( (_ / ______/
          ,’ ,-----’    |
          ‘--{__________)
Final Notes




    ◮ You have to test your own system to see what it can do.
Materials Are Freely Available - In a new location!




   PDF
     ◮ http://www.slideshare.net/markwkm

   LaTeX Beamer (source)
     ◮ http://git.postgresql.org/git/performance-tuning.git
Time and Location




   When: 2nd Thursday of the month
   Location: Portland State University
   Room: FAB 86-01 (Fourth Avenue Building)
   Map: http://www.pdx.edu/map.html
Coming up next time. . . Effects of tuning PostgreSQL
Parameters



             __      __
            / ~~~/  . o O ( Thank you! )
      ,----(      oo    )
     /      __      __/
    /|          ( |(
   ^    /___ / |
       |__|   |__|-"
Acknowledgements



  Haley Jane Wakenshaw

            __      __
           / ~~~/ 
     ,----(      oo    )
    /      __      __/
   /|          ( |(
  ^    /___ / |
      |__|   |__|-"
License




   This work is licensed under a Creative Commons Attribution 3.0
   Unported License. To view a copy of this license, (a) visit
   http://creativecommons.org/licenses/by/3.0/us/; or, (b)
   send a letter to Creative Commons, 171 2nd Street, Suite 300, San
   Francisco, California, 94105, USA.

More Related Content

What's hot

Performance Whack A Mole
Performance Whack A MolePerformance Whack A Mole
Performance Whack A Mole
oscon2007
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
Jack Levin
 
PostgreSQL Disaster Recovery with Barman
PostgreSQL Disaster Recovery with BarmanPostgreSQL Disaster Recovery with Barman
PostgreSQL Disaster Recovery with Barman
Gabriele Bartolini
 

What's hot (20)

PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and Benchmarks
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA
 
Performance Whack A Mole
Performance Whack A MolePerformance Whack A Mole
Performance Whack A Mole
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
Geographically Distributed PostgreSQL
Geographically Distributed PostgreSQLGeographically Distributed PostgreSQL
Geographically Distributed PostgreSQL
 
HBase operations
HBase operationsHBase operations
HBase operations
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.final
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized World
 
PostgreSQL and Linux Containers
PostgreSQL and Linux ContainersPostgreSQL and Linux Containers
PostgreSQL and Linux Containers
 
PostgreSQL Scaling And Failover
PostgreSQL Scaling And FailoverPostgreSQL Scaling And Failover
PostgreSQL Scaling And Failover
 
Deep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL UniverseDeep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL Universe
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
Best Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisBest Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on Solaris
 
PostgreSQL Disaster Recovery with Barman
PostgreSQL Disaster Recovery with BarmanPostgreSQL Disaster Recovery with Barman
PostgreSQL Disaster Recovery with Barman
 
The Magic of Tuning in PostgreSQL
The Magic of Tuning in PostgreSQLThe Magic of Tuning in PostgreSQL
The Magic of Tuning in PostgreSQL
 
HDFS NameNode High Availability
HDFS NameNode High AvailabilityHDFS NameNode High Availability
HDFS NameNode High Availability
 
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMs
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMsGlobal Azure Virtual 2020 What's new on Azure IaaS for SQL VMs
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMs
 
Performance Whackamole (short version)
Performance Whackamole (short version)Performance Whackamole (short version)
Performance Whackamole (short version)
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016
 
My experience with embedding PostgreSQL
 My experience with embedding PostgreSQL My experience with embedding PostgreSQL
My experience with embedding PostgreSQL
 

Viewers also liked

Researching postgresql
Researching postgresqlResearching postgresql
Researching postgresql
Fernando Ike
 

Viewers also liked (9)

Managing Postgres with Ansible
Managing Postgres with AnsibleManaging Postgres with Ansible
Managing Postgres with Ansible
 
PGDay India 2016
PGDay India 2016PGDay India 2016
PGDay India 2016
 
PostgreSQL Conference: West 08
PostgreSQL Conference: West 08PostgreSQL Conference: West 08
PostgreSQL Conference: West 08
 
5 Tips to Simplify the Management of Your Postgres Database
5 Tips to Simplify the Management of Your Postgres Database5 Tips to Simplify the Management of Your Postgres Database
5 Tips to Simplify the Management of Your Postgres Database
 
Enterprise grade deployment and security with PostgreSQL
Enterprise grade deployment and security with PostgreSQLEnterprise grade deployment and security with PostgreSQL
Enterprise grade deployment and security with PostgreSQL
 
Researching postgresql
Researching postgresqlResearching postgresql
Researching postgresql
 
Security Best Practices for your Postgres Deployment
Security Best Practices for your Postgres DeploymentSecurity Best Practices for your Postgres Deployment
Security Best Practices for your Postgres Deployment
 
24/7 Monitoring and Alerting of PostgreSQL
24/7 Monitoring and Alerting of PostgreSQL24/7 Monitoring and Alerting of PostgreSQL
24/7 Monitoring and Alerting of PostgreSQL
 
Pitr Made Easy
Pitr Made EasyPitr Made Easy
Pitr Made Easy
 

Similar to PostgreSQL Portland Performance Practice Project - Database Test 2 Filesystem Characterization

Systems Automation with Puppet
Systems Automation with PuppetSystems Automation with Puppet
Systems Automation with Puppet
elliando dias
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
Command Prompt., Inc
 

Similar to PostgreSQL Portland Performance Practice Project - Database Test 2 Filesystem Characterization (20)

Filesystem Performance from a Database Perspective
Filesystem Performance from a Database PerspectiveFilesystem Performance from a Database Perspective
Filesystem Performance from a Database Perspective
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Series Ove...
PostgreSQL Portland Performance Practice Project - Database Test 2 Series Ove...PostgreSQL Portland Performance Practice Project - Database Test 2 Series Ove...
PostgreSQL Portland Performance Practice Project - Database Test 2 Series Ove...
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 HowtoPostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
 
Measuring Firebird Disk I/O
Measuring Firebird Disk I/OMeasuring Firebird Disk I/O
Measuring Firebird Disk I/O
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
Systems Automation with Puppet
Systems Automation with PuppetSystems Automation with Puppet
Systems Automation with Puppet
 
Introduction to clarity
Introduction to clarityIntroduction to clarity
Introduction to clarity
 
Os Wilhelm
Os WilhelmOs Wilhelm
Os Wilhelm
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
Inspection and maintenance tools (Linux / OpenStack)
Inspection and maintenance tools (Linux / OpenStack)Inspection and maintenance tools (Linux / OpenStack)
Inspection and maintenance tools (Linux / OpenStack)
 
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph
 
SiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingSiteGround Tech TeamBuilding
SiteGround Tech TeamBuilding
 
Augeas
AugeasAugeas
Augeas
 
HoneyPy & HoneyDB (TriPython)
HoneyPy & HoneyDB (TriPython)HoneyPy & HoneyDB (TriPython)
HoneyPy & HoneyDB (TriPython)
 
Adventures in infrastructure as code
Adventures in infrastructure as codeAdventures in infrastructure as code
Adventures in infrastructure as code
 
Database Hardware Benchmarking
Database Hardware BenchmarkingDatabase Hardware Benchmarking
Database Hardware Benchmarking
 
OpenNebula Conf 2014 | Lightning talk: OpenNebula at Etnetera by Jan Horacek
OpenNebula Conf 2014 | Lightning talk: OpenNebula at Etnetera by Jan HoracekOpenNebula Conf 2014 | Lightning talk: OpenNebula at Etnetera by Jan Horacek
OpenNebula Conf 2014 | Lightning talk: OpenNebula at Etnetera by Jan Horacek
 

More from Mark Wong

OHAI, my name is Chelnik! PGCon 2014 Mockumentary
OHAI, my name is Chelnik! PGCon 2014 MockumentaryOHAI, my name is Chelnik! PGCon 2014 Mockumentary
OHAI, my name is Chelnik! PGCon 2014 Mockumentary
Mark Wong
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
Mark Wong
 

More from Mark Wong (15)

OHAI, my name is Chelnik! PGCon 2014 Mockumentary
OHAI, my name is Chelnik! PGCon 2014 MockumentaryOHAI, my name is Chelnik! PGCon 2014 Mockumentary
OHAI, my name is Chelnik! PGCon 2014 Mockumentary
 
OHAI, my name is Chelnik! Postgres Open 2013 Report
OHAI, my name is Chelnik! Postgres Open 2013 ReportOHAI, my name is Chelnik! Postgres Open 2013 Report
OHAI, my name is Chelnik! Postgres Open 2013 Report
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQL
 
Android & PostgreSQL
Android & PostgreSQLAndroid & PostgreSQL
Android & PostgreSQL
 
PGTop for Android: Things I learned making this app
PGTop for Android: Things I learned making this appPGTop for Android: Things I learned making this app
PGTop for Android: Things I learned making this app
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
Developing PGTop for Android
Developing PGTop for AndroidDeveloping PGTop for Android
Developing PGTop for Android
 
Pg in-the-brazilian-armed-forces-presentation
Pg in-the-brazilian-armed-forces-presentationPg in-the-brazilian-armed-forces-presentation
Pg in-the-brazilian-armed-forces-presentation
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Tuning
PostgreSQL Portland Performance Practice Project - Database Test 2 TuningPostgreSQL Portland Performance Practice Project - Database Test 2 Tuning
PostgreSQL Portland Performance Practice Project - Database Test 2 Tuning
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Workload D...
PostgreSQL Portland Performance Practice Project - Database Test 2 Workload D...PostgreSQL Portland Performance Practice Project - Database Test 2 Workload D...
PostgreSQL Portland Performance Practice Project - Database Test 2 Workload D...
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Background
PostgreSQL Portland Performance Practice Project - Database Test 2 BackgroundPostgreSQL Portland Performance Practice Project - Database Test 2 Background
PostgreSQL Portland Performance Practice Project - Database Test 2 Background
 
pg_top is 'top' for PostgreSQL: pg_top + pg_proctab
pg_top is 'top' for PostgreSQL: pg_top + pg_proctabpg_top is 'top' for PostgreSQL: pg_top + pg_proctab
pg_top is 'top' for PostgreSQL: pg_top + pg_proctab
 
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreLinux Filesystems, RAID, and more
Linux Filesystems, RAID, and more
 
pg_top is 'top' for PostgreSQL
pg_top is 'top' for PostgreSQLpg_top is 'top' for PostgreSQL
pg_top is 'top' for PostgreSQL
 
What Is Going On?
What Is Going On?What Is Going On?
What Is Going On?
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

PostgreSQL Portland Performance Practice Project - Database Test 2 Filesystem Characterization

  • 1. PostgreSQL Portland Performance Practice Project Database Test 2 (DBT-2) Characterizing Filesystems Mark Wong markwkm@postgresql.org Portland State University April 9, 2009
  • 2. Review From last time: DBT-2 live demo. ◮ __ __ / ~~~/ . o O ( Questions? ) ,----( oo ) / __ __/ /| ( |( ^ /___ / | |__| |__|-quot;
  • 3. This is not: This is: A comprehensive Linux A collection of reason why ◮ ◮ filesystem evaluation. you should test your own system. A filesystem tuning guide ◮ (mkfs and mount options). A guide for how to ◮ understand the i/o behavior Representative of your ◮ of your own system. system (unless you have the exact same hardware.) A model of simple i/o ◮ behavior based on A test of reliability or ◮ PostgreSQL durability.
  • 4. Contents Criteria ◮ Preconceptions ◮ fio Parameters ◮ Testing ◮ Test Results ◮
  • 5. Some hardware details HP DL380 G5 ◮ 2 x Quad Core Intel(R) Xeon(R) E5405 ◮ HP 32GB Fully Buffered DIMM PC2-5300 8x4GB DR LP ◮ Memory HP Smart Array P800 Controller ◮ MSA70 with 25 x HP 72GB Hot Plug 2.5 SAS 15,000 rpm ◮ Hard Drives __ __ / ~~~/ . o O ( Thank you HP! ) ,----( oo ) / __ __/ /| ( |( ^ /___ / | |__| |__|-quot;
  • 6. Criteria PostgreSQL dictates the following (even though some of them are actually configurable: 8KB block size ◮ No use of posix fadvise ◮ No use of direct i/o ◮ No use of async i/o ◮ Uses processes, not threads. ◮ Based on the hardware we determine these criteria: 64GB data set - To minimize any caching in the 32GB of ◮ system memory. 8 processes performing i/o - One fio process per core. ◮
  • 7. Linux Filesystems Tested ext2 ◮ ext3 ◮ ext4 ◮ jfs ◮ reiserfs ◮ xfs ◮ a8888b. . o O ( What is your favorite filesystem? ) d888888b. 8Pquot;YPquot;Y88 8|o||o|88 8’ .88 8‘._.’ Y8. d/ ‘8b. .dP . Y8b. d8:’ quot; ‘::88b. d8quot; ‘Y88b :8P ’ :888 8a. : _a88P ._/quot;Yaa_ : .| 88P| YPquot; ‘| 8P ‘. / ._____.d| .’ ‘--..__)888888P‘._.’
  • 8. Hardware RAID Configurations RAID 0 ◮ RAID 1 ◮ RAID 1+0 ◮ RAID 5 ◮ RAID 6 ◮ __ __ / ~~~/ . o O ( Which RAID would you use? ) ,----( oo ) / __ __/ /| ( |( ^ /___ / | |__| |__|-quot;
• 9. Preconceptions
. o O ( Everyone gets to participate. )
• 10. . o O ( atime slows us down a lot. )
• 11. . o O ( Partition alignment is huge. )
• 12. . o O ( Doubling your disks doubles your performance. )
• 13. . o O ( RAID-5 is awful at writes. )
• 14. . o O ( RAID-6 improves on RAID-5. )
• 15. . o O ( Journaling filesystems are the slowest. )
• 16. fio1 - Flexible I/O by Jens Axboe.
◮ Sequential Read
◮ Sequential Write
◮ Random Read
◮ Random Write
◮ Random Read/Write
1 http://brick.kernel.dk/snaps/
• 17. Sequential Read
[seq-read]
rw=read
size=8g
directory=/test
fadvise_hint=0
blocksize=8k
direct=0
numjobs=8
nrfiles=1
runtime=1h
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
• 18. Sequential Write
[seq-write]
rw=write
size=8g
directory=/test
fadvise_hint=0
blocksize=8k
direct=0
numjobs=8
nrfiles=1
runtime=1h
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
• 19. Random Read
[random-read]
rw=randread
size=8g
directory=/test
fadvise_hint=0
blocksize=8k
direct=0
numjobs=8
nrfiles=1
runtime=1h
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
• 20. Random Write
[random-write]
rw=randwrite
size=8g
directory=/test
fadvise_hint=0
blocksize=8k
direct=0
numjobs=8
nrfiles=1
runtime=1h
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
• 21. Random Read/Write
[read-write]
rw=rw
rwmixread=50
size=8g
directory=/test
fadvise_hint=0
blocksize=8k
direct=0
numjobs=8
nrfiles=1
runtime=1h
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
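The five job files above can be driven back to back by a small wrapper script. This is only a sketch: the *.fio file names are assumptions, and the loop echoes each command so it can be dry-run on a machine without fio installed (drop the echo to actually run the jobs).

```shell
#!/bin/sh
# Sketch of a driver for the five fio job files shown above.
# File names (*.fio) are assumed; echo makes this a dry run.
for job in seq-read seq-write random-read random-write read-write; do
    echo "fio ${job}.fio --output=${job}.log"
done
```

Each job file already drops the page cache via exec_prerun, so no extra flush step is needed between runs.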
• 22. Versions Tested
◮ Linux 2.6.25-gentoo-r6 with fio v1.22
◮ Linux 2.6.28-gentoo with fio v1.23
• 23. . o O ( Results! )
There is more information than what is in this presentation; the raw data is at:
http://207.173.203.223/~markwkm/community6/fio/2.6.28/
For example, other things of interest may be:
◮ Number of i/o operations.
◮ Processor utilization.
• 24. A few words about statistics...
There is not enough data to calculate good statistics. Examine the following charts to get a feel for how reliable the results may be.
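With only a handful of runs per configuration, about the best one can do is a mean and a sample standard deviation per test. A quick awk sketch (the four throughput samples piped in below are made up for illustration):

```shell
#!/bin/sh
# Mean and sample standard deviation of a column of throughput
# samples; the four input values are invented for illustration.
printf '101\n98\n104\n97\n' | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
        mean = sum / n
        sd = sqrt((sumsq - sum * sum / n) / (n - 1))
        printf "mean=%.1f sd=%.1f\n", mean, sd
    }'
# prints: mean=100.0 sd=3.2
```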
  • 25. Random Read with Linux 2.6.28-gentoo ext2 on 1 Disk
  • 26. Random Write with Linux 2.6.28-gentoo ext2 on 1 Disk
  • 27. Read/Write Mix with Linux 2.6.28-gentoo ext2 on 1 Disk
  • 28. Sequential Read with Linux 2.6.28-gentoo ext2 on 1 Disk
  • 29. Sequential Write with Linux 2.6.28-gentoo ext2 on 1 Disk
  • 30. Results for all the filesystems when using only a single disk... (If you see missing data, it means the test didn’t complete.)
  • 31.
  • 32. Results for all the filesystems when striping across four disks...
  • 33.
  • 34. Let’s look at that step by step for RAID 0.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
• 41. Note that these trends might not be the same when adding disks to other RAID configurations.
  • 42. What benefits are there in mirroring?
  • 43.
• 44.
◮ Should have the same write performance as a single disk.
◮ May be able to take advantage of using two disks for reading.
• 45. What is the best use of 4 disks? Is RAID-6 an improvement over RAID-5?2
◮ RAID 5 (striped disks with parity) combines three or more disks in a way that protects data against loss of any one disk; the storage capacity of the array is reduced by one disk.
◮ RAID 6 (striped disks with dual parity, less common) can recover from the loss of two disks.
Do the answers depend on what filesystem is used?
2 http://en.wikipedia.org/wiki/RAID
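The capacity trade-offs are easy to make concrete for this array. A back-of-the-envelope sketch for four of the 72GB drives (raw arithmetic only, ignoring filesystem and formatting overhead):

```shell
#!/bin/sh
# Usable capacity of four 72GB disks under each RAID level
# discussed in these slides (simple arithmetic, no overhead).
n=4; size=72
echo "RAID 0:   $((n * size)) GB"        # all disks usable
echo "RAID 1+0: $((n * size / 2)) GB"    # half mirrored away
echo "RAID 5:   $(((n - 1) * size)) GB"  # one disk of parity
echo "RAID 6:   $(((n - 2) * size)) GB"  # two disks of parity
```

With four disks, RAID 5 and RAID 6 give up one and two disks of capacity, respectively, for parity; whether the write performance justifies that is what the charts answer.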
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
• 52. Filesystem Readahead
Values are in number of 512-byte sectors, set per device.
To get the current readahead size:
$ blockdev --getra /dev/sda
256
To set the readahead size to 8 MB:
$ blockdev --setra 16384 /dev/sda
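The unit conversion behind the 16384 figure is worth double-checking, since blockdev counts 512-byte sectors rather than bytes:

```shell
#!/bin/sh
# blockdev --setra takes a count of 512-byte sectors, so an
# 8 MB readahead is 8 * 1024 * 1024 / 512 = 16384 sectors.
sectors=16384
bytes=$((sectors * 512))
echo "${sectors} sectors = $((bytes / 1024 / 1024)) MB"
# prints: 16384 sectors = 8 MB
```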
  • 53.
• 54. What does this all mean for my database system? It depends... Let’s look at one sample of just using different filesystems.
  • 55.
• 56. Raw data for comparing filesystems
Initial inspection of results suggests we are close to being bound by processor and storage performance:
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext2.1500.2/
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext3.1500.2/
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext4.1500.2/
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/jfs.1500.2/
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/reiserfs.1500.2/
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/xfs.1500.2/
• 57. Linux I/O Elevator Algorithm
To get the current algorithm (the active one is shown in []’s):
$ cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq
To set a different algorithm:
$ echo cfq > /sys/block/sda/queue/scheduler
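The active elevator is the bracketed token in that sysfs file. A small sketch for extracting it in a script (the sample line below stands in for an actual read of the file):

```shell
#!/bin/sh
# Pull the bracketed (active) scheduler out of the sysfs line.
# In practice the line would come from:
#   cat /sys/block/sda/queue/scheduler
line="noop anticipatory [deadline] cfq"
current=$(echo "$line" | sed 's/.*\[\(.*\)\].*/\1/')
echo "$current"   # prints: deadline
```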
  • 58.
• 59. Raw data for comparing I/O elevator algorithms
Initial inspection of results suggests we are close to being bound by processor and storage performance:
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.anticipatory.1/
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.cfs.1/
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.deadline.1/
◮ http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.noop.1/
• 60. Why use deadline?3
◮ Attention! Database servers, especially those using "TCQ" disks, should investigate performance with the ’deadline’ IO scheduler. Any system with high disk performance requirements should do so, in fact.
◮ Also, users with hardware RAID controllers, doing striping, may find highly variable performance results when using the as-iosched. The as-iosched anticipatory implementation is based on the notion that a disk device has only one physical seeking head. A striped RAID controller actually has a head for each physical device in the logical RAID device.
3 Linux source code: Documentation/block/as-iosched.txt
• 61. Bonus Data - Preview of FreeBSD
(ASCII art of the BSD daemon)
  • 62.
  • 63.
• 64. Final Notes
◮ You have to test your own system to see what it can do.
• 65. Materials Are Freely Available - In a new location!
◮ PDF: http://www.slideshare.net/markwkm
◮ LaTeX Beamer (source): http://git.postgresql.org/git/performance-tuning.git
  • 66. Time and Location When: 2nd Thursday of the month Location: Portland State University Room: FAB 86-01 (Fourth Avenue Building) Map: http://www.pdx.edu/map.html
• 67. Coming up next time. . .
Effects of tuning PostgreSQL Parameters
. o O ( Thank you! )
• 68. Acknowledgements
Haley Jane Wakenshaw
(ASCII art elephant)
  • 69. License This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.