Five Steps to PostgreSQL Performance
Josh Berkus, PostgreSQL Project
MelPUG 2013

1. Application Design
2. Query Tuning
3. postgresql.conf
4. OS & Filesystem
5. Hardware
0. Getting Outfitted
5 Layer Cake

  Application:       Queries, Transactions
  Middleware:        Drivers, Connections, Caching
  PostgreSQL:        Schema, Config
  Operating System:  Filesystem, Kernel
  Hardware:          Storage, RAM/CPU, Network
Scalability Funnel


             Application

             Middleware


             PostgreSQL

                 OS


                 HW
What Flavor is Your DB?

W ►Web Application (Web)
     ●DB smaller than RAM
     ●90% or more “one-liner” queries
What Flavor is Your DB?

O ►Online Transaction Processing (OLTP)
     ●DB slightly larger than RAM to 1TB
     ●20-70% small data write queries,
      some large transactions
What Flavor is Your DB?

D ►Data Warehousing (DW)
     ●Large to huge databases (100GB to
      100TB)
     ●Large complex reporting queries
     ●Large bulk loads of data
     ●Also called "Decision Support" or
      "Business Intelligence"
Tips for Good Form               O
                                 1
►Engineer for the problems you have
 ●not for the ones you don't
Tips for Good Form                     O
                                       1
►A little overallocation is cheaper than
 downtime
  ●unless you're an OEM, don't stint on a
   few GB
  ●resource use will grow over time
Tips for Good Form                   O
                                     1
►Test, Tune, and Test Again
 ●you can't measure performance by “it
  seems fast”
Tips for Good Form                        O
                                          1
►Most server performance is
 thresholded
 ●“slow” usually means “25x slower”
 ●it's not how fast it is, it's how close you
  are to capacity
1   Application
     Design
Schema Design                             1
                                          1
►Table design
 ●do not optimize prematurely
   ▬normalize your tables and wait for a proven
    issue to denormalize
   ▬Postgres is designed to perform well with
    normalized tables
 ●Entity-Attribute-Value tables and other
  innovative designs tend to perform poorly
Schema Design                             1
                                          1
►Table design
 ●consider using natural keys
   ▬can cut down on the number of joins you
    need
 ●BLOBs can be slow
   ▬have to be completely rewritten,
    compressed
   ▬can also be fast, thanks to compression
Schema Design                              1
                                           1
►Table design
 ●think of when data needs to be updated,
  as well as read
   ▬sometimes you need to split tables which
    will be updated at different times
   ▬don't trap yourself into updating the same
    rows multiple times
Schema Design                         1
                                      1
►Indexing
 ●index most foreign keys
 ●index common WHERE criteria
 ●index common aggregated columns
 ●learn to use special index types:
  expressions, full text, partial
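
To make those special index types concrete, here is a minimal sketch; the orders table and its email, status, customer_id and body columns are hypothetical:

    -- expression index: lets WHERE lower(email) = ... use an index
    CREATE INDEX idx_orders_lower_email ON orders (lower(email));

    -- partial index: only covers the rows you actually query
    CREATE INDEX idx_orders_open ON orders (customer_id) WHERE status = 'open';

    -- full text search: GIN index over a tsvector expression
    CREATE INDEX idx_orders_body_fts ON orders
      USING gin (to_tsvector('english', body));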
Schema Design                            1
                                         1
►Not Indexing
 ●indexes cost you on updates, deletes
   ▬especially with HOT
 ●too many indexes can confuse the
  planner
 ●don't index: tiny tables, low-cardinality
  columns
Right indexes?                         1
                                       1
►pg_stat_user_indexes
 ●shows indexes not being used
 ●note that it doesn't record unique index
  usage
►pg_stat_user_tables
 ●shows seq scans: index candidates?
 ●shows heavy update/delete tables: index
  less
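
A sketch of the checks described above, using the standard statistics views (counters accumulate since the last stats reset, so judge them over a representative period):

    -- indexes that have never been scanned: drop candidates
    SELECT schemaname, relname, indexrelname, idx_scan
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY schemaname, relname;

    -- tables with heavy sequential scanning: possible index candidates
    SELECT relname, seq_scan, seq_tup_read, idx_scan
    FROM pg_stat_user_tables
    ORDER BY seq_tup_read DESC
    LIMIT 20;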
Partitioning                              1
                                          1
►Partition large or growing tables
  ●historical data
    ▬data will be purged
    ▬massive deletes are server-killers

  ●very large tables
    ▬anything over 10GB / 10m rows
    ▬partition by active/passive
Partitioning                             1
                                         1
►Application must be partition-compliant
  ●every query should call the partition key
  ●pre-create your partitions
    ▬do not create them on demand … they will
     lock
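
On the PostgreSQL releases current for this talk, partitioning is done with inheritance and CHECK constraints; a minimal sketch with a hypothetical events table partitioned by month (constraint_exclusion = partition, the default, prunes children when queries filter on the partition key):

    CREATE TABLE events (
        event_time  timestamp NOT NULL,
        payload     text
    );

    -- pre-create partitions well ahead of when they are needed
    CREATE TABLE events_2013_01 (
        CHECK (event_time >= '2013-01-01' AND event_time < '2013-02-01')
    ) INHERITS (events);

    CREATE TABLE events_2013_02 (
        CHECK (event_time >= '2013-02-01' AND event_time < '2013-03-01')
    ) INHERITS (events);

    -- every query should call the partition key so only one child is scanned
    SELECT count(*) FROM events
    WHERE event_time >= '2013-01-15' AND event_time < '2013-01-16';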
Query design                         1
                                     1
►Do more with each query
 ●PostgreSQL does well with fewer larger
  queries
 ●not as well with many small queries
 ●avoid doing joins, tree-walking in
  middleware
Query design                        1
                                    1
►Do more with each transaction
 ●batch related writes into large
  transactions
Query design                           1
                                       1
►Know the query gotchas (per version)
 ●Always try rewriting subqueries as joins
 ●try swapping NOT IN and NOT EXISTS
  for bad queries
 ●try to make sure that index/key types
  match
 ●avoid unanchored text searches "ILIKE
  '%josh%'"
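
For example, the NOT IN gotcha is often fixed by rewriting it as NOT EXISTS (customers and orders are hypothetical tables); NOT EXISTS usually plans as an anti-join and is not tripped up by NULLs:

    -- frequently slow, and surprising when customer_id can be NULL:
    SELECT c.id FROM customers c
    WHERE c.id NOT IN (SELECT o.customer_id FROM orders o);

    -- usually the better plan:
    SELECT c.id FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);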
But I use ORM!                         1
                                       1
►ORM != high performance
 ●ORM is for fast development, not fast
  databases
 ●make sure your ORM allows "tweaking"
  queries
 ●applications which are pushing the limits
  of performance probably can't use ORM
   ▬but most don't have a problem
It's All About Caching                 1
                                       1
►Use prepared queries    W O

 ●whenever you have repetitive loops
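
A minimal SQL-level sketch (most drivers expose the same thing through their prepared-statement API); the statement and table are hypothetical:

    PREPARE get_user (int) AS
        SELECT id, name, email FROM users WHERE id = $1;

    -- parsed and planned once, then executed repeatedly inside the loop
    EXECUTE get_user(42);
    EXECUTE get_user(43);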
It's All About Caching                       1
                                             1
►Cache, cache everywhere               W O

 ●plan caching: on the PostgreSQL server
 ●parse caching: in some drivers
 ●data caching:
   ▬in the appserver
   ▬in memcached/varnish/nginx

   ▬in the client (javascript, etc.)

 ●use as many kinds of caching as you can
It's All About Caching              1
                                    1
But …
►think carefully about cache invalidation
  ●and avoid “cache storms”
Connection Management               1
                                    1
►Connections take resources   W O

 ●RAM, CPU
 ●transaction checking
Connection Management                    1
                                         1
►Make sure you're only using       W O
 connections you need
 ●look for “<IDLE>” and “<IDLE> in
  Transaction”
 ●log and check for a pattern of connection
  growth
 ●make sure that database and appserver
  timeouts are synchronized
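
One way to spot those idle sessions, assuming PostgreSQL 9.2 or later (older releases show '<IDLE>' and '<IDLE> in transaction' in the current_query column instead of a separate state column):

    SELECT pid, usename, application_name, state,
           now() - state_change AS idle_for
    FROM pg_stat_activity
    WHERE state IN ('idle', 'idle in transaction')
    ORDER BY idle_for DESC;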
Pooling                              1
                                     1
►Over 100 connections? You need
 pooling!

   Webserver

   Webserver   Pool     PostgreSQL

   Webserver
Pooling                              1
                                     1
►New connections are expensive
 ●use persistent connections or connection
  pooling software
   ▬appservers
   ▬pgBouncer
   ▬pgPool (sort of)
 ●set pool size to maximum connections
  needed
2
1
    Query
    Tuning
Bad Queries

[Chart: Ranked Query Execution Times — execution time (0 to 5000) plotted against % ranking (5 to 100).]
Optimize Your Queries
                                          1
                                          2
in Test
►Before you go production
 ●simulate user load on the application
 ●monitor and fix slow queries
 ●look for worst procedures
Optimize Your Queries
                                        1
                                        2
in Test
►Look for “bad queries”
 ●queries which take too long
 ●data updates which never complete
 ●long-running stored procedures
 ●interfaces issuing too many queries
 ●queries which block
Finding bad queries            1
                               2

               ►Log Analysis
                 ●dozens of logging
                  options
                 ●log_min_duration_
                  statement
                 ●pgfouine
                 ●pgBadger
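
For example, logging every statement slower than 250 ms for one database, so pgBadger has something to analyze (the threshold and database name are arbitrary; the same setting can go in postgresql.conf for the whole cluster):

    ALTER DATABASE myapp SET log_min_duration_statement = '250ms';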
Fixing bad queries         1
                           2
►EXPLAIN ANALYZE
►things to look for:
 ●bad rowcount estimates
 ●sequential scans
 ●high-count loops
 ●large on-disk sorts
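
A hedged example of what to run (the query and table are hypothetical); BUFFERS adds cache-hit detail, and the plan text exposes the problems listed above:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT customer_id, sum(total)
    FROM orders
    WHERE order_date >= '2013-01-01'
    GROUP BY customer_id;
    -- compare "rows=" estimates against actual rows, look for Seq Scan on
    -- large tables, nested loops with huge loop counts, and
    -- "Sort Method: external merge" (an on-disk sort)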
Fixing bad queries                      1
                                        2
►reading explain analyze is an art
  ●it's an inverted tree
  ●look for the deepest level at which the
   problem occurs
►try re-writing complex queries several
 ways
Query Optimization Cycle                    1
                                             2
    log queries               run pgbadger



                                   explain analyze
apply fixes                        worst queries



                  troubleshoot
                  worst queries
Query Optimization Cycle
                                        1
                                        2
 (new)
          check pg_stat_statements




                               explain analyze
apply fixes                    worst queries



              troubleshoot
              worst queries
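
A sketch of the pg_stat_statements workflow; it assumes shared_preload_libraries = 'pg_stat_statements' plus a restart, and uses the column names as of 9.2 (total_time is in milliseconds):

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- the queries consuming the most total time
    SELECT calls, round(total_time::numeric, 1) AS total_ms, rows, query
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;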
Procedure Optimization
                                           1
                                           2
 Cycle
    log queries            run pg_fouine




                                  instrument
apply fixes                       worst
                                  functions


                  find slow
                  operations
Procedure Optimization
                                         1
                                         2
 Cycle (new)
          check pg_stat_user_functions




                                   instrument
apply fixes                        worst
                                   functions

                find slow
                operations
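
A minimal sketch; pg_stat_user_functions only collects data while track_functions is enabled (set it in postgresql.conf, or per session as a superuser while testing):

    SET track_functions = 'pl';

    SELECT schemaname, funcname, calls, total_time, self_time
    FROM pg_stat_user_functions
    ORDER BY total_time DESC
    LIMIT 10;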
3
                  1
postgresql.conf
max_connections                    3
                                   1
►As many as you need to use
  ●web apps: 100 to 300 W O
  ●analytics: 20 to 40 D
►If you need more than 100 regularly,
 use a connection pooler
  ●like pgbouncer
shared_buffers

►1/4 of RAM on a dedicated server W O
 ●not more than 8GB (test)
 ●cache_miss statistics can tell you if you
  need more
►fewer buffers to preserve cache space D
Other memory parameters                   3
                                          1
►work_mem
 ●non-shared
   ▬lower it for many connections W O
   ▬raise it for large queries D
 ●watch for signs of misallocation
   ▬swapping RAM: too much work_mem
   ▬log temp files: not enough work_mem

 ●probably better to allocate by task/ROLE
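
For example, allocating work_mem by ROLE as suggested above, with a large budget for a hypothetical reporting role and a small one for the web application role:

    ALTER ROLE reporting SET work_mem = '256MB';
    ALTER ROLE webapp    SET work_mem = '4MB';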
Other memory parameters                     3
                                            1
►maintenance_work_mem
 ●the faster vacuum completes, the better
   ▬but watch out for multiple autovacuum
    workers!
 ●raise to 256MB to 1GB for large
  databases
 ●also used for index creation
   ▬raise it for bulk loads
Other memory parameters             3
                                    1
►temp_buffers
 ●max size of temp tables before swapping
  to disk
 ●raise if you use lots of temp tables D
►wal_buffers
 ●raise it to 32MB
Commits                               3
                                      1
►checkpoint_segments
 ●more if you have the disk: 16, 64, 128
►synchronous_commit
 ●response time more important than data
  integrity?
 ●turn synchronous_commit = off W
 ●lose a finite amount of data in a
  shutdown
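
synchronous_commit can also be relaxed per session or per transaction instead of globally, which confines the risk to data you can afford to lose; a sketch with a hypothetical page_hits table:

    BEGIN;
    SET LOCAL synchronous_commit = off;  -- only this transaction skips the WAL flush wait
    INSERT INTO page_hits (url, hit_at) VALUES ('/home', now());
    COMMIT;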
Query tuning                             3
                                         1
►effective_cache_size
  ●RAM available for queries
  ●set it to 3/4 of your available RAM
►default_statistics_target                   D
  ●raise to 200 to 1000 for large databases
  ●now defaults to 100
  ●setting statistics per column is better
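
Setting statistics per column, as suggested above (table and column hypothetical):

    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE orders;   -- new target takes effect at the next analyze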
Query tuning                           3
                                       1
►effective_io_concurrency
 ●set to number of disks or channels
 ●advisory only
 ●Linux only
A word about
Random Page Cost
                                      3
                                      1
►Abused as a “force index use”
 parameter
►Lower it if the seek/scan ratio of your
 storage is actually different
  ●SSD/NAND: 1.0 to 2.0
  ●EC2: 1.1 to 2.0
  ●High-end SAN: 2.5 to 3.5
►Never below 1.0
Maintenance                            3
                                       1
►Autovacuum
 ●leave it on for any application which gets
  constant writes W O
 ●not so good for batch writes -- do manual
  vacuum for bulk loads D
Maintenance                           3
                                      1
►Autovacuum
 ●have 100's or 1000's of tables? raise
  autovacuum_max_workers
   ▬but not more than ½ cores
 ●large tables? raise
  autovacuum_vacuum_cost_limit
 ●you can change settings per table
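
Per-table autovacuum settings are storage parameters; a sketch for a hypothetical large, heavily updated table:

    ALTER TABLE big_events SET (
        autovacuum_vacuum_cost_limit   = 2000,
        autovacuum_vacuum_scale_factor = 0.05
    );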
1
  4
   OS &
Filesystem
Spread Your Files Around             1
                                     4
►Separate the transaction log if possible O D
  ●pg_xlog directory
  ●on a dedicated disk/array, performs
   10-50% faster
  ●many WAL options only work if you have
   a separate drive
Spread Your Files Around

                      number of drives/arrays
 which partition        1        2        3
 OS/applications        1        1        1
 transaction log        1        1        2
 database               1        2        3
Spread Your Files Around               1
                                       4
►Tablespaces for temp files     D

 ●more frequently useful if you do a lot of
  disk sorts
 ●Postgres can round-robin multiple temp
  tablespaces
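
A sketch of the round-robin temp tablespace setup; the paths are hypothetical, and the directories must already exist and be owned by the postgres OS user:

    CREATE TABLESPACE temp1 LOCATION '/mnt/fastdisk1/pgtemp';
    CREATE TABLESPACE temp2 LOCATION '/mnt/fastdisk2/pgtemp';

    -- usually set in postgresql.conf; shown per session here
    SET temp_tablespaces = 'temp1, temp2';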
Linux Tuning                             1
                                         4
►Filesystems
 ●Use XFS or Ext4
   ▬btrfs not ready yet, may never work for DB
   ▬Ext3 has horrible flushing behavior

 ●Reduce logging
   ▬data=ordered, noatime,
    nodiratime
Linux Tuning                        1
                                    4
►OS tuning
 ●must increase shmmax, shmall in kernel
 ●use deadline or noop scheduler to speed
  writes
 ●disable NUMA memory localization
  (recent)
 ●check your kernel version carefully for
  performance issues!
Linux Tuning                   1
                               4
►Turn off the OOM Killer!
  ● vm.oom-kill = 0
  ● vm.overcommit_memory = 2
  ● vm.overcommit_ratio = 80
OpenSolaris/Illumos                 1
                                    4
►Filesystems
 ●Use ZFS
   ▬reduce block size to 8K   W O

 ●turn off full_page_writes
►OS configuration
 ●no need to configure shared memory
 ●use packages compiled with Sun
  compiler
Windows, OSX Tuning      1
                         4
►You're joking, right?
What about The Cloud?               1
                                    4
►Configuring for cloud servers is
 different
  ●shared resources
  ●unreliable I/O
  ●small resource limits
►Also depends on which cloud
  ●AWS, Rackspace, Joyent, GoGrid

… so I can't address it all here.
What about The Cloud?                    1
                                         4
►Some general advice:
 ●make sure your database fits in RAM
   ▬except on Joyent
 ●Don't bother with most OS/FS tuning
   ▬just some basic FS configuration options
 ●use synchronous_commit = off if
  possible
Set up Monitoring!                        1
                                          4
►Get warning ahead of time
 ●know about performance problems
  before they go critical
 ●set up alerts
   ▬80% of capacity is an emergency!
 ●set up trending reports
   ▬is there a pattern of steady growth
1
 5
Hardware
Hardware Basics                   1
                                  5
►Four basic components:
 ●CPU
 ●RAM
 ●I/O: Disks and disk bandwidth
 ●Network
Hardware Basics                           1
                                          5
►Different priorities for different
 applications
  ●Web: CPU, Network, RAM, ... I/O    W

  ●OLTP: balance all O
  ●DW: I/O, CPU, RAM D
Getting Enough CPU                   1
                                     5
►One Core, One Query
  ●How many concurrent queries do you
   need?
  ●Best performance at 1 core per no more
   than two concurrent queries
►So if you can up your core count, do
►Also: L1, L2 cache size matters
Getting Enough RAM                 1
                                   5
►RAM use is "thresholded"
 ●as long as you are above the amount of
  RAM you need, even 5%, server will be
  fast
 ●go even 1% over and things slow down a
  lot
Getting Enough RAM                     1
                                       5
►Critical RAM thresholds           W

 ●Do you have enough RAM to keep the
  database in shared_buffers?
     ▬RAM 3x to 6x the size of DB
Getting Enough RAM                          1
                                            5
►Critical RAM thresholds           O

 ●Do you have enough RAM to cache the
  whole database?
    ▬RAM 2x to 3x the on-disk size of the
     database
 ●Do you have enough RAM to cache the
  “working set”?
    ▬the data which is needed 95% of the time
Getting Enough RAM                          1
                                            5
►Critical RAM thresholds           D

 ●Do you have enough RAM for sorts &
  aggregates?
    ▬What's the largest data set you'll need to
     work with?
    ▬For how many users
Other RAM Issues                    1
                                    5
►Get ECC RAM
 ●Better to know about bad RAM before it
  corrupts your data.
►What else will you want RAM for?
 ●RAMdisk?
 ●SWRaid?
 ●Applications?
Getting Enough I/O                   1
                                     5
►Will your database be I/O Bound?
 ●many writes: bound by transaction log O
 ●database much larger than RAM: bound
  by I/O for many/most queries D
Getting Enough I/O                       1
                                         5
►Optimize for the I/O you'll need
  ●if your DB is terabytes, spend most of
   your money on disks
  ●calculate how long it will take to read
   your entire database from disk
    ▬backups
    ▬snapshots

  ●don't forget the transaction log!
I/O Decision Tree

 lots of writes?
   No  → fits in RAM?
            Yes → mirrored
            No  → terabytes of data?
   Yes → afford good HW RAID?
            No  → SW RAID
            Yes → terabytes of data?
 terabytes of data?
   Yes → Storage Device
   No  → HW RAID → mostly read?
            Yes → RAID 5
            No  → RAID 1+0
I/O Tips                            1
                                    5
►RAID
 ●get battery backup and turn your write
  cache on
 ●SAS has 2x the real throughput of SATA
 ●more spindles = faster database
   ▬big disks are generally slow
I/O Tips                               1
                                       5
►DAS/SAN/NAS
 ●measure lag time: it can kill response
  time
 ●how many channels?
   ▬“gigabit” is only 100MB/s
   ▬make sure multipath works

 ●use fiber if you can afford it
I/O Tips           1
                   5

           iSCSI
             =
           death
SSD                                   1
                                      5
►Very fast seeks              D

 ●great for index access on large tables
 ●up to 20X faster
►Not very fast random writes
 ●low-end models can be slower than HDD
 ●most are about 2X speed
►And use server models, not desktop!
NAND (FusionIO)                      1
                                     5
All the advantages of SSD, Plus:
►Very fast writes ( 5X to 20X )    W O

  ●more concurrency on writes
  ●MUCH lower latency
►But … very expensive (50X)
Tablespaces for NVRAM                    1
                                         5
►Have a "hot" and a "cold" tablespace
  ●current data on "hot"                 O D

  ●older/less important data on "cold"
  ●combine with partitioning
►compromise between speed and size
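
A sketch of the hot/cold layout, combined with the partitioning approach from earlier (names and paths hypothetical); note that moving a table rewrites it, so schedule it for a maintenance window:

    CREATE TABLESPACE hot  LOCATION '/nvram/pgdata';
    CREATE TABLESPACE cold LOCATION '/bigraid/pgdata';

    ALTER TABLE events_2013_02 SET TABLESPACE hot;   -- current partition
    ALTER TABLE events_2012_01 SET TABLESPACE cold;  -- older partition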
Network                           1
                                  5
►Network can be your bottleneck
 ●lag time
 ●bandwidth
 ●oversubscribed switches
 ●NAS
Network                            1
                                   5
►Have dedicated connections
 ●between appserver and database server
 ●between database server and failover
  server
 ●between database and storage
Network                                 1
                                        5
►Data Transfers
 ●Gigabit is only 100MB/s
 ●Calculate capacity for data copies,
  standby, dumps
The Most Important
Hardware Advice:
                                      1
                                      5

►Quality matters
 ●not all CPUs are the same
 ●not all RAID cards are the same
 ●not all server systems are the same
 ●one bad piece of hardware, or bad driver,
  can destroy your application
  performance
The Most Important
Hardware Advice:
                                       1
                                       5
►High-performance databases mean
 hardware expertise
 ●the statistics don't tell you everything
 ●vendors lie
 ●you will need to research different
  models and combinations
 ●read the pgsql-performance mailing list
The Most Important
Hardware Advice:
                                    1
                                    5
►Make sure you test your hardware
 before you put your database on it
  ●“Try before you buy”
  ●Never trust the vendor or your sysadmins
The Most Important
Hardware Advice:
                                 1
                                 5
►So Test, Test, Test!
 ●CPU: PassMark, sysbench, Spec CPU
 ●RAM: memtest, cachebench, Stream
 ●I/O: bonnie++, dd, iozone
 ●Network: bwping, netperf
 ●DB: pgBench, sysbench
Questions?

►Josh Berkus
 ● josh@pgexperts.com
 ● www.pgexperts.com
    ▬ /presentations.html
 ● www.databasesoup.com

►More Advice
 ● www.postgresql.org/docs
 ● pgsql-performance mailing list
 ● planet.postgresql.org
 ● irc.freenode.net
    ▬ #postgresql

This talk is copyright 2013 Josh Berkus, and is licensed under the Creative Commons Attribution license.