The Beauty of
 Informix Disk Structures
Presented by Frédéric Delest
  Written by Andreas Legner
What to expect
•   On-disk persistence of an Informix Server instance
•   Touch on layout of spaces and chunks
•   Pages and page types
•   How’s your data stored in partitions
•   Ways to look at what’s on disk
•   Hands-on
    – Finding a specific row of data in your server instance
• Your questions answered
    – Many things are only vaguely documented nowadays,
      so let’s see what you still know ;-)
• Hope there’s something new for everyone!
We’ll be talking…
•   Partitions
     –   What all your tables and indices consist of
     –   Even things like sequences or timeseries

•   Pages
     –   What a whole instance is based upon
     –   Changed heavily over time – and still remained the same

•   Dbspaces & Chunks
     –   Two very persistent species as well
     –   through all evolution since the earliest versions of Informix

•   Physical & Logical Logs
     –   How old is your oldest phys. or log. log file?

•   All this supporting an ever growing, heavily expanding set of functionality
     –   Allowing for extremely seamless, reliable, inexpensive and fast
         migration from v7 through v11.7 (and back)

                    Ain’t this Beauty?         Simplicity designed for sustainability.
Test Environment
• Informix Virtual Appliance
  – Same as used for other sessions

• The main demo instance:
  – INFORMIXDIR=/opt/IBM/informix
  – INFORMIXSERVER=demo_on
  – ONCONFIG=onconfig.demo_on
  – ROOTPATH
    /data/IBM/informix/demo/demo_on/online_root
Jump right into it
• What makes up an Informix server instance when it’s down?
   – $INFORMIXDIR & $ONCONFIG
   – onconfig root chunk info
   – Chunks

• A chunk
   – A device (“raw”)
   – A file (“cooked”)
   – Actually a contiguous portion of them
       • Starting at an offset
       • reaching <size> kilobytes further
       • NOT initialized as a whole in any way
       • Unless newly created as a (cooked) file: then filled up with zero bytes
         (can’t use ‘sparse files’ – for obvious reasons)
       • Only first and third pages (0 + 2) are initialized
The Root Chunk
•   The Root Chunk is the only chunk initially
     –   Making up the Root dbspace (“rootdbs” usually)
     –   Holding everything required
     –   In specific order
•   … and will remain the key entry point to e.g. all other chunks

•   Begins on so called “root reserved pages”
     –   Starting from here anything else can be found

•   Followed by a single chunk free list page
     –   Every chunk logically begins in a chunk free list page recording its free space
     –   Only blob chunks (chunks of a blobspace) don’t have these – they are a totally different kind

•   Followed by the dbspace’s master partition
     –   “partition partition” or “TBLSpace TBLSpace”

•   (Almost) anything beyond this can change
     –   Database partition              <– this would never move
     –   The physical log
     –   Initial logical logs
     –   System and user databases …
Dbspaces/Sb(lob)spaces
• Up to m logical collections of 1 – n chunks each
    – We’ll see what m and n can be

• Home of
    – Partitions in case of dbspaces and – partially – sbspaces
    – Sblobs in case of sbspaces
    – Blobs in case of blobspaces

• Minimum entity of a backup or restore

• ‘Critical’
    – Rootdbs or
    – Dbspace containing physical or any logical log
    – Must be contained in any dbspace backup or level-0 restore
A Fresh Instance
                                             For newbies
         (or others still wishing to know – do this whenever you want to test something):

•   Let’s create a new baby instance:
     –   INFORMIXSERVER=baby
     –   ONCONFIG=onconfig.$INFORMIXSERVER
     –   Copy $INFORMIXDIR/etc/onconfig.std to $INFORMIXDIR/etc/$ONCONFIG
     –   Edit new config file:
           •   ROOTPATH /tmp/root_chunk.baby
           •   Lower ROOTSIZE, PHYSFILE and LOGSIZE by factor 10
           •   MSGPATH $INFORMIXDIR/online.baby.log
           •   SERVERNUM 123
           •   DBSERVERNAME baby

     –   Add an entry to $INFORMIXDIR/etc/sqlhosts (unset INFORMIXSQLHOSTS):
           •   baby onsoctcp     localhost 9876



     –   oninit -ivy             to initialize new instance on disk
     –   onstat -d               to see chunks and dbspaces we have - one only
     –   onstat -m -r 2          to see when system database creation is done
oncheck -p…
•   oncheck’s -p option adds printing to checking
     –   DBA’s first choice for looking at disk objects
     –   -pr|R for printing reserved pages
     –   -pP for locating pages physically, taking chunk# and page offset (base pages)
     –   -pp for locating pages logically within a partition, taking partnum and log. page#
     –   -pe for extent listing
     –   -pt|T for printing partition pages
     –   … -pd|D|k|K|l|L for data and index pages

•   Some options only work when the server is up
     –   Esp. when more detailed info is needed than just a chunk
•   Others attempt a connection first
     –   Might have to wait up to $INFORMIXCONTIME seconds (default: 60) – when server is down

•   When the server is up, oncheck always goes through the server
     –   Hence it shows you buffer cache content rather than reading from disk
First Peek at a Chunk
• Do an ‘oncheck -pe [rootdbs]’
   – Extent listing
      • we’ll clarify “extents” later

   – Can limit output to specific space
      • not any further … so can be big

   – Only available online (or quiescent)
   – And with all the space’s chunks online (!)
      • Won’t work if one chunk in space is down

• Try and locate the objects mentioned so far
oncheck -pe
DBspace Usage Report: rootdbs            Owner: informix   Created: 01/26/2011


 Chunk Pathname                            Pagesize(k)   Size(p)   Used(p)   Free(p)
     1 /tmp/rootchunk.baby                           2    100000     52256     47744

 Description                                                   Offset(p) Size(p)
 ------------------------------------------------------------- -------- --------
 RESERVED PAGES                                                       0       12
 CHUNK FREELIST PAGE                                                 12        1
 rootdbs:'informix'.TBLSpace                                         13      250
 PHYSICAL LOG                                                       263    15000
 LOGICAL LOG: Log file 1                                          15263      500
 LOGICAL LOG: Log file 2                                          15763      500
  ...


• ‘p’ is pages – base unit of a chunk

• First 3 items always the same
      – Root reserved pages
      – Chunk’s first chunk free list page
      – TBLSpace TBLSpace’s first extent
• All 3 can have “extension”
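A quick sanity check of the listing above, as a minimal Python sketch: page offsets turn into byte offsets by multiplying with the page size (2 kB here, per the Pagesize(k) column). The helper names are made up for illustration, and the chunk is assumed to start at offset 0 of its file.

```python
PAGE_SIZE = 2048  # bytes; "Pagesize(k) = 2" in the report above

# (description, offset in pages, size in pages), copied from the report
EXTENTS = [
    ("RESERVED PAGES", 0, 12),
    ("CHUNK FREELIST PAGE", 12, 1),
    ("rootdbs:'informix'.TBLSpace", 13, 250),
    ("PHYSICAL LOG", 263, 15000),
    ("LOGICAL LOG: Log file 1", 15263, 500),
    ("LOGICAL LOG: Log file 2", 15763, 500),
]


def byte_range(offset_pages, size_pages, page_size=PAGE_SIZE, chunk_offset_kb=0):
    """Return (first byte, one-past-last byte) of an extent within the chunk file.

    chunk_offset_kb is the chunk's own start offset in its device/file
    (assumed 0 here).
    """
    start = chunk_offset_kb * 1024 + offset_pages * page_size
    return start, start + size_pages * page_size


def is_contiguous(rows):
    """True if every extent starts exactly where the previous one ends."""
    return all(a[1] + a[2] == b[1] for a, b in zip(rows, rows[1:]))


for name, off, size in EXTENTS:
    print(f"{name:32s} bytes {byte_range(off, size)}")
```

The same arithmetic confirms the report's totals: Used(p) + Free(p) = 52256 + 47744 = 100000 = Size(p).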
Pages and Page Sizes
•   A chunk is made up of pages

•   Base i/o unit is a page
     – Also data and index buffering occurs in pages

•   2kB entities (4kB on AIX and Windows) by default
     – Mandatory page size on “critical dbspaces”:
       root dbspace or dbspace holding any phys. or log. logs

•   Configurable page size for other, non-critical dbspaces
     – Per dbspace
     – At dbspace creation time
     – In multiples of default page size, up to 16k

•   Different game in blobspaces and sbspaces
     – Blobspaces always had freely choosable page sizes (multiples of base page size)
     – Sbspaces use default (base) page size
       … no matter what people (or Informix installers) keep telling you ;-)
How to look at a page?
•   oncheck -pP <chunk_no> <page_offset> [#pgs] [-h]
     – Prints page header
     – Prints page slot table and slots if applicable
           • Unless -h (headers only) specified
     – <#pgs> to see multiple pages
           • (not working yet with non-default page size)
     – Requires <page_offset> specified in base (default) pages !

•   SMI:
     –   sysrawdsk look at pages as raw space
     –   syspaghdr look at page headers only
     –   Both indexed, but not very smart – e.g. can’t make good use of <=, <, >, >=
     –   Use base pages for offset!
     –   Use carefully – not too safe, esp. with non-default page size!

•   onstat: when pages in memory
•   dd / od / …
     – Latter two provide more ‘natural’ image of a page
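Since oncheck -pP and the SMI tables want offsets in base (default) pages, a page number counted in a larger page size has to be scaled first. A minimal Python sketch of that conversion, assuming a 2 kB base page (4 kB on AIX and Windows); the function names are illustrative only.

```python
def allowed_page_sizes_kb(base_kb=2, max_kb=16):
    """Page sizes a non-critical dbspace could use: multiples of the base size up to 16k."""
    return list(range(base_kb, max_kb + 1, base_kb))


def to_base_pages(page_no, page_size_kb, base_kb=2):
    """Convert a page number counted in page_size_kb pages into base pages."""
    if page_size_kb % base_kb != 0:
        raise ValueError("page size must be a multiple of the base page size")
    return page_no * (page_size_kb // base_kb)


print(allowed_page_sizes_kb())   # [2, 4, 6, 8, 10, 12, 14, 16]
print(to_base_pages(10, 8))      # 40: page 10 of an 8k dbspace starts at base page 40
```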
Page Structure
•   (Almost) every used page has
     –   a 24byte page header
     –   a trailing stamp (last 4 bytes)

•   When header and stamp match, the page is considered consistent in itself
     –   At least it has been written completely
     –   A checksum mechanism used nowadays – used to be two stamps that needed to match

•   Page content usually is organized in slots
•   Slot table
     –   growing from page end
     –   Entries describing slots

•   Unused pages
     –   no structure or consistency assumed

•   What is ‘unused’ ?
     –   Not allocated to any object, so FREE in the chunk
     –   Or beyond its object’s “npused” (# pages used)
Some Pages Now
• Try this now:
   – oncheck -pP 1 0 12 > first12.pgs

• Find
   – Page headers
   – Slot tables and entries
   – Slots

• What is it that we’re looking at?

• Try to dump the same using ‘dd’ and/or ‘od’
   – dd if=$ROOTCHUNK bs=2k count=12 | od -A x -t x
     > first12.hex
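The dd/od pipeline above can also be approximated in a few lines of Python, which is handy when you want to post-process pages afterwards. This is only a sketch: it assumes a cooked-file chunk you are allowed to read, a 2 kB page size and a chunk offset of 0 unless stated otherwise; read_pages and hexdump are made-up helper names.

```python
def read_pages(path, first_page, count=1, page_size=2048, chunk_offset_kb=0):
    """Read `count` pages starting at `first_page` and return them as a list of bytes."""
    with open(path, "rb") as f:
        f.seek(chunk_offset_kb * 1024 + first_page * page_size)
        data = f.read(count * page_size)
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def hexdump(page, width=16):
    """Format one page as offset-prefixed hex lines, roughly like od -A x -t x1."""
    lines = []
    for off in range(0, len(page), width):
        chunk = page[off:off + width]
        lines.append(f"{off:06x} " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(lines)


# Usage (path is whatever your ROOTPATH points to):
#   for page in read_pages("/tmp/root_chunk.baby", 0, 12):
#       print(hexdump(page))
```

Unlike oncheck with a running server, this reads what is on disk, not what is in the buffer cache.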
Page Header Fields
• Page header size: 24 bytes
• Fields – no longer documented (hex pattern, size in bytes; “2:n” = n bits of a shared 2-byte field):
  –   Chunk:Offset    (OOOOOOOO CCCC)   4+2
  –   Checksum        (ssss)            2
  –   N2k             (n:5)             2:5
  –   Nslots          (ssss:11)         2:11
  –   Flags/Type      (FFFF)            2
  –   Free Pointer    (ffff)            2
  –   Free Counter    (cccc)            2
  –   Next Page       (NNNNNNNN)        4
  –   Previous Page   (PPPPPPPP)        4
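For those who prefer code to hex patterns, here is a hedged Python sketch that unpacks those 24 bytes. Keep in mind the layout is undocumented: the field order follows the list above, little-endian byte order is assumed (as on Linux/x86), and which bits of the shared 2-byte field hold N2k versus Nslots is a guess (high 5 bits versus low 11 bits). Verify against oncheck -pP output before trusting it.

```python
import struct
from collections import namedtuple

PageHeader = namedtuple(
    "PageHeader",
    "offset chunk checksum n2k nslots flags free_ptr free_cnt next_page prev_page",
)

HEADER_SIZE = 24  # bytes, per the slide


def parse_page_header(page, little_endian=True):
    """Unpack the 24-byte page header (layout assumed, see lead-in)."""
    fmt = ("<" if little_endian else ">") + "IHHHHHHII"  # 4+2+2+2+2+2+2+4+4 = 24
    (offset, chunk, checksum, packed, flags,
     free_ptr, free_cnt, next_page, prev_page) = struct.unpack_from(fmt, page, 0)
    n2k = packed >> 11        # ASSUMPTION: high 5 bits
    nslots = packed & 0x07FF  # ASSUMPTION: low 11 bits
    return PageHeader(offset, chunk, checksum, n2k, nslots, flags,
                      free_ptr, free_cnt, next_page, prev_page)


def trailing_stamp(page, little_endian=True):
    """Return the last 4 bytes of the page as an unsigned integer."""
    fmt = ("<" if little_endian else ">") + "I"
    return struct.unpack_from(fmt, page, len(page) - 4)[0]
```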
Page Types
•   Many different page types
     –   oncheck -pp|P naming them in page header output portion
     –   Encoded in lower bits of page flags

•   ROOTRSV:                root (and extended) reserved pages
                            recording system configuration
•   CHUNK:                  chunk free list pages, recording FREE extents
                            first one always at fixed position 2 in a chunk
                            chained if one doesn’t suffice
•   FREE:                   partition free bitmap,
                            recording page’s use state within a partition
                            at fixed intervals within a partition
                            first one always logical page 0
•   PARTN/SECPARTN:         partition pages and secondary partition pages
                            a partition’s details, incl. in-place alter history
•   DATA/REMAIN:            table data row and overflow (remainder) pages
•   BTREE:                  btree index page (root/twig/leaf node)
•   PBLOB:                  partition blob page
•   BLOB/BMAP/BBIT:         blobspace pages
Slots
•   Page content organized in slots normally
     – Only few page types don’t need real slots
       (chunk FREE list, bitmap, plog marker, any sort of blobspace pages …)

•   Slot
     – A contiguous range of bytes within a page
     – With a 2*2bytes slot table entry describing it
           • Slot begin and slot size, optional slot flags
     – Space consumption of a slot:                          slot size + 4
     – Slot size can be zero – deleted slot
     – Slot table size, growing from page end:               page’s #slots * 4

•   Page can have up to 2k slots
     – E.g. large index pages can have this many
     – Certain pages have much lower limits, for various reasons
           • DATA, REMAINDER, PBLOB: max. 255 slots          reason: ROWIDs (we’ll see later)
           • Reserved pages only few (tens)                  reason: slot vs. page sizes
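The slot arithmetic above gives a rough upper bound on how many fixed-size rows fit on one DATA page. A minimal Python sketch using only the numbers from these slides (24-byte header, 4-byte trailing stamp, slot size + 4 bytes per slot, at most 255 slots on a DATA page); it ignores any further per-page overhead the server may reserve, so treat the result as an estimate.

```python
HEADER_BYTES = 24      # page header
STAMP_BYTES = 4        # trailing stamp
SLOT_ENTRY_BYTES = 4   # one slot table entry (2 * 2 bytes)
MAX_DATA_SLOTS = 255   # limit for DATA / REMAINDER / PBLOB pages


def max_rows_per_page(row_size, page_size=2048):
    """Estimate how many rows of row_size bytes fit on one DATA page."""
    usable = page_size - HEADER_BYTES - STAMP_BYTES
    by_space = usable // (row_size + SLOT_ENTRY_BYTES)
    return min(by_space, MAX_DATA_SLOTS)


print(max_rows_per_page(100))   # 19
print(max_rows_per_page(4))     # 252
print(max_rows_per_page(1))     # 255 (capped by the slot limit, not by space)
```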
Reserved Pages
•   Try this:
     – oncheck -pr > first12.txt

•   compare to what we’ve dumped earlier
     – Formatting those 12 “reserved pages”
          • We’re seeing:
                –   Page Zero: version information primarily
                –   Onconfig params and values (not all)
                –   Physical/Logical log definitions, and last Checkpoint details
                –   Dbspace definitions
                –   Chunk definitions
                –   Archive details and Data Replication status

     – Yet not all of them are displayed
          • Some are paired – for recoverability reasons
          • Only more recent of pair is taken

     – In a larger instance many more are displayed …
          • But not mentioned individually, as extra (extended) reserved pages
          • Initial 12 can only hold very limited amount of details
Reserved Pages Extension
• Log. logs, dbspaces and chunks can be many

• To accommodate their definitions reserved pages can be extended

• Extensions for each sort always in contiguous blocks
    – Within “rootdbs” chunks

• Root reserved page pointing to its extension
    – pg_next: start page
    – pg_prev: extension size

[Diagram: root reserved pages pointing to their extended reserved pages]
• Root Reserved Pages: Zero, Config, Ckpt1, Ckpt2, Dbsp1, Dbsp2, PChunk1, PChunk2, MChunk1, MChunk2, Arch1, Arch2
• Extended Reserved Pages: more logical logs…, more space specs…, more pchunk specs…, more mchunk specs…
Extents
• Contiguous sets of pages allocated to a certain purpose
   – E.g. to a partition, or forming a log file

• Within one chunk

• Arbitrary size: 1 page up to (almost) chunk size

• oncheck -pe: listing all extents of a dbspace (or whole
  instance)

• See also the sysextents SMI table
Sorts of Extents
• Possible extents:
   –   Reserved pages – root and extensions
   –   Chunk free list pages – single page extents
   –   Physical log – 1 large extent
   –   Logical logs – 1 extent each
   –   Partition extents – data/index partitions consist of 0 - many extents
   –   Unused areas of a chunk: FREE extents

• So what needs to be read to compile a complete extent list?
   – Reserved pages (for log files)
   – Chunk free lists
   – Partition pages
Partitions
• Partitions form the containers for database objects
  recorded, by their Partnum or Fragid, in database catalogs
   – Tables (and their fragments)
   – Indices (and their fragments)
   – Sequences – relying on a partition’s ability to generate serial values
   – Even external tables possess a (dummy) partition – for having a
     partnum
   – Sbspace metadata

• Thinking of a partition as a ‘file’ (containing the partition data)
   – partition (header) page would be the ‘inode’
   – Partition extents would be ‘blocks’
   – dbspace would be the ‘file system’
Partitions (cont.)
• A partition (“tablespace”) consists of
    – Its partition header page
        • Holding the details that describe the partition
        • Potentially extending to secondary partition pages
    – A collection of allocated extents

• Partitions reside in a (db-/sb-)space, one abstraction level above chunks
    – Their extents reside in the space’s chunks

• All partitions of a space are recorded, by their partition header pages,
  in the space’s Partition Partition
    – aka. “TBLSpace TBLSpace”
    – The space’s master partition - the very first one
    – Holding the space’s partition pages
What’s a Partnum?
 • Visualizing a dbspace first:
Dbspace:
 DbsNo rp off flags 1.chk #chks flags (b)pg_sz name
     4  0 354 60001     4     3 N--BA        1 datadbs
Primary chunks:
 chkno rp off dbsno nxchk offset   fpage #bpages #freepgs ovhd    f l a g s pg_sz path
     4 0 39c      4     5      0       -     1000        0       30040 PO-B   2048 /data/IBM/informix/demo/demo_on/datadbs_1
     5 0 4c8      4     6      0       -     2500        2       30040 PO-B   2048 /data/IBM/informix/demo/demo_on/datadbs_2
     6 0 5f4      4     0      0       -     4000      270       10040 PO-B   2048 /data/IBM/informix/demo/demo_on/datadbs_3

[Diagram: the dbspace’s pages laid out across its three chunks (1. chunk …/datadbs_1, 2. chunk …/datadbs_2, 3. chunk …/datadbs_3), beginning with the reserved pages, with the partitions below and FREE space + free list marked]

   Partition               Partnum
   Tblspace tblspace       0x00400001
   Table_1                 0x0040005b
   Table_2                 0x004000c2
   Table_3                 0x00400062
   Table_4                 0x00400005
So … What’s a partnum?
•   A partnum is a 4-byte integer number
     –   Uniquely identifying a partition
     –   Split into a 1.5-byte (12-bit) “dbspace number”
     –   And a 2.5-byte (20-bit) “logical page number”
     –   Hex representation: 0xdddlllll

•   What does this mean?
     –   Each dbspace can hold partitions (TBLSpaces)
     –   It always holds a master partition (TBLSpace TBLSpace)
     –   All other partitions are recorded in this master partition
     –   The master partition only contains partition header and secondary pages
     –   Each partition header page describes one partition
     –   The ‘lllll’ fraction of a partition’s partnum is the number (position) of its partition
         header page within the dbspace’s (‘ddd’) TBLSpace TBLSpace

•   What special partnum then is 0xN00001 ?
     – TBLSpace TBLSpace’s own partnum for dbspace ‘N’
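The 0xdddlllll split is plain bit arithmetic. A minimal Python sketch (function names are illustrative) that takes a partnum apart and puts it back together:

```python
def split_partnum(partnum):
    """Return (dbspace number, logical page number within the TBLSpace TBLSpace)."""
    return partnum >> 20, partnum & 0xFFFFF


def make_partnum(dbspace_no, logical_page):
    """Build a partnum from a 12-bit dbspace number and a 20-bit logical page number."""
    if not (0 <= dbspace_no < 1 << 12 and 0 <= logical_page < 1 << 20):
        raise ValueError("dbspace number or logical page out of range")
    return (dbspace_no << 20) | logical_page


dbs, page = split_partnum(0x0040005B)
print(dbs, page)                   # 4 91
print(hex(make_partnum(4, 1)))     # 0x400001, the TBLSpace TBLSpace of dbspace 4
```

So Table_1 from the earlier slide (partnum 0x0040005b) has its partition header page at logical page 0x5b of dbspace 4's TBLSpace TBLSpace.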
Looking at a Partition Page
•   oncheck -pt|T db:owner.table[,dbs] | partnum
•   Finds the desired partition header page(s)
•   Tells you the following recorded in those pages
     –    General partition info – slot 1
            •   Partnum, date, flags, rowsize, …
     –    Extents allocated to this partition – slot 5
     –    Possibly a pointer to the partition’s current compression dictionary – slot 7
     –    Partition name printed is NOT taken from partition page – determined from catalogs instead

•   Specifying a partnum will target only this one partition page
     –    Will attempt to resolve partition name querying systables

•   Otherwise all partitions of the specified table are targeted
     –    Single data partition – or multiple in case of a fragmented table
     –    Index partitions – each index normally has its own partition (detached)

•   -pT: will scan an entire (set of) partition(s) to gather page statistics
            •   Index/Data/Bitmap page types and usage
            •   Index usage reports
            •   In-place alter versions

•   Only working with the server running
Partition Page ‘raw’
• oncheck -pp 0x<N>00001 <L>
   – What’s the difference ?
   – Not formatted as a partition page – but “complete”
     instead ;-)


• Try and compare the following:
   –   oncheck -pt 0x100001
   –   oncheck -pp 0x100001 1
   –   To what extent are these the same?
   –   To what extent are they different?
Find a specific Data Row now
• Given a specific row in a fragmented table
   – dbname:[owner.]tabname[,fragdbs|%partition]:rowid
   – or a partnum:rowid combination,
     e.g. from a log record
   – What would it take to get to that row manually?


• First let’s learn what’s to be done under the hood

• Let’s assume the partnum is known already
   – Can be obtained from systables or sysfragments (e.g. systables.partnum)
   – Let’s say: partnum 0x400079, rowid 0x00000a01
So what’s a Rowid ?
• A rowid describes the precise location of a row within a partition/fragment:
   – 0xppppppss - 4byte integer
   – High 3 bytes: logical page number within partition
   – Low byte: slot number within page

• Not to be confused with the “WITH ROWID” shadow column (frag’d table)
   – A real number assigned to a row

[Diagram: a partition made up of extents (1st extent starting at page 0 with the bitmap page, 2nd extent at page 4, 3rd extent at page 8, 4th extent, 5th extent, …); rowid 0xa01 is shown pointing to a slot on a page consisting of page header and slots 1 … n]
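Taking a rowid apart works just like taking a partnum apart, only with an 8-bit slot number. A minimal Python sketch (the function name is illustrative):

```python
def split_rowid(rowid):
    """Return (logical page number within the partition, slot number on that page)."""
    return rowid >> 8, rowid & 0xFF


page, slot = split_rowid(0x00000A01)
print(page, slot)   # 10 1
```

For our example row (rowid 0x00000a01) this means slot 1 on logical page 10 of the partition. The single byte for the slot number is also why the slot number of a row can never exceed 255.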
Paths to Our Row’s Page (1)
    So we need extent info for our partition (identified by partnum)
    – Want to physically locate the page containing our row
    – Either walk all the way by foot, via the partition pages
    – Or pick from a formatted extent list

• Crawling:
  Find partition page for partnum and use its extent list for translation
    • Dump Tblspace Tblspace partition page:
      4th page in space’s first chunk - this is fixed
    • Slot 5 has the extent list - we’re on Linux, sorry for wrong endianness
    • Take partnum’s “logical page” portion
    • Convert to physical address using raw extent list found
    • Determine location of target partition page and dump it as well
    • Use that page’s raw extent list for translating your rowid into a physical page
Paths to Our Row’s Page (2)
• Walking:
  Using formatted extent list
   • Obtain an extent list (oncheck -pe)
   • Determine table name (from system catalog)
   • Find the extent matching your table
     (can be confusing if table is fragmented)
   • OR: use extent list in ‘oncheck -pt <partnum>’ output
   • Calculate precise phys. location
     (extent start plus log. page difference)

• Driving:
   – oncheck -pp <partnum> <logical_page>
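The "extent start plus logical page difference" step from the walking path can be sketched in Python as follows. The extent list is made up for illustration (it is not taken from a real instance), and the tuple layout is an assumption about how you would transcribe an extent list by hand.

```python
from collections import namedtuple

# logical_start: first logical page covered by the extent
# chunk, physical_start: where the extent begins on disk (in pages)
# size: number of pages in the extent
Extent = namedtuple("Extent", "logical_start chunk physical_start size")

# HYPOTHETICAL extent list of one partition
EXTENTS = [
    Extent(0, 4, 53, 4),
    Extent(4, 4, 120, 4),
    Extent(8, 5, 10, 8),
]


def locate(extents, logical_page):
    """Translate a logical page number into (chunk number, physical page offset)."""
    for ext in extents:
        if ext.logical_start <= logical_page < ext.logical_start + ext.size:
            return ext.chunk, ext.physical_start + (logical_page - ext.logical_start)
    raise LookupError(f"logical page {logical_page} is not in any extent")


print(locate(EXTENTS, 10))   # (5, 12): third extent, 2 pages past its start
```

The resulting chunk number and page offset are what you would then feed to oncheck -pP (in base pages).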
The Row Finally
• oncheck will dump the page’s slots in raw hex format
   – Pick the one your rowid is pointing to

• What’s easy to determine
   – Does the row exist? No, if slot is missing or zero length.
   – Does the slot length fit the partition’s row length?
      • Might be shorter in case of variable length data types.

• If you need to know what’s in this row
   – E.g. page can’t be read any more (inconsistent)
   – No way around applying the table’s schema byte by byte
   – Way beyond this 1 hour talk ;-)
Indirect / Incomplete Rows
• Row not fitting your schema?
    – Too short somehow?

• Strange looking slot length – way too large?
• High bit set in a DATA page slot length means
    – first 4 bytes in slot are no DATA
    – Instead they’re a forward pointer
    – In the form of another 4byte rowid (0xppppppss)

• An indirect row or an initial piece of a row obviously
    – Need to look up its next/remainder piece
    – Located on so called REMAINDER pages
    – Row can consist of multiple such pieces (32k max row length)

• What fun looking at such rows in their entirety!
Watch out for IPA!
•   Row still not fitting our schema??
•   DATA page header having strange value in its ‘page next’ field??

•   Then we’re on an old version page!
     –   What’s that again?
     –   And can this be combined even with row indirection (multi-piece rows)? Sure it can!

•   All rows on such page don’t fit the table’s current schema
     –   Instead they’re in the shape of a previous schema this table had
     –   Before potentially a whole series of ALTER TABLE statements
     –   These ALTERs have been performed in in-place fashion – no real changes yet

•   Some real dirty work starting here, again at our partition page
     –   There we learn about a series of secondary partition pages
     –   Keeping a memory of all outstanding in-place ALTERs
     –   Partition page’s pg_next field has the TBLSpace TBLSpace log. page# of the first such ALTER page
Compression
•   Neither row indirection nor IPA can explain what my row’s looking like?
     – Moreover it does look like real garbage!
     – And that slot length is an oddity – way too big

•   Is this partition compressed?
     – Consult ‘oncheck -pt’ output, it would tell

•   Is this row compressed?
     – The slot length field would have its second highest bit set

•   Again next step would be our partition page
     – Slot 7 has the pointer to the current compression dictionary
     – Also oncheck -pt should show this information

•   Then uncompress the row using the compression dictionary
     – Not here, not now …
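The two flag bits mentioned on the last slides (high bit: forward pointer, second highest bit: compressed row) explain those "way too large" slot lengths. A hedged Python sketch, assuming the slot length is the 2-byte size field of the slot table entry and that the remaining 14 bits hold the actual length; the constant and function names are made up.

```python
FORWARD_POINTER_BIT = 0x8000  # high bit: slot starts with a 4-byte forward rowid
COMPRESSED_BIT = 0x4000       # second highest bit: row is compressed
LENGTH_MASK = 0x3FFF          # ASSUMPTION: remaining 14 bits are the real length


def decode_slot_length(raw):
    """Split a raw 16-bit slot length into its length and flag parts."""
    return {
        "length": raw & LENGTH_MASK,
        "forward_pointer": bool(raw & FORWARD_POINTER_BIT),
        "compressed": bool(raw & COMPRESSED_BIT),
    }


print(decode_slot_length(0x8004))
# {'length': 4, 'forward_pointer': True, 'compressed': False}
```

A raw value of 0x8004 read as a plain number would be 32772 bytes, which cannot fit on any page; decoded, it is a 4-byte slot holding only a forward pointer.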
Questions?!?




11/16/2012        Template Presentation - Session Z99   37
Beauty of
Informix Disk Structures
      Andreas Legner
  andreas.legner@de.ibm.com

Mais conteúdo relacionado

Mais procurados

Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsmysqlops
 
How mysql handles ORDER BY, GROUP BY, and DISTINCT
How mysql handles ORDER BY, GROUP BY, and DISTINCTHow mysql handles ORDER BY, GROUP BY, and DISTINCT
How mysql handles ORDER BY, GROUP BY, and DISTINCTSergey Petrunya
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structurezhaolinjnu
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentFarzad Nozarian
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basicHafizur Rahman
 
8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab8b. Column Oriented Databases Lab
8b. Column Oriented Databases LabFabio Fumarola
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Rohit Agrawal
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with DockerFabio Fumarola
 
Apache HBase - Lab Assignment
Apache HBase - Lab AssignmentApache HBase - Lab Assignment
Apache HBase - Lab AssignmentFarzad Nozarian
 
Mysql Fulltext Search 1
Mysql Fulltext Search 1Mysql Fulltext Search 1
Mysql Fulltext Search 1johnymas
 
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYPostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYEmanuel Calvo
 
MySQL database replication
MySQL database replicationMySQL database replication
MySQL database replicationPoguttuezhiniVP
 
Mysql database basic user guide
Mysql database basic user guideMysql database basic user guide
Mysql database basic user guidePoguttuezhiniVP
 

Mais procurados (19)

Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internals
 
How mysql handles ORDER BY, GROUP BY, and DISTINCT
How mysql handles ORDER BY, GROUP BY, and DISTINCTHow mysql handles ORDER BY, GROUP BY, and DISTINCT
How mysql handles ORDER BY, GROUP BY, and DISTINCT
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structure
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
 
Hive commands
Hive commandsHive commands
Hive commands
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basic
 
8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker
 
Apache HBase - Lab Assignment
Apache HBase - Lab AssignmentApache HBase - Lab Assignment
Apache HBase - Lab Assignment
 
Shark - Lab Assignment
Shark - Lab AssignmentShark - Lab Assignment
Shark - Lab Assignment
 
Presentation day5 oracle12c
Presentation day5 oracle12cPresentation day5 oracle12c
Presentation day5 oracle12c
 
Mysql Fulltext Search 1
Mysql Fulltext Search 1Mysql Fulltext Search 1
Mysql Fulltext Search 1
 
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYPostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
 
HDFS_Command_Reference
HDFS_Command_ReferenceHDFS_Command_Reference
HDFS_Command_Reference
 
MySQL database replication
MySQL database replicationMySQL database replication
MySQL database replication
 
Introduction to DSpace
Introduction to DSpaceIntroduction to DSpace
Introduction to DSpace
 
Mysql database basic user guide
Mysql database basic user guideMysql database basic user guide
Mysql database basic user guide
 

Semelhante a Ugif 10 2012 beauty ofifmxdiskstructs ugif

Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化YUCHENG HU
 
InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
Ugif 10 2012 beauty ofifmxdiskstructs ugif

  • 1. The Beauty of Informix Disk Structures Presented by Frédéric Delest Written by Andreas Legner
  • 2. What to expect • On-disk persistence of an Informix Server instance • Touch on layout of spaces and chunks • Pages and page types • How’s your data stored in partitions • Ways to look at what’s on disk • Hands-on – Finding a spec. row of data in your server instance • Your questions answered – Many things only documented vaguely nowadays, so wonder what you still know ;-) • Hope there’s something new for everyone!
  • 3. We’ll be talking… • Partitions – What all your tables and indices consist of – Even things like sequences or timeseries • Pages – What a whole instance is based upon – Changed heavily over time – and still remained the same • Dbspaces & Chunks – Two very persistent species as well – through all evolution since earliest versions of Informix • Physical & Logical Logs – How old is your oldest phys. or log. log file? • All this supporting an ever growing, heavily expanding set of functionality – Allowing for extremely seamless, reliable, inexpensive and fast migration from v7 through v11.7 (and back) Ain’t this Beauty? Simplicity designed for sustainability.
  • 4. Test Environment • Informix Virtual Appliance – Same as used for other sessions • The main demo instance: – INFORMIXDIR=/opt/IBM/informix – INFORMIXSERVER=demo_on – ONCONFIG=onconfig.demo_on – ROOTPATH /data/IBM/informix/demo/demo_on/online_root
  • 5. Jump right into it • What makes up an Informix server instance when it’s down? – $INFORMIXDIR & $ONCONFIG – onconfig root chunk info – Chunks • A chunk – A device (“raw”) – A file (“cooked”) – Actually a contiguous portion of them • Starting at an offset • reaching <size> kiloBytes further • NOT initialized anyhow as a whole • Unless newly created as a (cooked) file: blown up with zero bytes Can’t use ‘sparse files’ – for obvious reasons • Only first and third pages (0 + 2) are initialized
  • 6. The Root Chunk • The Root Chunk is the only chunk initially – Making up the Root dbspace (“rootdbs” usually) – Holding everything required – In specific order • … and will remain the key entry point to e.g. all other chunks • Begins on so called “root reserved pages” – Starting from here anything else can be found • Followed by a single chunk free list page – Every chunk logically begins in a chunk free list page recording its free space – Only blob chunks (chunks of a blobspace) don’t have these – they are a totally different kind • Followed by the dbspace’s master partition – “partition partition” or “TBLSpace TBLSpace” • (Almost) anything beyond this can change – Database partition <– this would never move – The physical log – Initial logical logs – System and user databases …
  • 7. Dbspaces/Sb(lob)spaces • Up to m logical collections of 1 – n chunks each – We’ll see what m and n can be • Home of – Partitions in case of dbspaces and – partially – sbspaces – Sblobs in case of sbspaces – Blobs in case of blobspaces • Minimum entity of a backup or restore • ‘Critical’ – Rootdbs or – Dbspace containing physical or any logical log – Must be contained in any dbspace backup or L 0 restore
  • 8. A Fresh Instance For newbies (or others still wishing to know – do this whenever you want to test something): • Let’s create a new baby instance: – INFORMIXSERVER=baby – ONCONFIG=onconfig.$INFORMIXSERVER – Copy $INFORMIXDIR/etc/onconfig.std to $INFORMIXDIR/etc/$ONCONFIG – Edit new config file: • ROOTPATH /tmp/root_chunk.baby • Lower ROOTSIZE, PHYSFILE and LOGSIZE by factor 10 • MSGPATH $INFORMIXDIR/online.baby.log • SERVERNUM 123 • DBSERVERNAME baby – Add an entry to $INFORMIXDIR/etc/sqlhosts (unset INFORMIXSQLHOSTS): • baby onsoctcp localhost 9876 – oninit -ivy to initialize new instance on disk – onstat -d to see chunks and dbspaces we have - one only – onstat -m -r 2 to see when system databases creation is done
  • 9. oncheck -p… • oncheck’s -p option adds printing to checking – DBA’s first choice for looking at disk objects – -pr|R for printing reserved pages – -pP for locating pages physically, taking chunk# and page offset (base pages) – -pp for locating pages logically within a partition, taking partnum and log. page# – -pe for extent listing – -pt|T for printing partition pages – … pd|D|k|K|l|L for data and index pages • Some options only working when server is up – Esp. when needing more detail info than just a chunk • Others first attempting a connection – Might have to wait up to $INFORMIXCONTIME seconds (default: 60) – when server is down • When server is up it will always go through the server – Hence show you buffer cache content rather than reading from disk
  • 10. First Peek at a Chunk • Do an ‘oncheck -pe [rootdbs]’ – Extent listing • we’ll clarify “extents” later – Can limit output to specific space • not any further … so can be big – Only available online (or quiescent) – And with all the space’s chunks online (!) • Won’t work if one chunk in space is down • Try and locate the objects mentioned so far
  • 11. oncheck -pe DBspace Usage Report: rootdbs Owner: informix Created: 01/26/2011 Chunk Pathname Pagesize(k) Size(p) Used(p) Free(p) 1 /tmp/rootchunk.baby 2 100000 52256 47744 Description Offset(p) Size(p) ------------------------------------------------------------- -------- -------- RESERVED PAGES 0 12 CHUNK FREELIST PAGE 12 1 rootdbs:'informix'.TBLSpace 13 250 PHYSICAL LOG 263 15000 LOGICAL LOG: Log file 1 15263 500 LOGICAL LOG: Log file 2 15763 500 ... • ‘p’ is pages – base unit of a chunk • First 3 items always the same – Root reserved pages – Chunk’s first chunk free list page – TBLSpace TBLSpace’s first extent • All 3 can have “extension”
  • 12. Pages and Page Sizes • A chunk is made up of pages • Base i/o unit is a page – Also data and index buffering occurs in pages • 2kB entities (4kB on AIX and Windows) by default – Mandatory page size on “critical dbspaces”: root dbspace or dbspace holding any phys. or log. logs • Configurable page size for other, non-critical dbspaces – Per dbspace – At dbspace creation time – In multiples of default page size, up to 16k • Different game in blobspaces and sbspaces – Blobspaces always had freely choosable page sizes (multiples of base page size) – Sbspaces use default (base) page size … no matter what people (or Informix installers) keep telling you ;-)
  • 13. How to look at a page? • oncheck -pP <chunk_no> <page_offset> [#pgs] [-h] – Prints page header – Prints page slot table and slots if applicable • Unless -h (headers only) specified – <#pgs> to see multiple pages • (not working yet with non-default page size) – Requires <page_offset> specified in base (default) pages ! • SMI: – sysrawdsk look at pages as raw space – syspaghdr look at page headers only – Both indexed, but not very smart – e.g. can’t well use <=/</>/>= – Use base pages for offset! – Use carefully – not too safe, esp. with non-default page size! • onstat: when pages in memory • dd / od / … – Latter two provide more ‘natural’ image of a page
  • 14. Page Structure • (Almost) every used page has – a 24-byte page header – a trailing stamp (last 4 bytes) • When header and stamp match, the page is considered consistent in itself – At least it has been written completely – A checksum mechanism used nowadays – used to be two stamps that needed to match • Page content usually is organized in slots • Slot table – growing from page end – Entries describing slots • Unused pages – no structure or consistency assumed • What is ‘unused’ ? – Not allocated to any object, so FREE in the chunk – Or beyond its object’s “npused” (# pages used)
  • 15. Some Pages Now • Try this now: – oncheck -pP 1 0 12 > first12.pgs • Find – Page headers – Slot tables and entries – Slots • What is it what we’re looking at? • Try to dump the same using ‘dd’ and/or ‘od’ – dd if=$ROOTCHUNK bs=2k count=12 | od -A x -t x > first12.hex
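The `dd`/`od` step on slide 15 can also be sketched in Python for readers who prefer to script it. This is a minimal sketch, not an Informix tool: `read_pages` and `hexdump` are hypothetical helper names, the 2 kB page size is the default quoted in the slides, and the code assumes a cooked-file chunk that starts at byte offset 0 unless `chunk_offset` says otherwise.

```python
BASE_PAGE_SIZE = 2048  # default page size from the slides (4 kB on AIX and Windows)


def read_pages(chunk_path, first_page, count,
               page_size=BASE_PAGE_SIZE, chunk_offset=0):
    """Return up to `count` raw pages, starting at page `first_page` of a chunk file.

    Hypothetical helper: it only seeks and reads, much like
    `dd bs=<page_size> skip=<first_page> count=<count>`.
    """
    pages = []
    with open(chunk_path, "rb") as chunk:
        chunk.seek(chunk_offset + first_page * page_size)
        for _ in range(count):
            data = chunk.read(page_size)
            if len(data) < page_size:
                break  # ran past the end of the file
            pages.append(data)
    return pages


def hexdump(page, width=16):
    """Format bytes as hex rows, each prefixed with its offset inside the page."""
    rows = []
    for pos in range(0, len(page), width):
        chunk_bytes = page[pos:pos + width]
        rows.append(f"{pos:06x}  " + " ".join(f"{b:02x}" for b in chunk_bytes))
    return "\n".join(rows)
```

A call such as `read_pages("/tmp/root_chunk.baby", 0, 12)` would fetch the same twelve pages as the `dd` command. Like `dd`, it reads what is on disk, which may lag behind the buffer cache while the server is running.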
  • 16. Page Header Fields • Page header size: 24 bytes: • Fields – no longer documented: – Chunk:Offset (OOOOOOOO CCCC) 4+2 – Checksum (ssss) 2 – N2k (n:5) 2:5 – Nslots (ssss:11) 2:11 – Flags/Type (FFFF) 2 – Free Pointer (ffff) 2 – Free Counter (cccc) 2 – Next Page (NNNNNNNN) 4 – Previous Page (PPPPPPPP) 4
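The field list on slide 16 adds up to exactly 24 bytes, so it can be unpacked with one `struct` format. The sketch below rests on several assumptions that the slides do not confirm: little-endian byte order (as on the Linux demo box), fields stored in the order listed, and the slot count occupying the low 11 bits of the shared 16-bit word with N2k in the high 5 bits. `parse_page_header` is a hypothetical name.

```python
import struct

# Assumed layout, in slide order: offset(4) chunk(2) checksum(2) n2k:5/nslots:11(2)
# flags(2) free pointer(2) free counter(2) next page(4) previous page(4) = 24 bytes.
# "<" = little-endian without padding (assumption: x86 Linux).
PAGE_HEADER = struct.Struct("<IHHHHHHII")


def parse_page_header(page):
    """Decode the first 24 bytes of a raw page into a dict of header fields."""
    (offset, chunk, checksum, n2k_nslots, flags,
     free_ptr, free_cnt, next_page, prev_page) = PAGE_HEADER.unpack_from(page, 0)
    return {
        "offset": offset,
        "chunk": chunk,
        "checksum": checksum,
        # Bit split is an assumption: low 11 bits = slot count, high 5 bits = N2k.
        "nslots": n2k_nslots & 0x07FF,
        "n2k": n2k_nslots >> 11,
        "flags": flags,
        "free_ptr": free_ptr,
        "free_cnt": free_cnt,
        "next": next_page,
        "prev": prev_page,
    }
```

Comparing the decoded values with the header that `oncheck -pP` prints for the same page is a quick way to check whether these assumptions hold on a given platform.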
  • 17. Page Types • Many different page types – oncheck -pp|P naming them in page header output portion – Encoded in lower bits of page flags • ROOTRSV: root (and extended) reserved pages recording system configuration • CHUNK: chunk free list pages, recording FREE extents; first one always at fixed position 2 in a chunk; chained if one doesn’t suffice • FREE: partition free bitmap, recording page’s use state within a partition; at fixed intervals within a partition; first one always logical page 0 • PARTN/SECPARTN: partition pages and secondary partition pages; a partition’s details, incl. in-place alter history • DATA/REMAIN: table data row and overflow (remainder) pages • BTREE: btree index page (root/twig/leaf node) • PBLOB: partition blob page • BLOB/BMAP/BBIT: blobspace pages
  • 18. Slots • Page content organized in slots normally – Only few page types don’t need real slots (chunk FREE list, bitmap, plog marker, any sort of blobspace pages …) • Slot – A contiguous range of bytes within a page – With a 2*2bytes slot table entry describing it • Slot begin and slot size, optional slot flags – Space consumption of a slot: slot size + 4 – Slot size can be zero – deleted slot – Slot table size, growing from page end: page’s #slots * 4 • Page can have up to 2k slots – E.g. large index pages can have this many – Certain pages have much lower limits, for various reasons • DATA, REMAINDER, PBLOB: max. 255 slots reason: ROWIDs (we’ll see later) • Reserved pages only few (tens) reason: slot vs. page sizes
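The space arithmetic from slides 14 and 18 (24-byte header, 4-byte trailing stamp, slot size plus 4 bytes per slot) can be written down as a small calculation. This is a sketch that assumes those three items are the only consumers of page space. The function names are made up for illustration.

```python
PAGE_HEADER_SIZE = 24  # bytes, per slide 14
PAGE_STAMP_SIZE = 4    # trailing stamp, last 4 bytes of the page
SLOT_ENTRY_SIZE = 4    # 2 * 2 bytes slot table entry per slot


def slot_space(slot_size):
    """Bytes a slot costs on the page: its data plus its slot table entry."""
    return slot_size + SLOT_ENTRY_SIZE


def free_bytes(page_size, slot_sizes):
    """Bytes left on a page holding slots of the given sizes (0 = deleted slot)."""
    used = PAGE_HEADER_SIZE + PAGE_STAMP_SIZE
    used += sum(slot_space(size) for size in slot_sizes)
    return page_size - used


def max_rows_per_page(page_size, row_size, slot_limit=255):
    """Upper bound on fixed-size rows per DATA page, capped by the 255-slot limit."""
    usable = page_size - PAGE_HEADER_SIZE - PAGE_STAMP_SIZE
    return min(slot_limit, usable // slot_space(row_size))
```

Under these assumptions a 2 kB page has 2020 usable bytes, so 100-byte rows fit 19 to a page, while very short rows hit the 255-slot cap before the page fills up.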
  • 19. Reserved Pages • Try this: – oncheck -pr > first12.txt • compare to what we’ve dumped earlier – Formatting those 12 “reserved pages” • We’re seeing: – Page Zero: version information primarily – Onconfig params and values (not all) – Physical/Logical log definitions, and last Checkpoint details – Dbspace definitions – Chunk definitions – Archive details and Data Replication status – Yet not all of them are displayed • Some are paired – for recoverability reasons • Only more recent of pair is taken – In a larger instance many more are displayed … • But not mentioned individually, as extra (extended) reserved pages • Initial 12 can only hold very limited amount of details
  • 20. Reserved Pages Extension • Log. logs, dbspaces and chunks can be many • To accommodate their definitions reserved pages can be extended • Extensions for each sort always in contiguous blocks – Within “rootdbs” chunks • Root reserved page pointing to its extension – pg_next: start page – pg_prev: extension size • [Figure: Root Reserved Pages (Zero, Config, Ckpt1, Ckpt2, Dbsp1, Dbsp2, PChunk1, PChunk2, MChunk1, MChunk2, Arch1, Arch2) pointing to Extended Reserved Pages (more logical logs, more space specs, more pchunk specs, more mchunk specs)]
  • 21. Extents • Contiguous sets of pages allocated to a certain purpose – E.g. to a partition, or forming a log file • Within one chunk • Arbitrary size: 1 page up to (almost) chunk size • oncheck -pe: listing all extents of a dbspace (or whole instance) • See also the sysextents SMI table
  • 22. Sorts of Extents • Possible extents: – Reserved pages – root and extensions – Chunk free list pages – single page extents – Physical log – 1 large extent – Logical logs – 1 extent each – Partition extents – data/index partitions consist of 0 - many extents – Unused areas of a chunk: FREE extents • So what’s needed to read to compile a complete extent list? – Reserved pages (for log files) – Chunk free lists – Partition pages
  • 23. Partitions • Partitions form the containers for database objects recorded, by their Partnum or Fragid, in database catalogs – Tables (and their fragments) – Indices (and their fragments) – Sequences – relying on a partition’s ability to generate serial values – Even external tables possess a (dummy) partition – for having a partnum – Sbspace metadata • Thinking of a partition as a ‘file’ (containing the partition data) – partition (header) page would be the ‘inode’ – Partition extents would be ‘blocks’ – dbspace would be the ‘file system’
  • 24. Partitions (cont.) • A partition (“tablespace”) consists of – Its partition header page • Holding the details that describe the partition • Potentially extending to secondary partition pages – A collection of allocated extents • Partitions reside in a (db-/sb-)space, one abstraction level above chunks – Their extents reside in the space’s chunks • All partitions of a space are recorded, by their partition header pages, in the space’s Partition Partition – aka. “TBLSpace TBLSpace” – The space’s master partition - the very first one – Holding the space’s partition pages
  • 25. What’s a Partnum? • Visualizing a dbspace first: • Dbspace entry (Reserved Pages) – columns: DbsNo rp off flags 1.chk #chks flags (b)pg_sz name – values: 4 0 354 60001 4 3 N--BA 1 datadbs • Primary chunks (Reserved Pages) – columns: chkno rp off dbsno nxchk offset fpage #bpages #freepgs ovhd flags pg_sz path – 4 0 39c 4 5 0 - 1000 0 30040 PO-B 2048 /data/IBM/informix/demo/demo_on/datadbs_1 – 5 0 4c8 4 6 0 - 2500 2 30040 PO-B 2048 /data/IBM/informix/demo/demo_on/datadbs_2 – 6 0 5f4 4 0 0 - 4000 270 10040 PO-B 2048 /data/IBM/informix/demo/demo_on/datadbs_3 • [Figure: the dbspace as its three chunks (1. chunk …/datadbs_1, 2. chunk …/datadbs_2, 3. chunk …/datadbs_3, page marks 0, 99, 100, 199) holding partitions and their partnums: Tblspace tblspace 0x00400001, Table_1 0x0040005b, Table_2 0x004000c2, Table_3 0x00400062, Table_4 0x00400005, plus FREE space and free list]
  • 26. So … What’s a partnum? • A partnum is a 4bytes integer number – Uniquely identifying a partition – Falling into 1.5 bytes “dbspace number” – And 2.5 bytes “logical page number” – Hex representation: 0xdddlllll • What does this mean? – Each dbspace can hold partitions (TBLSpaces) – It always holds a master partition (TBLSpace TBLSpace) – All other partitions are recorded in this master partition – The master partition only contains partition header and secondary pages – Each partition header page describes one partition – The ‘lllll’ fraction of a partition’s partnum is the number (position) of its partition header page within the dbspace’s (‘ddd’) TBLSpace TBLSpace • What special partnum then is 0xN00001 ? – TBLSpace TBLSpace’s own partnum for dbspace ‘N’
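The `0xdddlllll` layout from slide 26 maps directly onto two bit operations. A minimal sketch follows. The function names are illustrative, and the sample values are the partnums shown in the dbspace picture on slide 25.

```python
def split_partnum(partnum):
    """Split a 4-byte partnum into (dbspace number, logical page number).

    Top 12 bits (1.5 bytes): dbspace number.
    Low 20 bits (2.5 bytes): logical page of the partition header page
    within that dbspace's TBLSpace TBLSpace.
    """
    dbspace = (partnum >> 20) & 0xFFF
    logical_page = partnum & 0xFFFFF
    return dbspace, logical_page


def tblspace_tblspace_partnum(dbspace):
    """Partnum of the master partition of dbspace N: 0xN00001."""
    return (dbspace << 20) | 1
```

For example, `split_partnum(0x0040005b)` gives dbspace 4 and logical page 0x5b, and `tblspace_tblspace_partnum(4)` gives 0x00400001, matching the Tblspace tblspace entry in the picture.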
  • 27. Looking at a Partition Page • oncheck -pt|T db:owner.table[,dbs] | partnum • Finds the desired partition header page(s) • Tells you the following recorded in those pages – General partition info – slot 1 • Partnum, date, flags, rowsize, … – Extents allocated to this partition – slot 5 – Possibly a pointer to the partition’s current compression dictionary – slot 7 – Partition name printed is NOT taken from partition page – determined from catalogs instead • Specifying a partnum will target only this one partition page – Will attempt to resolve partition name querying systables • Otherwise all partitions of the specified table are targeted – Single data partition – or multiple in case of a fragmented table – Index partitions – each index normally has its own partition (detached) • -pT: will scan an entire (set of) partition(s) to gather page statistics • Index/Data/Bitmap page types and usage • Index usage reports • In-place alter versions • Only working with the server running
  • 28. Partition Page ‘raw’ • oncheck -pp 0x<N>00001 <L> – What’s the difference ? – Not formatted as a partition page – but “complete” instead ;-) • Try and compare the following: – oncheck -pt 0x100001 – oncheck -pp 0x100001 1 – To what extent are these the same? – To what extent different?
  • 29. Find a specific Data Row now • Given a specific row in a fragmented table – dbname:[owner.]tabname[,fragdbs|%partition]:rowid – or a partnum:rowid combination, e.g. from a log record – What would it take to get to that row manually? • First let’s learn what’s to be done under the hood • Let’s assume the partnum is known already – Can be obtained from systables or sysfragments – Let’s say: partnum 0x400079, rowid 0x00000a01 – Or obtain e.g. from systables.partnum
  • 30. So what’s a Rowid ? • A rowid describes the precise location of a row within a partition/fragment: – 0xppppppss - 4byte integer – High 3 bytes: logical page number within partition – Low byte: slot number within page • Not to be confused with the “WITH ROWID” shadow column (frag’d table) – A real number assigned to a row • [Figure: a partition drawn as a sequence of extents (1st to 5th extent, with logical pages 0, 4 and 8 marked and page 0 being the bitmap page); rowid 0xa01 points to a page made of page header and slots 1 … n, and within it to slot 1]
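Splitting a rowid of the form `0xppppppss` takes one shift and one mask. The sketch below uses illustrative function names and the rowid 0xa01 from the slides.

```python
def split_rowid(rowid):
    """Split a 4-byte rowid into (logical page number, slot number)."""
    logical_page = (rowid >> 8) & 0xFFFFFF  # high 3 bytes
    slot = rowid & 0xFF                     # low byte
    return logical_page, slot


def make_rowid(logical_page, slot):
    """Inverse of split_rowid; the slot must fit into one byte."""
    if not 0 <= slot <= 0xFF:
        raise ValueError("slot number must fit into the low byte of a rowid")
    return (logical_page << 8) | slot
```

`split_rowid(0x00000a01)` yields logical page 10, slot 1. The one-byte slot field is also why DATA, REMAINDER and PBLOB pages are limited to 255 slots.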
  • 31. Paths to Our Row’s Page (1) So we need extent info for our partition (identified by partnum) – Want to physically locate the page containing our row – Either walk all the way by foot, via the partition pages – Or pick from a formatted extent list • Crawling: Find partition page for partnum and use its extent list for translation • Dump Tblspace Tblspace partition page: 4th page in space’s first chunk - this is fixed • Slot 5 has the extent list - we’re on Linux, sorry for wrong endianness • Take partnum’s “logical page” portion • Convert to physical address using raw extent list found • Determine location of target partition page and dump it as well • Use that page’s raw extent list for translating your rowid into a physical page
  • 32. Paths to Our Row’s Page (2) • Walking: Using formatted extent list • Obtain an extent list (oncheck -pe) • Determine table name (from system catalog) • Find extent matching your table (can be confusing if table is fragmented) • OR: use extent list in ‘oncheck -pt <partnum>’ output • Calculate precise phys. location (extent start plus log. page difference) • Driving: – oncheck -pp <partnum> <logical_page>
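The "extent start plus log. page difference" calculation can be sketched as a walk over a partition's extent list. The `Extent` tuple and the sample numbers in the usage note are hypothetical. The sketch assumes the extents are given in logical order and that all offsets and sizes use the same page unit.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    chunk: int  # chunk number holding the extent
    start: int  # physical page offset of the extent within that chunk
    size: int   # extent length in pages


def logical_to_physical(extents, logical_page):
    """Translate a partition's logical page number into (chunk, page offset).

    Assumes `extents` is ordered the way the partition's pages are numbered:
    the first extent covers logical pages 0..size-1, the next one follows on.
    """
    first_logical = 0
    for extent in extents:
        if logical_page < first_logical + extent.size:
            return extent.chunk, extent.start + (logical_page - first_logical)
        first_logical += extent.size
    raise ValueError(f"logical page {logical_page} lies beyond the last extent")
```

With a made-up extent list `[Extent(4, 100, 4), Extent(4, 500, 4), Extent(5, 53, 8)]`, logical page 10 falls into the third extent and resolves to chunk 5, page offset 55. Remember that `oncheck -pP` expects offsets in base pages, so a dbspace with a non-default page size needs an extra conversion.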
  • 33. The Row Finally • oncheck will dump the page’s slots in raw hex format – Pick the one your rowid is pointing to • What’s easy to determine – Does the row exist? No, if slot is missing or zero length. – Does the slot length fit the partition’s row length? • Might be shorter in case of variable length data types. • If you need to know what’s in this row – E.g. page can’t be read any more (inconsistent) – No way around applying the table’s schema byte by byte – Way beyond this 1 hour talk ;-)
  • 34. Indirect / Incomplete Rows • Row not fitting your schema? – Too short somehow? • Strange looking slot length – way too large? • High bit set in a DATA page slot length means – first 4 bytes in slot are no DATA – Instead they’re a forward pointer – In the form of another 4byte rowid (0xppppppss) • An indirect row or an initial piece of a row obviously – Need to look up its next/remainder piece – Located on so called REMAINDER pages – Row can consist of multiple such pieces (32k max row length) • What fun looking at such rows in their entirety!
  • 35. Watch out for IPA! • Row still not fitting our schema?? • DATA page header having strange value in its ‘page next’ field?? • Then we’re on an old version page! – What’s that again? – And can this be combined even with row indirection (multi-piece rows)? Sure it can! • All rows on such page don’t fit the table’s current schema – Instead they’re in the shape of a previous schema this table had – Before potentially a whole series of ALTER TABLE statements – These ALTERs have been performed in in-place fashion – no real changes yet • Some real dirty work starting here, again at our partition page – There we learn about a series of secondary partition pages – Keeping a memory of all outstanding in-place ALTERs – Partition page’s pg_next field has the TBLSpace TBLSpace log. page# of the first such ALTER page
  • 36. Compression • Neither row indirection nor IPA can explain what my row’s looking like? – Moreover it does look like real garbage! – And that slot length is an oddity – way too big • Is this partition compressed? – Consult ‘oncheck -pt’ output, it would tell • Is this row compressed? – The slot length field would have its second highest bit set • Again next step would be our partition page – Slot 7 has the pointer to the current compression dictionary – Also oncheck -pt should show this information • Then uncompress the row using the compression dictionary – Not here, not now …
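Slides 34 and 36 both hinge on flag bits in the slot length: the highest bit marks a forward pointer (indirect or multi-piece row) and the second highest marks a compressed row. The sketch below assumes the slot length is the 16-bit value from the 2*2 bytes slot table entry and that the remaining 14 bits carry the actual length. That last point is an assumption, not something the slides state.

```python
INDIRECT_FLAG = 0x8000    # highest bit: slot starts with a 4-byte forward rowid
COMPRESSED_FLAG = 0x4000  # second highest bit: row is compressed
LENGTH_MASK = 0x3FFF      # assumption: the remaining 14 bits are the real length


def describe_slot_length(raw_length):
    """Decode a raw 16-bit slot length from a DATA page slot table entry."""
    return {
        "length": raw_length & LENGTH_MASK,
        "indirect": bool(raw_length & INDIRECT_FLAG),
        "compressed": bool(raw_length & COMPRESSED_FLAG),
        "deleted": (raw_length & LENGTH_MASK) == 0,  # zero length = deleted slot
    }
```

A raw value such as 0x8004 would then read as a 4-byte slot holding only a forward pointer, which explains the "way too large" slot lengths the slides warn about.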
  • 37. Questions?!? 11/16/2012 Template Presentation - Session Z99 37
  • 38. Beauty of Informix Disk Structures Andreas Legner andreas.legner@de.ibm.com