These are just basic storage features we’re testing, so no concerns about using either of these. Both are hosted in the cloud.
Now these are on my laptop: my Surface Pro 4 with 16 GB of memory and another 24 GB of swap on SSD.
This is how I roll.
This started out with Paul Randal’s research, which got me interested in comparing it to the work of Richard Foote on the Oracle side. Both focus on index performance, and I was able to adapt their work to perform a comparison.
It isn’t a true apples-to-apples comparison, but I do think it helps to understand the differences with respect to each platform.
This is based on two major players in the database realm (I may introduce MySQL or another database to make it more interesting in the future), and I had fun moving it from my local VMs to Amazon and Azure. My Amazon environment is busy being used for a huge demonstration build for my Oracle Open World talk, so I had to clear my index work out of it to ensure I didn’t impact what my partner from another company was building.
In Oracle, every record has a unique (inside the table) 10-byte pseudocolumn, ROWID, representing the physical location on disk where the record is stored. When printed, each ROWID is displayed in BASE64 format (A-Za-z0-9+/):
Yes, we can query the row by ROWID.
Because access by ROWID is so fast, Oracle has had little to no need to learn the power of clustered indexes. There simply is no reason for them in Oracle.
Oracle’s setting for free space (aka PCTFREE), compared with how full a page is left in SQL Server, leaves space free that SQL Server would put to use.
ROWID:
In BASE64 format, details the data object, datafile, block and row where the data is stored.
Rowids are the fastest way of accessing rows.
Rowids can be used to see how a table is organized.
Rowids uniquely identify rows in a given table.
Main translations that will be required for this session
This is what all DBAs think of when we hear IOT. Not Internet of Things, but Index Organized Table. The reuse of acronyms will be the death of us all.
It’s the closest thing to what SQL Server uses for its initial index on any table. Where SQL Server almost expects data to be sorted on the first, indexed column, Oracle invests heavily in the temp “tablespace” (similar to filegroups in SQL Server), which handles all sorts, hashes, etc. that won’t fit inside the PGA (Program Global Area), an area of memory kept separate from the SGA (System Global Area). SQL Server uses a temp database, tempdb, to perform similar processing.
I’ll skip over the next one, as we’ll dig into these in a moment.
Although Oracle and SQL Server both have sequences, I used older terminology and syntax here and I wanted to point that out, along with dbms_random, which is a built-in package (a group of procedures, etc.) used to populate data in Oracle.
A page and a block are very similar, and then we have AWR and the DMVs, which are also very similar. Oracle, however, saves off aggregates of this performance data long term (1 hr, 1 week, 30 days, 1 year), while the DMVs, of course, have some data that is only cached and must be called with functions.
I won’t dig into the newest performance data added as of SQL Server 2016, the Query Store. The data is vast, but it’s still in its infancy and didn’t provide the basic information that I was looking for in my comparisons.
It’s much more like tracing with wait events stored in the database, whereas I find the DMVs to be more like AWR with limited retention.
Along with the standard backend processes that perform actions, we need to focus on how sorts are performed: the PGA.
SQL Server doesn’t have a PGA. It assumes that the index on the PK, or at least one index per table, keeps the data pre-sorted except for a few rows.
Anything that doesn’t fit inside the PGA in Oracle will spill over to the TEMP tablespace, which is often simple disk (more often these days SSD or flash, offering faster performance on those sorts).
In SQL Server, unsorted data is sorted in memory; if it doesn’t fit, it spills to a separate database, tempdb.
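To make the spill idea concrete, here is a minimal Python sketch of an external sort: if the rows fit in the memory budget they are sorted in place, otherwise sorted runs are written to temporary files and merged. It is a toy model, not how either engine actually implements sorting; `memory_rows` and the function names are hypothetical, and the temp files only stand in for the TEMP tablespace or tempdb.

```python
import heapq
import tempfile


def read_run(handle):
    """Yield the integers stored one per line in a spilled run."""
    for line in handle:
        yield int(line)


def sort_with_spill(values, memory_rows):
    """Sort integers; return (sorted_list, number_of_spilled_runs).

    memory_rows is a hypothetical stand-in for the sort memory available
    (the PGA in Oracle, the sort memory in SQL Server).
    """
    if len(values) <= memory_rows:
        return sorted(values), 0  # everything fits in memory, no spill

    runs = []
    for start in range(0, len(values), memory_rows):
        run = sorted(values[start:start + memory_rows])
        handle = tempfile.TemporaryFile(mode="w+")  # stand-in for TEMP / tempdb
        handle.write("\n".join(str(v) for v in run))
        handle.seek(0)
        runs.append(handle)

    # Merge the already-sorted runs back into one ordered result.
    merged = list(heapq.merge(*(read_run(h) for h in runs)))
    for handle in runs:
        handle.close()
    return merged, len(runs)


data = [5, 3, 9, 1, 7, 2, 8]
print(sort_with_spill(data, 10))  # fits in memory, zero runs spilled
print(sort_with_spill(data, 3))   # spills three sorted runs, then merges
```

The second call does the same logical work as the first, but pays for extra writes and reads, which is why the speed of the spill target matters.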
ROWID:
OOOOOO: The data object number that identifies the database segment (AAAAao in the example). Schema objects in the same segment, such as a cluster of tables, have the same data object number.
FFF: The tablespace-relative datafile number of the datafile that contains the row (file AAT in the example).
BBBBBB: The data block that contains the row (block AAABrX in the example). Block numbers are relative to their datafile, not tablespace. Therefore, two rows with identical block numbers could reside in two different datafiles of the same tablespace.
RRR: The row in the block.
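The OOOOOO/FFF/BBBBBB/RRR layout can be sketched in Python by treating each group as a base-64 number. This is only an illustration of the printed format, under the assumption that each group decodes independently with the A-Za-z0-9+/ alphabet; the function names are hypothetical, and the trailing `AAA` row value is an assumed placeholder, since the notes give no row example.

```python
# Alphabet used when a ROWID is printed: A-Z, a-z, 0-9, + and /
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def decode_group(chars):
    """Read a group of ROWID characters as one base-64 number."""
    value = 0
    for ch in chars:
        value = value * 64 + ALPHABET.index(ch)
    return value


def decode_rowid(rowid):
    """Split an 18-character extended ROWID into its four components."""
    if len(rowid) != 18:
        raise ValueError("an extended ROWID prints as 18 characters")
    return {
        "data_object": decode_group(rowid[0:6]),     # OOOOOO
        "relative_file": decode_group(rowid[6:9]),   # FFF
        "block": decode_group(rowid[9:15]),          # BBBBBB
        "row": decode_group(rowid[15:18]),           # RRR
    }


# AAAAao, AAT and AAABrX are the example groups from the notes;
# the final AAA (row 0) is an assumed value for illustration only.
print(decode_rowid("AAAAaoAATAAABrXAAA"))
```

Inside the database, the supported way to pull these pieces apart is the DBMS_ROWID package; the sketch only shows why the printed string carries enough information to locate a row.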
The fill factor, the percentage of each page that gets filled, is set to 80 here, but many state that if the database isn’t OLTP, you should set it to 100.
Keep in mind, the indexes are clustered or sorted, so why impact what is there when you’ll have to rebuild it at some point anyway…
The default PCTFREE for Oracle 11g and 12c is 10%.
Oracle would rarely recommend you change this, but many old-school DBAs still mess with this setting, and I’ve come across many databases that have been upgraded over the years and still have odd settings on individual objects (tables and indexes).
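A rough way to see what these percentages mean is to count how many rows fit in a block once the reserved space is taken out. The Python sketch below is back-of-the-envelope arithmetic only: it ignores block and row header overhead, and the 8 KB block and 200-byte row are made-up round numbers, not the sizes from the demo.

```python
def rows_per_block(block_bytes, row_bytes, pct_left_free):
    """Rows that fit when pct_left_free percent of the block stays empty.

    For Oracle, pct_left_free is PCTFREE; for SQL Server it is
    100 minus the fill factor. Header overhead is ignored (assumption).
    """
    usable = block_bytes * (100 - pct_left_free) // 100
    return usable // row_bytes


BLOCK = 8192  # assumed 8 KB block/page
ROW = 200     # assumed row size in bytes

print(rows_per_block(BLOCK, ROW, 20))  # fill factor 80 leaves 20% free
print(rows_per_block(BLOCK, ROW, 10))  # PCTFREE 10 leaves 10% free
print(rows_per_block(BLOCK, ROW, 0))   # fill factor 100, nothing reserved
```

The gap between the first and last number is the storage you trade away up front in exchange for room to absorb later inserts and updates.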
In a clustered index, the leaf rows of the index are the data rows of the table; therefore, the intermediate sort runs contain all the data rows. In a nonclustered index, the leaf rows may contain nonkey columns, but are generally smaller than a clustered index. If the index keys are large, or there are several nonkey columns included in the index, a nonclustered sort run can be large and can often spill over to tempdb.
The Page Splits/sec counter is one of the "piece of string" counters for which no absolute value should be assumed a breaking point for performance challenges.
The counter can vary depending on table size, density, index depth, workload and workload type.
Page splits are an occurrence on clustered indexes, which are an advanced feature for indexing as a whole and a standard feature for SQL Server.
A split happens, as shown here, when there isn’t enough room in a page and the data needs to be written to the beginning of a new page.
SQL Server needs to hold a latch for an extended period of time to allocate the new page, copy data from the old page to the new page and then write the new data to the new page.
SQL Server can also take additional latches on the intermediate index pages while it waits.
This is the reason that index rebuilds are still a common task in MSSQL.
This is a higher latency wait when concurrency is added to the mix.
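The link between fill factor and page splits can be sketched with a toy model in Python: pages are sorted lists with a fixed capacity, and an insert into a full page moves the upper half to a new page. This is a simplification under stated assumptions (a 50/50 split, no latching, no intermediate index levels), and all names and numbers are hypothetical.

```python
import bisect


def build_pages(sorted_keys, page_capacity, fill_pct):
    """Pack sorted keys into pages, filling each to fill_pct percent."""
    per_page = max(1, page_capacity * fill_pct // 100)
    return [sorted_keys[i:i + per_page]
            for i in range(0, len(sorted_keys), per_page)]


def insert_key(pages, key, page_capacity):
    """Insert key in order; return True if the target page had to split."""
    first_keys = [page[0] for page in pages[1:]]
    idx = bisect.bisect_right(first_keys, key)  # page whose range holds key
    page = pages[idx]
    split = len(page) >= page_capacity
    if split:
        mid = len(page) // 2
        new_page = page[mid:]            # move the upper half to a new page
        del page[mid:]
        pages.insert(idx + 1, new_page)
        if key >= new_page[0]:
            page = new_page
    bisect.insort(page, key)
    return split


def count_splits(fill_pct, new_keys, page_capacity=10):
    """Build 100 keys at the given fill percentage, then count splits."""
    pages = build_pages(list(range(0, 1000, 10)), page_capacity, fill_pct)
    return sum(insert_key(pages, k, page_capacity) for k in new_keys)


new_keys = [5, 105, 205, 305, 405]    # inserts landing mid-index
print(count_splits(100, new_keys))    # every insert hits a full page
print(count_splits(80, new_keys))     # free space absorbs the inserts
```

In this model the full pages split on every mid-index insert while the 80 percent pages split on none, which is the trade the fill factor is buying.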
From the powerful dynamic management view, dm_db_index_usage, we can locate indexes that are fragmented and impacting performance in MSSQL.
Oracle leaves "dead" index nodes in the index when rows are deleted.
While rebuilding indexes is a topic that’s heavily debated, there are some poor coding choices or application processing that can offer a reason for index rebuilds.
You’ll come across an index whose space consumption is larger than that of the table it’s sourced from.
This is rare, though, and as I said, it can be resolved by fixing the code or the application.
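The "dead" entries can be pictured with a small Python model: deleting marks a leaf entry rather than removing it, so the leaf block count stays the same until a rebuild repacks the live entries. The dictionary keys borrow the lf_blks, lf_rows and del_lf_rows names for readability, but the functions are hypothetical and the model ignores Oracle's reuse of deleted space by later inserts.

```python
def index_stats(leaf_blocks):
    """Summarize a toy index: blocks, total entries, deleted entries."""
    lf_rows = sum(len(block) for block in leaf_blocks)
    del_lf_rows = sum(
        1 for block in leaf_blocks for _, deleted in block if deleted
    )
    return {
        "lf_blks": len(leaf_blocks),
        "lf_rows": lf_rows,          # includes entries marked deleted
        "del_lf_rows": del_lf_rows,
    }


def delete_keys(leaf_blocks, predicate):
    """Mark matching entries as deleted but leave them in place."""
    return [
        [(key, deleted or predicate(key)) for key, deleted in block]
        for block in leaf_blocks
    ]


def rebuild(leaf_blocks, entries_per_block):
    """Repack only the live entries into fresh leaf blocks."""
    live = [
        (key, False)
        for block in leaf_blocks
        for key, deleted in block
        if not deleted
    ]
    return [live[i:i + entries_per_block]
            for i in range(0, len(live), entries_per_block)]


# Four leaf blocks of four entries each, keys 0..15 (made-up data).
blocks = [[(k, False) for k in range(s, s + 4)] for s in range(0, 16, 4)]
after_delete = delete_keys(blocks, lambda k: k % 2 == 0)

print(index_stats(after_delete))              # same blocks, half dead
print(index_stats(rebuild(after_delete, 4)))  # half the blocks, none dead
```

This is the pattern that makes an index look larger than its table after heavy deletes, and it is also why fixing the delete pattern can matter more than scheduling rebuilds.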
We’ll do this a number of times as we simulate poor processing that would cause issues in page splits, data storage and impact to performance.
Create the table, with a primary key on C1 to be populated by a sequence and trigger, and an index on C2.
This matches the 80% fillfactor used on the SQL Server side.
Create Sequence and Trigger to insert before each row.
Insert 7 rows to fill up one block.
Because I have a sequence and trigger in place, I only need to add the random data, 200 char and the current date.
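For a feel of the shape of the test data, here is a small Python sketch that produces equivalent rows: a counter plays the role of the sequence and trigger, a seeded random generator stands in for the random 200-character value, and today's date fills the last column. The function name and the `c3` label for the date column are assumptions for illustration; the real load is done in the database.

```python
import datetime
import itertools
import random
import string


def make_rows(n, seed=0):
    """Build n rows of (c1, c2, c3) shaped like the test table."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    sequence = itertools.count(1)      # stand-in for the sequence + trigger
    today = datetime.date.today()
    rows = []
    for _ in range(n):
        c1 = next(sequence)
        c2 = "".join(rng.choices(string.ascii_uppercase, k=200))
        rows.append((c1, c2, today))   # c3 is a hypothetical column name
    return rows


rows = make_rows(7)
for c1, c2, c3 in rows:
    print(c1, c2[:10] + "...", c3)
```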
We have one leaf block and no deleted lf rows.
This time we’re using 8 leaf blocks, as each row requires its own block.
We’re also using a very small percentage of the space.
Notice the fillfactor is now 10, and we insert one more row…
Insert 1 million rows
These are the first 8 rows from the initial test and the 1 million from the second test.
Now delete rows where c2 includes the value of 200
Notice our leaf blocks now and our used space.
Our table has grown substantially from where we’ve started with a number of load processes and transactions to create fragmentation, storage changes, etc.
And this shows we’ve deleted the rows we wanted to.
Let’s load another 1 million rows into our Azure table with the clustered index.
Thanks to Paul Randal for much of the original research, which I was then able to compare against Richard Foote’s research on the Oracle side.
Notice the inserted “junk data” we’re using for our examples. Using our IOT, we’ll remove rows from the middle of the table.
Rather than an index rebuild, we need to use a table rebuild statement here, since the INDEX IS a TABLE, as demonstrated here.
Considering the number of rows, and that for an IOT the rows need to be not just compressed but also sorted, this will take some time.
A columnstore index stores data in a column-wise (columnar) format, unlike the traditional B-tree structures used for clustered and nonclustered rowstore indexes, which store data row-wise (in rows). A columnstore index organizes the data in individual columns that are joined together to form the index.
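The row-wise versus column-wise distinction is easy to see in a few lines of Python. This sketch only illustrates the two layouts with made-up data and hypothetical names; it says nothing about the compression, segments or batch processing a real columnstore index adds.

```python
# Row-wise layout: each record keeps all of its columns together.
rows = [
    {"c1": 1, "c2": "aaa", "c3": 10},
    {"c1": 2, "c2": "bbb", "c3": 20},
    {"c1": 3, "c2": "ccc", "c3": 30},
]


def to_columns(records):
    """Pivot row-wise records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Aggregating one column row-wise walks every record...
total_rowwise = sum(record["c3"] for record in rows)
# ...while the columnar layout reads just that column's values.
total_columnar = sum(columns["c3"])

print(columns)
print(total_rowwise, total_columnar)
```

Both totals match; the difference is how much unrelated data has to be read to get there, which is where analytic queries over a few columns gain the most.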
SQL Server entered the columnstore indexing business before Oracle, with SQL Server 2012.
That earlier version provided extensive performance gains but was read-only.
Oracle placed heavy weight behind its In-Memory product, even putting Maria Colgan forward as the voice for it (she’s now with Ask Tom).
Clustered columnstore indexes are no longer read-only; that was fixed back in version 2014.
This replaces the current index with a columnstore index on the existing table.
Now we need to check what we have.
Once it’s turned on, Oracle will build out most of the indexes; you just point it to the tables that are of interest.
Indexes are dynamically created in memory, vs. SQL Server where it must be done manually.
We’re going to create our in-memory table from our c1 column in our ORA_INDEX_TST table and let Oracle In Memory manage it.
There were a few other steps that had to be performed before this would be ready, but you get the idea…
Oracle having ROWID adds space, but adds performance when using indexing.
B+ tree indexing, aka clustered indexes, leaves a lot to be desired in Oracle. There are few use cases that require an IOT, and it’s even less likely that you’ll take advantage of them.
SQL Server carries considerable overhead when doing standard PK indexing the way Oracle does, and it just doesn’t make sense.
The architectural differences have created index variations that take advantage of the features of each platform.
In SQL Server, an Oracle DBA will underestimate the performance hit that comes with using a large composite primary key on a SQL Server clustered table.
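One way to see that hit is to remember that, on a clustered table, each nonclustered index entry carries the clustering key as its row locator, so a wide composite key is repeated in every secondary index. The Python arithmetic below is a rough sketch under stated assumptions: it ignores per-row and per-page overhead, and the row count and byte sizes are made-up numbers.

```python
def nonclustered_index_bytes(row_count, index_key_bytes, clustering_key_bytes):
    """Approximate leaf-level bytes for one nonclustered index.

    Each entry holds its own key plus the clustering key it uses to
    find the row; row and page overhead are ignored (assumption).
    """
    return row_count * (index_key_bytes + clustering_key_bytes)


ROWS = 1_000_000         # assumed table size
INDEX_KEY = 8            # assumed nonclustered key width in bytes

narrow_pk = 4            # a single 4-byte integer key
composite_pk = 4 + 36 + 8  # a hypothetical three-column composite key

narrow = nonclustered_index_bytes(ROWS, INDEX_KEY, narrow_pk)
wide = nonclustered_index_bytes(ROWS, INDEX_KEY, composite_pk)

print(narrow, wide, round(wide / narrow, 2))
```

With these numbers every secondary index is several times larger under the composite key, and that cost repeats for each nonclustered index on the table.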