Why backup one tablespace at a time?                                       Backup and Recovery Tips




Why backup one tablespace at a time?

All Oracle documentation (and mine, too!) recommends strongly that when performing
Operating System hot backups you should put one tablespace into hot backup mode at a
time, copy its files, and take it out of hot backup mode before moving on to the next
tablespace and repeating the procedure.

The reason for this recommendation is simple: the ‘begin backup’ command has profound
effects on the amount of redo generated by the database (in plain English, it sends the
amount of redo generated by even the smallest transaction through the roof).

The trouble starts when you consider that an Oracle block is typically made up of several
Operating System blocks. Commonly, O/S blocks are 512 bytes in size – so your typical
Oracle block of 8K is made up of 16 O/S blocks.

When you perform an Operating System-based backup, the disk head is copying Operating
System blocks, oblivious of the fact that each O/S block it encounters has a relationship
with (say) 15 others. Given that we are talking about hot backups, those 15 others could
be having changes made to them whilst we are copying the first O/S block. The Operating
System doesn’t care about this, but Oracle most certainly does – because we now have part
of an Oracle block copied with its contents at time X, whilst the remainder of the block
has moved on to time Y. By the time the O/S has finished copying all 16 O/S blocks, it’s
clear that the copies are likely to be at different times from each other. Imagine
now trying to put all 16 back together again, and making a sensible Oracle block out of
them.

Put simply, it can’t be done … unless you snapshot the entire Oracle block every time a
part of the block changes. That way, we’d end up with a number of different snapshots,
the last one of which could be used as a picture of what the entire set of O/S blocks, when
put together to create an Oracle block, should look like.

This problem is known as ‘block fracturing’, and the way to get around it is to generate
complete Oracle block snapshots in the redo stream every time any part of the Oracle
block changes.
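The fracture is easy to see in a toy simulation. Everything below is illustrative: the sixteen pieces follow the 8K-block/512-byte arithmetic above, and the strict ‘one change per copied piece’ interleaving is an assumption for demonstration, not real Oracle timing.

```python
# One 8K Oracle block modelled as 16 O/S blocks (8192 / 512).
N_PIECES = 16

def block_at_version(version):
    """A consistent Oracle block: every piece carries the same version."""
    return [version] * N_PIECES

# The O/S backup copies one 512-byte piece at a time, while a concurrent
# transaction moves the whole block on by one version between copies.
backup_copy = []
version = 0
for i in range(N_PIECES):
    backup_copy.append(block_at_version(version)[i])  # copy piece i at time 'version'
    version += 1                                      # the block changes mid-copy

print(backup_copy)  # [0, 1, 2, ..., 15]: 16 pieces from 16 different times

# The reassembled copy matches no consistent version of the block:
fractured = all(backup_copy != block_at_version(v) for v in range(N_PIECES))
print(fractured)    # True: a fractured block
```

The last check is the whole point: no amount of shuffling turns those sixteen pieces into a block that ever actually existed.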

This is precisely what the ‘alter tablespace … begin backup’ command does for us. It
instructs Oracle to generate block-level snapshots of any block that is altered by a piece of
DML during the Operating System backup, and that snapshot is stored in the Redo Log
Buffer, and from thence, in the Online Redo Logs – and eventually, of course, in the
Archived Redo Logs.

This is in stark contrast to what the Redo system ordinarily stores. If you update a record,
the Redo system usually only records the before and after version of the actual change you
make. If you change Bob’s salary, for example, the Redo would comprise the rowid for
Bob’s record together with his original salary, swiftly followed by the same rowid and the
new salary. There is some additional housekeeping information stored, too, naturally – but
at the end of the update, you’d have generated around 40 or 50 bytes of redo information.

But in hot backup mode, the very same transaction is required to generate an entire block
snapshot – on a typical Unix box, that would be 8192 bytes of redo, not a mere 40 or 50.
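As a back-of-envelope model of that difference (the sizes are the figures quoted above; note that a real database logs the full block image only for the first change to each block after ‘begin backup’, with subsequent changes logged normally):

```python
BLOCK_SIZE = 8192     # whole-block snapshot, as on a typical 8K-block system
CHANGE_VECTOR = 50    # ~40-50 bytes of before/after change for a small update

def redo_for_update(hot_backup_mode, first_change_to_block=True):
    # Normal running: just the change vector.
    # Hot backup mode: a full block image the first time the block is
    # touched after 'begin backup', plus the usual change vector.
    redo = CHANGE_VECTOR
    if hot_backup_mode and first_change_to_block:
        redo += BLOCK_SIZE
    return redo

print(redo_for_update(hot_backup_mode=False))  # 50
print(redo_for_update(hot_backup_mode=True))   # 8242
```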

Now, I’m simplifying a bit here: on a recent test, it took me 8 updates of a person’s salary
to induce a log switch on a 4K-block system whilst in hot backup mode (the logs were 16K
in size, so that suggests around 2K per transaction, or half a block). The same update when
NOT in hot backup mode had to be performed 14 times before the log switched (which
suggests around 1K per update). However: that’s still double the amount of redo for the
identical transaction when in hot backup mode compared to when not in hot backup mode.
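The arithmetic behind those measurements, using the figures quoted above:

```python
LOG_SIZE = 16 * 1024           # 16K redo logs used in the test
updates_to_switch_hot = 8      # updates per log switch in hot backup mode
updates_to_switch_normal = 14  # updates per log switch in normal running

redo_per_update_hot = LOG_SIZE / updates_to_switch_hot        # 2048.0 bytes, ~2K
redo_per_update_normal = LOG_SIZE / updates_to_switch_normal  # ~1170 bytes, ~1K

print(round(redo_per_update_hot))     # 2048
print(round(redo_per_update_normal))  # 1170
print(round(redo_per_update_hot / redo_per_update_normal, 2))  # 1.75: roughly double
```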

Now, if you accept that the smallest transactions now generate at least double the amount
of redo, consider making that happen across the entire database! (And if you were to
place all tablespaces into hot backup mode simultaneously, that’s exactly what you’d be
doing).

What does doubling the amount of redo produced by the database mean in practice? Well,
for a start, LGWR is going to be flushing the Redo Log Buffer more frequently (because it
will hit its one-third-full or 1MB-of-redo thresholds more rapidly). That means redo
logs get filled up more quickly – which means log switches happen more frequently. Given
that a full database checkpoint is issued at every log switch, that means that the rate of
checkpointing goes up – and since a checkpoint is one big I/O-fest, you can confidently
expect general database performance to go down in about equal proportion.
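In round numbers (the log size and redo rates below are assumptions purely for illustration):

```python
LOG_SIZE_MB = 100        # fixed-size online redo log (assumed)
normal_redo_rate = 5.0   # MB/minute in normal running (assumed)
hot_backup_redo_rate = normal_redo_rate * 2  # redo doubled in hot backup mode

minutes_per_switch_normal = LOG_SIZE_MB / normal_redo_rate   # 20.0
minutes_per_switch_hot = LOG_SIZE_MB / hot_backup_redo_rate  # 10.0

# Each log switch triggers a checkpoint, so checkpoints come twice as often.
print(minutes_per_switch_normal, minutes_per_switch_hot)  # 20.0 10.0
```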

What’s worse, you will be cycling through the available Online Redo Log groups at a rapid
rate of knots. You’d better pray that ARCH can keep up with his archiving activities,
otherwise you will soon end up looping back to a Log which has not yet been archived – at
which point the entire database will hang until ARCH catches up.
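In steady state, whether ARCH keeps up is a simple rate comparison; extra log groups only buy buffer time before the wrap-around hang. A rough model, with every figure below assumed for illustration:

```python
def minutes_until_hang(n_groups, log_mb, redo_rate_mb_min, archive_rate_mb_min):
    """Rough time before LGWR wraps onto an unarchived log, or None if
    ARCH keeps up indefinitely (archive rate >= redo generation rate)."""
    if archive_rate_mb_min >= redo_rate_mb_min:
        return None
    # The backlog grows at (redo - archive) MB/minute; the buffer is the
    # capacity of the other groups LGWR can fill in the meantime.
    buffer_mb = (n_groups - 1) * log_mb
    return buffer_mb / (redo_rate_mb_min - archive_rate_mb_min)

print(minutes_until_hang(4, 100, 20.0, 15.0))  # 60.0: the database stalls in an hour
print(minutes_until_hang(4, 100, 10.0, 15.0))  # None: ARCH keeps up
```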

Some people get away with it: their Redo sub-system is geared up to cope with a doubling
of the amount of redo generated by the entire database, and the flood of redo induced by
a string of successive ‘begin backups’ is therefore not overwhelming enough for them to
worry about it. You’d have to weigh that up in the light of your own circumstances before
signing up to that approach, though.

There’s one final factor to consider. The ‘begin backup’ command has one other effect on
the database: it locks the SCN stored in the header of each of the database’s Data Files.
That’s so we can tell the earliest point from which to start applying redo in the event of a
recovery being required.

If you take the recommended approach of one tablespace at a time, then each Data File
backup copy will have a different SCN in its header. That’s not a problem for performing
recovery, provided you’ve retained all archives generated since the time the entire backup
started. But it does mean that a restore of a file which was copied late in the backup
cycle (say, 3 hours after the very first one was backed up) will require significantly less
redo to bring it up-to-date than one copied early on in the piece.

Now consider what it would look like if you issued a string of consecutive ‘begin backup’
commands, and then embarked on a mass copy of all Data Files which took (say) 3 hours to
complete. Every file would have more or less the same SCN stored in its header – and
hence, all files would require more or less the same amount of redo to be applied in the
event of a recovery. In practice, Oracle would have to run through 3 hours’-worth of
Archives, regardless of which file was being recovered.
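The difference in redo needed at recovery time can be sketched as follows (six Data Files, a 3-hour copy window and a failure 10 hours in are illustrative assumptions; an hours-based timestamp stands in for the frozen header SCN):

```python
backup_start, copy_hours, now = 0.0, 3.0, 10.0  # timeline in hours (assumed)
n_files = 6

# One tablespace at a time: each file's header SCN is frozen only while
# that file is being copied, so it reflects the file's own copy time.
staggered = [backup_start + copy_hours * i / (n_files - 1) for i in range(n_files)]

# All 'begin backup' at once: every header is frozen at the very start.
simultaneous = [backup_start] * n_files

# Hours of redo to bring each restored file up to 'now':
print([round(now - t, 1) for t in staggered])     # [10.0, 9.4, 8.8, 8.2, 7.6, 7.0]
print([round(now - t, 1) for t in simultaneous])  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
```

Files copied late in the staggered run need noticeably less redo; in the simultaneous run, every file needs the full amount.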

With some of the sizes of databases I’ve seen lately, the time taken to do the physical
copying of Data Files during a backup is not insignificant – and hence this business of being
forced to trawl through masses of unnecessary Archived Redo Logs is not insignificant
either. It’s therefore an awfully inefficient way of doing things, and if you follow the
recommended approach, there’s zero need for it in the first place.

For a small-ish database with a moderate amount of activity, the simplicity of throwing the
entire database into hot backup mode and then just doing a ‘copy *.*’ may well outweigh
the potential disadvantages I’ve explained here (always provided your Redo sub-system has
been configured to cope). But when the process of actually copying your Data Files starts
taking a reasonable length of time, and/or when the DML activity on the database is
significant, it’s probably going to be best to stick with the standard advice: one tablespace
at a time!




Copyright © Howard Rogers 2001            20/10/2001
