SYSTEM RECOVERY
- The system must be prepared to recover, not only from purely local failures such as the occurrence of an overflow condition within an individual transaction, but also from "global" failures such as a power outage.
- A local failure, by definition, affects only the transaction in which the failure has actually occurred; such failures have already been discussed. A global failure, by contrast, affects all of the transactions in progress at the time of the failure, and hence has significant system-wide implications.
- Such failures fall into two broad categories:
- System failures (e.g., power outage), which affect all transactions currently in progress but do not physically damage the database. A system failure is sometimes called a soft crash.
- Media failures (e.g., head crash on the disk), which do cause damage to the database, or to some portion of it, and affect at least those transactions currently using that portion. A media failure is sometimes called a hard crash.
System Failures
- The key point regarding system failure is that the contents of main memory are lost (in particular, the database buffers are lost).
- The precise state of any transaction that was in progress at the time of the failure is therefore no longer known; such a transaction can therefore never be successfully completed, and so must be undone - i.e., rolled back - when the system restarts.
- Furthermore, it might also be necessary to redo certain transactions at restart time that did successfully complete prior to the crash but did not manage to get their updates transferred from the database buffers to the physical database.
- The obvious question therefore arises: How does the system know at restart time which transactions to undo and which to redo? The answer is as follows. At certain prescribed intervals - typically whenever some prescribed number of entries have been written to the log - the system automatically takes a checkpoint.
- Taking a checkpoint involves (a) physically writing ("force writing") the contents of the database buffers out to the physical database, and (b) physically writing a special checkpoint record out to the physical log. The checkpoint record gives a list of all transactions that were in progress at the time the checkpoint was taken. To see how this information is used, consider the figure, which is read as follows:
- A system failure has occurred at time TF.
- The most recent checkpoint prior to time TF was taken at time TC.
- Transactions of type T1 completed (successfully) prior to time TC.
- Transactions of type T2 started prior to time TC and completed (successfully) after time TC and before time TF.
- Transactions of type T3 also started prior to time TC but did not complete by time TF.
- Transactions of type T4 started after time TC and completed (successfully) before time TF.
- Finally, transactions of type T5 also started after time TC but did not complete by time TF.
- It should be clear that, when the system is restarted, transactions of types T3 and T5 must be undone, and transactions of types T2 and T4 must be redone.
- Note, however, that transactions of type T1 do not enter into the restart process at all, because their updates were forced to the database at time TC as part of the checkpoint process.
- Note too that transactions that complete unsuccessfully (i.e., with a rollback) before time TF also do not enter into the restart process at all.
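The five cases above can be summarized as a small decision function. The following is only an illustrative sketch in Python: the name classify and the use of plain numbers as timestamps are assumptions made for the example, and transactions that rolled back before the failure are assumed to have been filtered out already.

```python
from typing import Optional


def classify(start: float, commit: Optional[float], tc: float) -> str:
    """Return the transaction type (T1..T5) and its restart action.

    start  : time the transaction began (hypothetical timestamp)
    commit : time it committed, or None if it had not completed by TF
    tc     : time TC of the most recent checkpoint before the failure
    """
    if commit is not None and commit < tc:
        # Completed before the checkpoint: its updates were forced at TC.
        return "T1: ignore"
    if commit is None:
        # Still in progress at the failure: state unknown, so roll back.
        return "T3: undo" if start < tc else "T5: undo"
    # Committed between TC and TF: its updates may not have left the buffers.
    return "T2: redo" if start < tc else "T4: redo"


TC = 10  # checkpoint time; the failure is assumed to occur later, at TF
print(classify(1, 5, TC))      # T1: ignore
print(classify(8, 15, TC))     # T2: redo
print(classify(9, None, TC))   # T3: undo
print(classify(12, 18, TC))    # T4: redo
print(classify(14, None, TC))  # T5: undo
```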
- At restart time, therefore, the system first goes through the following procedure in order to identify all transactions of types T2-T5.
Start with two lists of transactions, the UNDO list and the REDO list. Set the UNDO list equal to the list of all transactions given in the most recent checkpoint record; set the REDO list to empty.
Search forward through the log, starting from the checkpoint record:
- If a BEGIN TRANSACTION log entry is found for transaction T, add T to the UNDO list.
- If a COMMIT log entry is found for transaction T, move T from the UNDO list to the REDO list.
When the end of the log is reached, the UNDO and REDO lists identify, respectively, transactions of types T3 and T5 and transactions of types T2 and T4.
The system now works backward through the log, undoing the transactions in the UNDO list; then it works forward again, redoing the transactions in the REDO list.
Note: Restoring the database to a consistent state by undoing work is sometimes called backward recovery. Similarly, restoring it to a consistent state by redoing work is sometimes called forward recovery.
Finally, when all such recovery activity is complete, then (and only then) the system is ready to accept new work.
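The restart procedure described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the log is modelled as a list of tuples, an update record has the hypothetical shape ("UPDATE", txn, key, old_value, new_value), the database is a dictionary, and concurrency control is assumed to keep transactions from interfering on the same item.

```python
def restart(log, db):
    """Rebuild a consistent db (a dict) from the log after a system failure."""
    # Locate the most recent checkpoint record in the log.
    cp = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")

    # UNDO starts as the transactions listed in the checkpoint; REDO is empty.
    undo, redo = set(log[cp][1]), set()
    for rec in log[cp + 1:]:
        if rec[0] == "BEGIN":
            undo.add(rec[1])          # started after the checkpoint
        elif rec[0] == "COMMIT":
            undo.discard(rec[1])      # move from UNDO ...
            redo.add(rec[1])          # ... to REDO

    # Backward recovery: restore old values, newest log entry first.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] in undo:
            db[rec[2]] = rec[3]

    # Forward recovery: reapply new values written after the checkpoint.
    for rec in log[cp + 1:]:
        if rec[0] == "UPDATE" and rec[1] in redo:
            db[rec[2]] = rec[4]
    return undo, redo


log = [
    ("BEGIN", "T1"), ("UPDATE", "T1", "a", 0, 1), ("COMMIT", "T1"),
    ("BEGIN", "T2"), ("UPDATE", "T2", "b", 0, 2),
    ("BEGIN", "T3"), ("UPDATE", "T3", "c", 0, 3),
    ("CHECKPOINT", ["T2", "T3"]),
    ("UPDATE", "T2", "b", 2, 20), ("COMMIT", "T2"),
    ("BEGIN", "T4"), ("UPDATE", "T4", "d", 0, 4), ("COMMIT", "T4"),
    ("BEGIN", "T5"), ("UPDATE", "T5", "e", 0, 5),
    ("UPDATE", "T3", "c", 3, 30),
]
# Assumed on-disk state after the crash: some later updates reached disk
# (c, e) and some did not (b, d).
db = {"a": 1, "b": 2, "c": 30, "d": 0, "e": 5}
undo, redo = restart(log, db)
print(sorted(undo), sorted(redo))  # ['T3', 'T5'] ['T2', 'T4']
print(db)  # {'a': 1, 'b': 20, 'c': 0, 'd': 4, 'e': 0}
```

In this sketch the backward pass scans the whole log, because a transaction of type T3 may have made updates before the checkpoint, while the forward pass starts at the checkpoint, because earlier updates were already forced to the database.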
MEDIA RECOVERY
- The topic of media recovery is somewhat different in kind from the topics of transaction and system recovery.
- A media failure is a failure such as a disk head crash, or a disk controller failure, in which some portion of the database has been physically destroyed.
- Recovery from such a failure basically involves reloading (or restoring) the database from a backup copy (or dump), and then using the log (both its active and archive portions) to redo all transactions that completed since that backup copy was taken.
- The need to be able to perform media recovery implies the need for a dump/restore (or unload/reload) utility. The dump portion of that utility is used to make backup copies of the database on demand. (Such copies can be kept on tape or other archival storage; it is not necessary that they be on direct-access media such as hard disk.) After a media failure, the restore portion of the utility is used to recreate the database from a specified backup copy.
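As a rough illustration of the dump/restore idea, the sketch below copies a dictionary-based database and later rebuilds it. It assumes that the dump was taken when no transaction was in progress, and that after the restore the log is used to redo the transactions that committed since the dump. The function names and the record shape ("UPDATE", txn, key, new_value) are hypothetical and do not correspond to any real utility.

```python
import copy


def dump(db):
    """Make a backup copy of the database (the 'dump' half of the utility)."""
    return copy.deepcopy(db)


def restore(backup, log_since_dump):
    """Recreate the database from a backup, then redo committed work."""
    db = copy.deepcopy(backup)
    # Only transactions with a COMMIT record after the dump are redone.
    committed = {rec[1] for rec in log_since_dump if rec[0] == "COMMIT"}
    for rec in log_since_dump:
        if rec[0] == "UPDATE" and rec[1] in committed:
            db[rec[2]] = rec[3]
    return db


backup = dump({"a": 1, "b": 2})
log_since_dump = [
    ("BEGIN", "T1"), ("UPDATE", "T1", "a", 10), ("COMMIT", "T1"),
    ("BEGIN", "T2"), ("UPDATE", "T2", "b", 20),  # never committed
]
recovered = restore(backup, log_since_dump)
print(recovered)  # {'a': 10, 'b': 2}
```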