Backup and Recovery Issues [备份与恢复所关心的问题]

  • Protect the database from numerous types of failures
  • Increase Mean-Time-Between-Failures(MTBF) [平均无故障时间]
  • Decrease Mean-Time-To-Reconver(MTTR) [平均抢修时间]
  • Minimize data loss

Categories of Failures

  • Statement failure(sql 语句
  • Causes of Statement Failures
  • Logic error in an application
  • Attempt to enter invalid data into the table
  • Attempt an operation with insufficient privileges
  • Attempt to create a table but exceed allotted quota limits
  • Attempt an INSERT or UPDATE to a table,causing an extent to be allocated,but with insufficient free space available in the tablespace.
  • Resolutions
  • Correct the logcial flow of the program
  • Modify and reissue the SQL statement
  • Provide the necessary database privileges
  • Change the user`s quota limit by using the ALTER USER command
  • Add file space to the tablespace
  • Enable resumable space allocation  
  • User process failure(即:客户端程序错误)
  • Causes User process failure
  • The user performed an abnormal disconnect in the session
  • The user`s session was abnormally terminated
  • The user`s program raised an address exception,which terminated the session
  • Resolutions
  • The PMON process detects an abnormally terminated user process
  • PMON rolls back the transaction and releases any resources and locks being held by it  
  • User error
  • Cause User error
  • DROP TABLE employees;
  • TRUNCATE TABLE employees;
  • DELETE FROM employees;COMMIT;
  • UPDATE employees SET salary = salary * 1.5;COMMIT;
  • Resolution
  • Train the database user
  • Recover from a valid backup
  • Import the table from an export file
  • Use LogMiner to determin the time of error(LogMiner是Oracle提供的一项工具;)
  • Recover with a point-in-time recovery
  • Use LogMiner to perform object-level recovery
  • Use FlashBack to view and repair historical data(FlashBack是Oracle提供的一项工具;)
  • Network failure 
  • Instance failure
  • Instance的本质,就是一堆进程加若干个内存块,最大的内存块就是SGA
  • 最典型的Instance failure就是断电
  • SGA包含两大部分1、Data Buffer Cach;2、Redo Log Buffer Cash;
  • Redo Log File里面的内容时实时更新的,Data Filel里的内容时滞后更新的.
  • Cause User error
  • Recovery from Instance Failures
  • No special recovery action is needed from DBA
  • Start the instance
  • Wait for the "database opened" notification
  • Notify users
  • Check the alert log to determine the reason for the failure
  • SMON进程每次在Instance启动的时候,就会检测Data File与Redo Log File 是否是synchronized,如果不同步,说明需要将Redo Log File里面的内容更新到Data File里,也需要将Undo表空间里的数据同步更新到Data File里面,确保Data File与Redo Log File及Undo Tablespace里面的数据同步.对于Instance Failure来讲,DBA不需要进行手工干预.DBA需要做的就是重新启动Instance,然后等待Instance自我修复(nomounted->mounted->opened).这个过程是SMON进程在操作.
  • Media failure(磁盘、介质、磁带错误)
  • Causes of Media Failures
  • Head crash on a disk drive
  • Physical problem in reading from or writing to database files
  • File was accidentally erased
  • Resolutions
  • The recovery strategy depends on which backup method was chosen and which files are affected.
  • If available,apply archived redo log files to recover data committed since the last backup.
  • Defining a Strategy
  • Business requirements
  • Mean time to recover
  • Mean time between failure
  • Evolutionary process
  • Operational requirements
  • 24-Hour operations
  • Testing and validating backups
  • Database volatility
  • Technical considerations
  • Resources:hardware,software,manpower,and time
  • Physical image copies of the operating system files
  • Logical copies of the objects in the database
  • Database configuration
  • Transaction volume that affects desired frequency of backups
  • Disaster Recovery issues
  • How will your business be affected in the event of a major disater,such as:
  • Earthquake,flood,or file
  • Complete loss of machine
  • Malfunction of storage hardware of software
  • Loss of key personnel,for example the database administrator
  • Do you have a plan for testing your strategy periodically?
  • Management concurrence
  • Summary
  • In this lesson,you should have learned how to:
  • Evaluate potential failures in your enviroment
  • Develop a strategy dicated by business,operational,and technical requirements
  • Consider a test plan for a backup and recovery strategy