Killing the Oracle DBMS_JOB
James F. Koopmann, www.dbdoctor.net

源地址:http://www.quest-pipelines.com/newsletter-v4/0403_C.htm

Take control of Oracle's queue with a step by step approach to getting rid of those pesky DBMS_JOBs.

Let's face it, Oracle's job scheduling facility is a wonderful tool for scheduling Oracle related jobs without having to maintain a cron job on Unix or an AT job in windows. It is also very robust and reliable. It is that very reliability and robustness that gives many of us our problems.

If you have any form of jobs running on your system, you will at one time or another come across the issue of a run-away job that just doesn't seem to want to end. Or maybe you will try and shutdown the database only to find out that it is waiting to complete a job. I would like to offer some help in the management of those job queues when they just don't seem to want to end or Go away.

A while back I needed to find information on how to clear the job queue for jobs running with no apparent end in sight. Some had hung, while others just were taking a bad access path to data. I needed to bring down these jobs, do a bit of tuning and then restart the jobs. Well, to my amazement, there wasn't very much information out on the web that gave good insight into this process. Basically the method suggested was to first break the job and then issue an ALTER SYTEM KILL SESSION command. This method does not always work and unfortunately--never on my system, for the jobs I had. I then called Oracle support and basically got the same answer as I found out on the web. They did give me one added piece of information. They said, if the ALTER SYSTEM KILL SESSION didn't work, I was supposed to bounce my database in order to bring down the job queue processes. First of all, this wasn't an option and when I did get the opportunity to bounce the database box, many of the jobs seemed to come right back as strong as ever.

Before writing this article I did another quick search on the topic of killing dbms_jobs and to my amazement there still wasn't much good information out there. This is why I want to share my method, so that you won't be stuck up against the wall with this problem and nowhere to turn, as I was.

Viewing Scheduled dbms_jobs

When looking at what jobs have been scheduled, there is really only one view that you need to go to. The dba_jobs view contains all of the information you need, to see what has been scheduled, when they were last run, and if they are currently running. Use the following simple script to take a look. Bear with me on the sub-select, I will build on this query as we go on in the presentation.

scheduled_dbms_jobs.sql

 

set linesize 250
col log_user       for a10
col job            for 9999999   head 'Job'
col broken         for a1        head 'B'
col failures       for 99        head "fail"
col last_date      for a18       head 'Last|Date'
col this_date      for a18       head 'This|Date'
col next_date      for a18       head 'Next|Date'
col interval       for 9999.000 head 'Run|Interval'
col what           for a60

select j.log_user,
     j.job,
     j.broken,
     j.failures,
     j.last_date||':'||j.last_sec last_date,
     j.this_date||':'||j.this_sec this_date,
     j.next_date||':'||j.next_sec next_date,
     j.next_date - j.last_date interval,
     j.what
from (select dj.LOG_USER, dj.JOB, dj.BROKEN, dj.FAILURES,
             dj.LAST_DATE, dj.LAST_SEC, dj.THIS_DATE, dj.THIS_SEC,
             dj.NEXT_DATE, dj.NEXT_SEC, dj.INTERVAL, dj.WHAT
         from dba_jobs dj) j;

What Jobs are Actually Running

A simple join to the dba_jobs_running view will give us a good handle on the scheduled jobs that are actually running at this time. This is done by a simple join through the job number. The new column of interest returned here is the sid which is the identifier of the process that is currently executing the job.

running_jobs.sql

set linesize 250
col sid             for 9999     head 'Session|ID'
col log_user        for a10
col job             for 9999999 head 'Job'
col broken          for a1       head 'B'
col failures        for 99       head "fail"
col last_date       for a18      head 'Last|Date'
col this_date       for a18      head 'This|Date'
col next_date       for a18      head 'Next|Date'
col interval        for 9999.000 head 'Run|Interval'
col what            for a60
select j.sid,
         j.log_user,
         j.job,
         j.broken,
         j.failures,
         j.last_date||':'||j.last_sec last_date,
         j.this_date||':'||j.this_sec this_date,
         j.next_date||':'||j.next_sec next_date,
         j.next_date - j.last_date interval,
         j.what
from (select djr.SID,
                   dj.LOG_USER, dj.JOB, dj.BROKEN, dj.FAILURES,
                   dj.LAST_DATE, dj.LAST_SEC, dj.THIS_DATE, dj.THIS_SEC,
                   dj.NEXT_DATE, dj.NEXT_SEC, dj.INTERVAL, dj.WHAT
           from dba_jobs dj, dba_jobs_running djr
         where dj.job = djr.job ) j;

What Sessions are Running the Jobs

Now that we have determined which jobs are currently running, we need to find which Oracle session and operating system process is accessing them. This is done through first joining v$process to v$session by way of paddr and addr which is the address of the processs that owns the sessions, and then joining the results back to the jobs running through the sid value. The new columns returned in our query are spid which is the operating system process identifier and serial# which is the session serial number.

session_jobs.sql

set linesize 250
col sid           for 9999     head 'Session|ID'
col spid                       head 'O/S|Process|ID'
col serial#       for 9999999 head 'Session|Serial#'
col log_user      for a10
col job           for 9999999 head 'Job'
col broken        for a1       head 'B'
col failures      for 99       head "fail"
col last_date     for a18      head 'Last|Date'
col this_date     for a18      head 'This|Date'
col next_date     for a18      head 'Next|Date'
col interval      for 9999.000 head 'Run|Interval'
col what          for a60
select j.sid,
s.spid,
s.serial#,
       j.log_user,
       j.job,
       j.broken,
       j.failures,
       j.last_date||':'||j.last_sec last_date,
       j.this_date||':'||j.this_sec this_date,
       j.next_date||':'||j.next_sec next_date,
       j.next_date - j.last_date interval,
       j.what
from (select djr.SID,
             dj.LOG_USER, dj.JOB, dj.BROKEN, dj.FAILURES,
             dj.LAST_DATE, dj.LAST_SEC, dj.THIS_DATE, dj.THIS_SEC,
             dj.NEXT_DATE, dj.NEXT_SEC, dj.INTERVAL, dj.WHAT
        from dba_jobs dj, dba_jobs_running djr
       where dj.job = djr.job ) j,
     (select p.spid, s.sid, s.serial#
          from v$process p, v$session s
         where p.addr = s.paddr ) s
where j.sid = s.sid;

Now that we have a good handle on how we can look at the jobs and the key columns involved, let's go through the steps needed to bring down a job. The following is a 5 to 11 step process that should solve all of your problems.

Bringing Down a DBMS_JOB

1. Find the Job You Want to Bring Down

In order to do anything you first need to find the job that is giving you a headache. Go ahead and run the running_jobs.sql. This will give you the prime information, job, sid, serial#, and spid, for the following actions in bringing down the job.

2. Mark the DBMS_JOB as Broken

Use the following command for the job that you have to deal with.

SQL> EXEC DBMS_JOB.BROKEN(job#,TRUE);

All this command does is mark the job so that if we get it to stop, it won't start again. Let's make one thing perfectly clear, after executing this command the job is still running.

As a side note, if you are trying to shut down a database with jobs that run throughout the day, they may hinder your attempts to bring down the database cleanly. This is a wonderful command to make sure no jobs are executing during the shutdown process. Just be aware that you will need to mark the jobs as unbroken when the database comes back up, more on that later.

3. Kill the Oracle Session

Since the job is still running and it isn't going to end soon, you will need to kill the Oracle session that is executing the job. Use the following command for to kill the job.

ALTER SYSTEM KILL SESSION 'sid,serial#';

4. Kill the O/S Process

More often than not the previous step will still leave the job attached to the database and still running. When this happens you will need to go out to the operating system level and get rid of the process that has spawned from the running job. In order to do this you must login to the database box and issue the following command, depending on the type of operating system you have.

For Windows, at the DOS Prompt: orakill sid spid

For UNIX at the command line> kill '9 spid

The orakill is an Oracle command, while kill is a Unix command.

5. Check if the Job is Still Running

Re-run the session_jobs.sql script to see if you have gotten rid of the job. If you have there is no reason to go further. Usually steps 1 through 4 will be sufficient to get rid of a job but when the job is running wild you will have to continue with steps 6 through 11 which describes a process for bouncing the job queue process.

6. Determine the Current Number of Job Queue Processes

SQL> col value for a10
SQL> select name,value from v$parameter where name = 'job_queue_processes';

7. Alter the Job Queue to Zero

SQL> ALTER SYSTEM SET job_queue_processes = 0;

This will bring down the entire job queue processes.

8. Validate that No Processes are Using the Job Queue

Re-run the session_jobs.sql script to see if any jobs are still running. Since we have given a hard stop to the job queue and issued the kill commands, you can now wait until no more jobs are running. After all the jobs have quit running, you can do whatever maintenance or tuning you need to do before proceeding.

9. Mark the DBMS_JOB as Not Broken

You can now reset the broken job to not broken so they can run again. Just issue the command.

SQL>EXEC DBMS_JOB.BROKEN(job#,FALSE):

10. Alter the Job Queue to Original Value

Set the job queue to its' original value so that the jobs can run again.

ALTER SYSTEM SET job_queue_processes = original_value;

11. Validate that DBMS_JOB Is Running

To make sure everything is back to normal, re-run the above scripts to validate that jobs are scheduled, not broken, and are executing with the next and last dates columns changing.

Oracle have given us a great tool for scheduling activities within the database. As with many things inside the database, not everything goes as planned, nor are we given adequate tools to fix some of the problems we encounter. With the eleven steps outlined here, hopefully you will have increased your arsenal to handle those run away jobs that have given the best of us a few tense moments.

--------------------------------------------------------------------------------------------------------------------------

注意:第四步 删除操作系统层次的进程或线程操作,Oracle在Linux/Unix中后台进程是以进程方式运作,在Windows系列操作系统中是以线程方式运作。

对于Windows,启动命令提示符:

orakill sid spid

其中sid是数据库实例的名称,而不是前面脚本中查出来的sid。

可以通过一下命令来得到:

select name from v$database;

select instance_name from v$instance;