read by other session导致oracle性能低

问题背景:客户反应oracle库很慢很慢   (read by other session可以结合db file sequential read等待事件一块优化)

 

1检查等待事件:

 

 
1 set linesize 200 
2 col username for a15 
3 col event for a35 
4 col program for a20
5 col cpu_p for 99.99
6  select ta.*, round(ta.cpu_time / tb.total_cpu * 100, 1) cpu_usage from (select s.username, s.program, s.event, s.sql_id, sum(trunc(m.cpu)) cpu_time, count(*) sum from v$sessmetric m, v$session s where (m.physical_reads > 100 or m.cpu > 100 or m.logical_reads > 100) and m.session_id = s.sid and m.session_serial_num = s.serial# and s.status = 'ACTIVE' and username is not null group by s.username, s.program, s.event, s.sql_id order by 5 desc) ta, (select sum(cpu) total_cpu from v$sessmetric) tb where rownum < 11;
 

 

1 select event,count(1) from  v$session_wait group by event order by  2 desc;

发现 read by other session 排第一。

 

2找到read by other session的SQL,同时可以取一个AWR报告看看TOP SQL,都指向同一SQL。

 

 
 1 select sid,
 2        s.username,
 3        s.program,
 4        s.action,
 5        logon_time,
 6        q.sql_text,
 7        q.SQL_FULLTEXT,
 8        q.sql_id
 9   from v$session s
10   left join v$sql q on s.sql_hash_value = q.hash_value
11  where s.sid in (select sid
12                    from v$session_wait
13                   where event in ('read by other session'));
 

 

3、执行一下SQL,看看SQL是执行计划。

    但是SQL很明显是走了一个错误的索引。

 
1 select count(*) as pageno
2   from table1
3  where targetid = :"SYS_B_0"
4    and msgId in
5        (select msgId from table2 where userId = :"SYS_B_1")
6    and classname not in (:"SYS_B_2", :"SYS_B_3", :"SYS_B_4")
7    and dateTime
 

 

4、错误的执行计划很可能是表统计信息不准确。经查询,果然是表2统计信息不准确。收集统计信息或者加hint解决问题。

1     execute dbms_stats.gather_table_stats(ownname => 'owner', tabname =>
2     'table2', estimate_percent =>
3     DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt => 'FOR ALL COLUMNS SIZE
4     AUTO', cascade => TRUE);

 

备注:

read by other session这个等待事件其实是oracle IO问题一个比较常见的场景,会话a在进行把磁盘上的数据块读到内存(data buffer cache)中这个操作,

会话b,会话c 同时也请求这个数据块。因为会话a还完全读入内存(data buffer cache),就导致了b,c read by other session。所以会话a一般是db file sequential read  或 db file scattered read。

​也是一种热块现象。

 

当出现该问题如何解决?

一般出现该问题是由于sql导致的,或者是由于磁盘设备可能导致。

 

当出现该问题的时候,首先需要定位sql。

方法一:通过ash获得细粒度的报告,查看top sql statement 获得sql。

方法二:通过sql语句直接获得:

 

1、当前正在发生的问题:

1 select sql_fulltext from v$sql a,v$session b where a.sql_id=b.sql_id and b.event='read by other session';

 

2、历史曾经发生的

1 select a.sql_id,sql_fulltext from v$sql a,dba_hist_active_sess_history b where a.sql_id=b.sql_id and b.event='read by other session';

往往read by other session伴随着db file sequential read事件的出现。

 

另外可以查看涉及对象信息,此处就是p1,p2,p3

1 SELECT p1 "file#", p2 "block#", p3 "class#"
2 FROM v$session_wait WHERE event = 'read by other session';

 

通过p1,p2,p3获得热点对象:

1 SELECT relative_fno, owner, segment_name, segment_type FROM dba_extents
2 WHERE file_id = &file
3 AND &block BETWEEN block_id AND block_id + blocks - 1;
4 
5  

另外,也可以 直接查看热点块的信息,如查看热点块导致的sql语句:

 
 1 select sql_text
 2 from v$sqltext a,
 3 (select distinct a.owner, a.segment_name, a.segment_type
 4 from dba_extents a,
 5 (select dbarfil, dbablk
 6 from (select dbarfil, dbablk from x$bh order by tch desc)
 7 where rownum < 11) b
 8 where a.RELATIVE_FNO = b.dbarfil
 9 and a.BLOCK_ID <= b.dbablk
10 and a.block_id + a.blocks > b.dbablk) b
11 where a.sql_text like '%' || b.segment_name || '%'
12 and b.segment_type = 'TABLE'
13 order by a.hash_value, a.address, a.piece;
 

查看热点块对象:

 
 1 SELECT E.OWNER, E.SEGMENT_NAME, E.SEGMENT_TYPE
 2 FROM DBA_EXTENTS E,
 3 (SELECT *
 4 FROM (SELECT ADDR, TS#, FILE#, DBARFIL, DBABLK, TCH
 5 FROM X$BH
 6 ORDER BY TCH DESC)
 7 WHERE ROWNUM < 11) B
 8 WHERE E.RELATIVE_FNO = B.DBARFIL
 9 AND E.BLOCK_ID <= B.DBABLK
10 AND E.BLOCK_ID + E.BLOCKS > B.DBABLK;
 

 

找到sql之后需要做的就是查看执行计划,判断问题所在,并进行优化。

 1、对于在shared pool存在的cursor可以通过如下命令查看执行计划

1 select * from table(dbms_xplan.display_cursor('sql_id',null,'allstats'));

2、对于历史可以通过查看awr信息获得:

1 select * from table(dbms_xplan.display_awr('sql_id'));

另外对于设备引起的需要查看磁盘读写信息,可以通过vmstat 2 200进行判断。