事件介绍
read by other session 是在 Oracle 10g (10.1.0.2 and later) 新引入的一个等待事件,在 10g 以前版本,等待为 buffer busy waits,10g以后做的细分,所以才有了 read by other session。
Oracle官方解释如下:
This event occurs when a session requests a buffer that is currently being read into the buffer cache by another session. Prior to release 10.1, waits for this event were grouped with the other reasons for waiting for buffers under the 'buffer busy wait' event。
当b会话要访问buffer cache并且pin住相应的block的时候,它发现a会话也需要这些数据块,并且正在将相关的数据块从硬盘读取到buffer cache中,这个时候b会话必须等待a会话读取完成,b会话此时出现read by other session等待事件。
问题分析
该事件的P1,P2,P3参数可以通过查询V$SESSION / V$SESSION_WAIT视图获取:
P1: 数据文件号
P2: 正在读取的数据块号
P3:读取的数据块类型
类型=1表示数据块
类型=4表示数据段头
类型>15表示undo块
通过以下查询确认热块信息
SELECT relative_fno, owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = &file
AND &block BETWEEN block_id AND block_id + blocks - 1;
问题处理
通常情况下出现该等待事件意味着存在热块争用,可以通过以下方式消除争用:
1、优化效率低的SQL。
查看对应SQL的执行计划是否最优,必要时可以通过 DBMS_SQLTUNE 包迚行优化,通过SQL_PROFILE 文件稳固执行计划。
2、重新打散热块数据分布。
删除和重新插入热块行通常会将其移动到新的数据块。这将有助于减少热块争用并提高性能。
3、优化索引。
4、调整PCTFREE和PCTUSED的值。
案例1:
问题原因
1,先检查数据库等待事件
select event,count(*) from dba_hist_active_sess_history where sample_time<=to_date('20230414 11:35:00','yyyymmdd hh24:mi:ss') and sample_time<=to_date('20230414 11:40:00','yyyymmdd hh24:mi:ss')
group by event order by 2;
EVENT COUNT(*)
----------------------------------- ----------
......
free buffer waits 150
log file sync 189
local write wait 194
db file sequential read 376
db file scattered read 7704
read by other session 1889233 rows selected.
read by other session等待事件在5分钟采样接近2万,还伴有db file scattered read,db file sequential read 等待事件,基本可以断定是全表扫描引起的热快,下面看看具体的SQL。
2,分析read by other session等待的对象
select event,p1,p2,p3 from dba_hist_active_sess_history where sample_time<=to_date('20230414 11:35:00','yyyymmdd hh24:mi:ss') and sample_time<=to_date('20230414 11:40:00','yyyymmdd hh24:mi:ss')
and event='read by other session';
EVENT P1 P2 P3
---------------------------------------------------------------- ---------- ---------- ----------
read by other session 10 1920307 1
read by other session 19 204354 1
read by other session 19 204354 1
read by other session 13 2453648 1
read by other session 13 2453648 1
read by other session 23 332761 1
read by other session 23 332761 1
read by other session 16 1422792 1
read by other session 7 27338 1
read by other session 24 142105 1
read by other session 10 427245 1
read by other session 10 427245 1
read by other session 33 222105 1
read by other session 33 222105 1
......
read by other session 23 81025 1
read by other session 12 225064 1
read by other session 32 346957 1
read by other session 32 346957 1
read by other session 8 193194 1
read by other session 8 193194 1
read by other session 10 293012 1
read by other session 10 293012 1
read by other session 6 552737 1
read by other session 12 1916222 1
read by other session 19 266305 1
read by other session 19 266305 1
read by other session 15 1446872 1
read by other session 15 1446872 1
read by other session 18 1887272 1
read by other session 18 1887272 1
read by other session 16 632213 1
read by other session 23 1236914 1
read by other session 23 1236914 1
read by other session 15 241677 1
read by other session 15 241677 1
read by other session 33 226596 1
read by other session 33 226596 1
read by other session 10 285870 1
read by other session 10 285870 1
18892 rows selected.
Parameters含义:
P1 = file# Absolute File# (AFN)
P2 = block#
P3 = class# Block class
file# Absolute File Number (AFN) This is the file number of the data file that contains the block that the waiting session wants.
block# This is the block number in the above file# that the waiting session wants access to. See Note:181306.1 to determine the tablespace, filename and object for this file#,block# pair.
class# Block class#
This is the class of block being waited on. In particular:
class 1 indicates a "data block", which could be table or index
class 4 indicates a "segment header"
class >=15 indicate undo blocks
查看等待的对象,都是一张表DS_IVMS_PER_ACC_FS_DAILY
select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=&FILE_ID AND
&BLOCK between block_id and block_id+blocks-1;
Enter value for file_id: 10
old 1: select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=&FILE_ID AND
new 1: select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=10 AND
Enter value for block: 285870
old 2: &BLOCK between block_id and block_id+blocks-1
new 2: 285870 between block_id and block_id+blocks-1
RELATIVE_FNO OWNER SEGMENT_NAME SEGMENT_TYPE
------------ ------------------------------ --------------------------------------------------------------------------------- ------------------
10 QDCCB DS_IVMS_PER_ACC_FS_DAILY TABLE
Enter value for file_id: 33
old 1: select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=&FILE_ID AND
new 1: select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=33 AND
Enter value for block: 226596
old 2: &BLOCK between block_id and block_id+blocks-1
new 2: 226596 between block_id and block_id+blocks-1
RELATIVE_FNO OWNER SEGMENT_NAME SEGMENT_TYPE
------------ ------------------------------ --------------------------------------------------------------------------------- ------------------
33 QDCCB DS_IVMS_PER_ACC_FS_DAILY TABLE
Enter value for file_id: 12
old 1: select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=&FILE_ID AND
new 1: select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=12 AND
Enter value for block: 225064
old 2: &BLOCK between block_id and block_id+blocks-1
new 2: 225064 between block_id and block_id+blocks-1
RELATIVE_FNO OWNER SEGMENT_NAME SEGMENT_TYPE
------------ ------------------------------ --------------------------------------------------------------------------------- ------------------
12 QDCCB DS_IVMS_PER_ACC_FS_DAILY TABLE
Enter value for file_id: 19
old 1: select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=&FILE_ID AND
new 1: select relative_fno,owner,segment_name,segment_type from dba_extents where file_id=19 AND
Enter value for block: 3842853
old 2: &BLOCK between block_id and block_id+blocks-1
new 2: 3842853 between block_id and block_id+blocks-1
RELATIVE_FNO OWNER SEGMENT_NAME SEGMENT_TYPE
------------ ------------------------------ --------------------------------------------------------------------------------- ------------------
19 QDCCB DS_IVMS_PER_ACC_FS_DAILY TABLE
3,等待事件对应的SQL
select sql_id,count(*) from dba_hist_active_sess_history
where sample_time<=to_date('20230414 11:35:00','yyyymmdd hh24:mi:ss') and sample_time<=to_date('20230414 11:40:00','yyyymmdd hh24:mi:ss')
and event='read by other session'
group by sql_id order by 2;
SQL_ID COUNT(*)
------------- ----------
......
7krzad14cv4q5 20
gmrzbhqdzt7jg 20
1s4339s2yd7m2 20
gvux4r6f9f89z 21
a8s1thcg5src2 21
05qzpt8nnxx0k 22
48crvx6t8cp0s 22
0yvqj946319sk 22
75b07699xdzsx 23
fwcm3q06mbbhb 23
93rhmdjfa6y1b 23
6w3d0rcvbktm4 23
cfuvs5zvg1jhr 24SQL_ID COUNT(*)
------------- ----------
gmnqxht7fahpr 24
cf5jmbg5vqqzz 24
2jpcutpwb3cma 25
2sr0qb9ypfzhr 25
4kjt7k4wskzst 26
5fb7d172wz24u 29
5cbphanftq28g 30
1h2y77s00tnac 31
2857 rows selected.
4,查看SQL的执行计划
分析SQL,上面的SQL都是同一类SQL,看两个SQL的执行计划,没有使用绑定变量,并且都是全表扫描,该表大小为42G,表上没有索引
SQL_ID gmrzbhqdzt7jg
--------------------
(select '4' acctype,fd.tx_dt,fd.tx_tm,fd.confirm_dt,fd.txtyp_nm,fd.prodtyp_nm,fd.tx_amt
,fd.n_cfmvol1,fd.w_txprc,fd.w_tcost from ds_ivms_per_acc_fs_daily fd where fd.cif_org
='8100118907' and fd.tx_dt between date '2023-03-01' and date '2023-03-31') order by
tx_dt,tx_tmPlan hash value: 1076392112
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1122K(100)| |
| 1 | SORT ORDER BY | | 1 | 78 | 1122K (1)| 03:44:27 |
| 2 | TABLE ACCESS FULL| DS_IVMS_PER_ACC_FS_DAILY | 1 | 78 | 1122K (1)| 03:44:27 |
-----------------------------------------------------------------------------------------------
SQL_ID 1h2y77s00tnac
--------------------
(select '4' acctype,fd.tx_dt,fd.tx_tm,fd.confirm_dt,fd.txtyp_nm,fd.prodtyp_nm,fd.tx_amt
,fd.n_cfmvol1,fd.w_txprc,fd.w_tcost from ds_ivms_per_acc_fs_daily fd where fd.cif_org
='8200036136' and fd.tx_dt between date '2023-03-01' and date '2023-03-31') order by
tx_dt,tx_tmPlan hash value: 1076392112
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1122K(100)| |
| 1 | SORT ORDER BY | | 1 | 78 | 1122K (1)| 03:44:27 |
| 2 | TABLE ACCESS FULL| DS_IVMS_PER_ACC_FS_DAILY | 1 | 78 | 1122K (1)| 03:44:27 |
-----------------------------------------------------------------------------------------------
5,该表没有索引
SQL> select t.table_owner,t.table_name,t.column_name,t.index_owner,i.index_name,i.index_type from dba_ind_columns t,dba_indexes i
2 where t.index_name=i.index_name and t.table_owner=upper('&TAB_OWNER') and t.table_name=upper('&TAB_NAME');
Enter value for tab_owner: qdccb
Enter value for tab_name: DS_IVMS_PER_ACC_FS_DAILY
old 2: where t.index_name=i.index_name and t.table_owner=upper('&TAB_OWNER') and t.table_name=upper('&TAB_NAME')
new 2: where t.index_name=i.index_name and t.table_owner=upper('qdccb') and t.table_name=upper('DS_IVMS_PER_ACC_FS_DAILY')
6、表DS_IVMS_PER_ACC_FS_DAILY大小为42G
OWNER SEGMENT_NAME SIZE_G
------------------------------ --------------------------------------------------------------------------------- ----------
QDCCB DS_IVMS_PER_ACC_FS_DAILY 42
解决方案
1,将SQL改写使用绑定变量
2,创建索引
create index QDCCB.IDX$$_29EA60001 on
QDCCB.DS_IVMS_PER_ACC_FS_DAILY("CIF_ORG","TX_DT");
案例2:
问题概述
客户反映,其数据库在当天早晨业务出现异常,响应缓慢。同时,从监控中发现物理读和逻辑读访问量大幅上升。
问题原因
首先,我们获取了问题时段的AWR报告,如下图所示
如上图所示,我们可以看到在问题时段,数据库的每秒DB TIME的值为82.4,但该数据库所在主机上的逻辑CPU的数量仅为24颗,82.4远超24,表明该时段数据库已处于非常繁忙的状态。
其次,我们可以发现,数据库的每秒逻辑读和物理读,分别达到了约750K个块次和78K个块次。而该数据库的数据块的大小为8K。换算下来,其每秒逻辑读达到了6GB个字节,每秒物理读达到了近560MB字节。从经验上看,此值偏高。
而从TOP 5的前台等待事件上来看,IO方面的等待事件,占据了绝大部分。如下图所示:
其中排名第1的,且占据了全部时间约70%的等待事件为“read by other session”。该等待事件发生的场景如下:
当会话A需要访问某个数据块时,发生其并不在内存中,于是就去外部存储中读取并放入内存。但是,会话A发现在自己之前,正有一个会话,也在对同样的数据块,做着同样的操作,而且还没有完成。这时,会话A就会等待这个会话,把相应的数据块读入内存。其因此而产生的等待,就是“read by other session”。
而发生大量的该等待事件,通常是由于以下几个原因:
- 存在热点对象,且该热点对象不在内存中。
- SQL效率不佳,需要访问过多的数据,且相应SQL同一时期被多个会话执行。
- 存储设备IO性能不佳,导致读入内存的过程过长,加剧了这个等待事件发生的机率。
为了定位问题,我们又做了相应时段的ASH报告。
如上图所示,我们发现有多条SQL文本近似的SQL,发生了“read by other session”的等待事件。查看该类SQL的执行计划,发现其执行计划如下所示:
如上图所示,该SQL单次执行,需要337K个块次的逻辑读,约95K个物理读。按每个数据块8K大小计算。即逻辑读需要访问约2600MB的数据,物理读需要访问约760M的数据。但最终返回的数据只有29条。同时,我们也可以看到,这个执行计划虽然用到索引,但却是相对低效的索引skip scan的访问方式,且通过索引之后,仍然返回了大量的数据,通过后续的进一步处理,才降到29条数据。这说明,索引的选择性非常不好,其过滤出来大量的无用数据。
此外,通过对当天(周一)与上周一的同一时段,同一类SQL的执行次数的比较,我们发现故障当天,其执行的次数有66次,而上周一同一时段,则只有7次,增长了约8倍。
综上,我们可以初步认定,是由于在故障时段,突然增多的这类低效SQL,在数据库上的集中执行,导致产生了大量的逻辑读和物理读的开销。并且由于访问的数据重复,导致发生了大量“read by other session”的等待事件。
解决方案
综上所述,我们的建议如下:
- 优化该类SQL,提高其处理效率,避免访问大量的无效数据。因此,我们建议在目标表的适当列上创建一个适合的组合索引。
其他建议
该类SQL对WHERE子句中的条件值,未使用绑定变量的处理方法,造成每执行一次该类SQL,都要做一次硬解析,生成一次执行计划,虽然其新生成的执行计划与之前同类SQL是相同的。但由于条件值不同,导致SQL文本不尽相同,从而无法复用此前的执行计划。而硬解析操作,是资源密集型操作,对系统资源开销较大。
参考文档
“Read By Other Session” Wait Event (Doc ID 732891.1)
Troubleshooting I/O Related Waits (Doc ID 223117.1)