cache buffers chains锁存器的争用原因一:低效率的SQL语句
在某些环境中,应用程序打开执行相同的低效率SQL语句的多个并发会话,这些SQL语句都设法得到相同的数据集。较小的逻辑读意味着较少的latch get操作,从而减少锁存器争用并改善性能。
每次执行都带有高 BUFFER_GETS(逻辑读取)的SQL语句是主要的原因。
>> cache buffers chains锁存器的争用原因二:热块
当多个会话重复访问一个或多个由同一个子cache buffers chains锁存器保护的块时,热块就会产生。当多个会话争用cache buffers chains锁存器时,找出是否有热块的最好的方法是检查latch free等待事件的P1RAW参数值。
XXXX数据库优化调整建议
运行现象和故障描述
10月22日中午之后,XXXX数据库系统性能急剧恶化,出现大量的并发性冲突,表现为等待大量的latch free,latch: cache buffer chain以及latch: library cache 。操作系统CPU消耗达到100%,系统性能响应极为缓慢。
下面是运行期间的AWR报告:
Per Second |
Per Transaction |
|
Redo size: |
114,459.47 |
6,101.89 |
Logical reads: |
488,049.46 |
26,018.15 |
Block changes: |
444.92 |
23.72 |
Physical reads: |
2,637.96 |
140.63 |
Physical writes: |
507.44 |
27.05 |
User calls: |
616.09 |
32.84 |
Parses: |
388.48 |
20.71 |
Hard parses: |
14.67 |
0.78 |
Sorts: |
154.18 |
8.22 |
Logons: |
5.88 |
0.31 |
Executes: |
651.57 |
34.74 |
Transactions: |
18.76 |
|
% Blocks changed per Read: |
0.09 |
Recursive Call %: |
69.88 |
Rollback per transaction %: |
2.05 |
Rows per Sort: |
126.60 |
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: |
100.00 |
Redo NoWait %: |
100.00 |
Buffer Hit %: |
99.55 |
In-memory Sort %: |
100.00 |
Library Hit %: |
98.80 |
Soft Parse %: |
96.22 |
Execute to Parse %: |
40.38 |
Latch Hit %: |
88.19 |
Parse CPU to Parse Elapsd %: |
7.39 |
% Non-Parse CPU: |
97.93 |
Shared Pool Statistics
|
Begin |
End |
Memory Usage %: |
85.74 |
88.10 |
% SQL with executions>1: |
42.86 |
44.45 |
% Memory for SQL w/exec>1: |
70.86 |
78.28 |
Top 5 Timed Events
Event |
Waits |
Time(s) |
Avg Wait(ms) |
% Total Call Time |
Wait Class |
latch: cache buffers chains |
385,694 |
58,616 |
152 |
25.5 |
Concurrency |
latch free |
151,833 |
32,221 |
212 |
14.0 |
Other |
CPU time |
|
13,749 |
|
6.0 |
|
latch: library cache |
45,698 |
3,226 |
71 |
1.4 |
Concurrency |
direct path write temp |
238,321 |
2,299 |
10 |
1.0 |
User I/O |
从latch信息可以看到:
cache buffers chains 2,498,511,076 13.36 0.00 58671 21,510,689 0.40
cache buffers lru chain 378,882 0.55 0.08 9 22,568,763 1.07
library cache 34,822,373 0.59 0.22 3226 194,268 1.79
row cache objects 147,765,275 2.52 0.00 84 2,277 0.04
simulator lru latch 46,830,334 1.20 0.27 32216 2,168,754 0.16
这些相关的latch冲突比较高,尤其对于cache buffer chain和simulator lru latch冲突几乎频繁发生,并且获取时间很长,通过大量spin消耗CPU时间。
以下表格消耗大量CPU并且引发LATCH冲突:
Segments by Logical Reads
- Total Logical Reads: 1,758,696,980
- Captured Segments account for 94.8% of Total
Owner |
Tablespace Name |
Object Name |
Subobject Name |
Obj. Type |
Logical Reads |
%Total |
NBAIC |
GSNEW |
PK_BM_GLWG |
|
INDEX |
996,334,272 |
56.65 |
NBAIC |
PARTITION_TBS |
HK_JD |
HK_JD_P12 |
TABLE PARTITION |
270,731,328 |
15.39 |
NBAIC |
GSNEW |
HK_SSJH_HK |
|
TABLE |
159,548,544 |
9.07 |
WLJJJG |
GSNEW |
WL_SSJG |
|
TABLE |
58,100,944 |
3.30 |
NBAIC |
GSNEW |
SYS_SPLC |
|
TABLE |
19,291,264 |
1.10 |
以下SQL语句消耗大量CPU并且引发LATCH FREE,消耗CPU资源。
另外从SQL信息可以看到引发大量cache buffer chain以及simulator lru latch冲突几乎都由相同的SQL语句引起,这些语句极度消耗CPU资源,从而导致系统响应缓慢。
Select * from (select row_.*, rownum rownum_ from (select a.xh, a.jhmc, a.kssj, a.jssj, a.fbr, a.fbsj, a.zrr, a.jhnr, a.gxdw, (select count(b.nbxh) from hk_ssjh_hk b, hk_jd m, bm_glwg n where a.xh=b.ssjhxh and b.wcbj=’0′ and b.nbxh=m.nbxh and m.glwg = n.co
类似的语句几乎消耗了系统80%以上的CPU资源,这些语句同样引起了大量buffer cache冲突。
该语句写法存在问题,存在着大量过多的cache访问,频繁进行latch锁定,从而导致系统故障。
故障诊断:
从以上现象和数据中,SQL语句是引起性能问题的最主要原因。高消耗语句的高并发导致系统资源极度紧张,从而引发一些Oracle在高负载下面的Oracle bug,使业务系统性能更加缓慢。
Oracle在10.2.0.3下面的相关bug:
Latch Contention
6455161+ |
Higher CPU / Higher “cache buffer chains” latch gets / Higher “consistent gets” after truncate/Rebuild |
4691237 |
High “library cache” latch gets from SQL using objects in PLSQL |
5749075 |
High Requests on dc_rollback_segments. latch / US enqueue contention |
5918642 |
Heavy latch contention with DB_CACHE_ADVICE on |
6333663 |
Shared pool latch contention due to fragmentation of large pool |
6356566 |
Memory leak / high CPU selecting from V$SQL_PLAN (affects statspack) |
解决方案:
(一)、简单控制该类语句并发规模
该方法简单可靠,可以迅速实施。
(二)、系统优化方案和步骤
(1)、修改参数
关闭缓存建议:
Alter system set db_cache_advice=off scope=both
关闭内存自动调整:
Alter system set db_cache_size=3g scope=both
Alter system set shared_pool_size=2g
Alter system set sga_target=0 scope=both
调整dbwr参数
Alter system set db_writer_processes = 8 scope=both
修改一些内部参数:
Alter system set “_cursor_plan_unparse_enabled” = false scope=both
alter system set "_smu_debug_mode"=33554432 scope=both;
以上参数都可以在线修改。
(2)、检查是否存在truncate或者rebuild的索引
select owner||'.'||Object_name||'.'||subobject_name
from dba_objects
where object_id<>da
and object_type like 'INDEX%'
order by owner,object_name,subobject_name;
如果存在这些索引,删除索引并且重新创建这些索引。
(3)、修正SQL语句
Select * from (select row_.*, rownum rownum_ from (select a.xh, a.jhmc, a.kssj, a.jssj, a.fbr, a.fbsj, a.zrr, a.jhnr, a.gxdw, (select count(b.nbxh) from hk_ssjh_hk b, hk_jd m, bm_glwg n where a.xh=b.ssjhxh and b.wcbj=’0′ and b.nbxh=m.nbxh and m.glwg = n.co
主要该语句引起,需要选择另外一种写法。
(4)、调度时间应用Oracle patchset。
在合适的时间段给Oracle应用patchset。
总结:
虽然oracle 10.2.0.3中存在类似的bug不少,但是引发的根本在SQL语句效率低下引起。