程序错误日志大量的报死锁错误,去数据库错误日志查看确实有很多死锁(应在数据库实例启动时执行dbcc traceon(1222,-1)开启死锁跟踪):
04/29/2016 14:07:51,spid33s,δ֪,waiter id=process71da6bb88 mode=IX requestType=wait
04/29/2016 14:07:51,spid33s,δ֪,waiter-list
04/29/2016 14:07:51,spid33s,δ֪,owner id=process5b534c8 mode=S
04/29/2016 14:07:51,spid33s,δ֪,owner-list
04/29/2016 14:07:51,spid33s,δ֪,pagelock fileid=1 pageid=22056721 dbid=5 objectname=ADCDB.dbo.NA_TrafficMemberOrderRelation id=lock1263c7180 mode=S associatedObjectId=72057595181989888
04/29/2016 14:07:51,spid33s,δ֪,waiter id=process5b534c8 mode=S requestType=wait
04/29/2016 14:07:51,spid33s,δ֪,waiter-list
04/29/2016 14:07:51,spid33s,δ֪,owner id=process71da6bb88 mode=IX
04/29/2016 14:07:51,spid33s,δ֪,owner-list
04/29/2016 14:07:51,spid33s,δ֪,pagelock fileid=1 pageid=22075507 dbid=5 objectname=ADCDB.dbo.NA_TrafficMemberOrderRelation id=lock115c85a80 mode=IX associatedObjectId=72057595181989888
04/29/2016 14:07:51,spid33s,δ֪,resource-list
04/29/2016 14:07:51,spid33s,δ֪,(@p0 int<c/>@p1 int<c/>@p2 int<c/>@p3 int<c/>@p4 nvarchar(4000)<c/>@p5 nvarchar(4000)<c/>@p6 nvarchar(4000)<c/>@p7 nvarchar(4000)<c/>@p8 nvarchar(4000)<c/>@p9 nvarchar(4000)<c/>@p10 decimal(28<c/>5)<c/>@p11 decimal(28<c/>5)<c/>@p12 nvarchar(4000)<c/>@p13 nvarchar(4000)<c/>@p14 datetime<c/>@p15 datetime<c/>@p16 int<c/>@p17 datetime<c/>@p18 datetime<c/>@p19 datetime<c/>@p20 int)INSERT INTO NA_TrafficMemberOrderRelation (ApplySource<c/> OrderMode<c/> OrderSource<c/> OrderState<c/> ECCode<c/> ECPrdCode<c/> UserMobile<c/> MemberOrderNum<c/> MainProductCode<c/> TrafficPackageCode<c/> BillPrice<c/> FlowValue<c/> MemberApplyCode<c/> CrmApplyCode<c/> EffectTime<c/> ExprieTime<c/> UserId<c/> CreateTime<c/> OrderTime<c/> ModifyTime<c/> DiscountRate) VALUES (@p0<c/> @p1<c/> @p2<c/> @p3<c/> @p4<c/> @p5<c/> @p6<c/> @p7<c/> @p8<c/> @p9<c/> @p10<c/> @p11<c/> @p12<c/> @p13<c/> @p14<c/> @p15<c/> @p16<c/> @p17<c/> @p18<c/> @p19<c/> @p20); select SCOPE_IDENTITY()
04/29/2016 14:07:51,spid33s,δ֪,inputbuf
04/29/2016 14:07:51,spid33s,δ֪,unknown
04/29/2016 14:07:51,spid33s,δ֪,frame procname=unknown line=1 sqlhandle=0x000000000000000000000000000000000000000000000000
04/29/2016 14:07:51,spid33s,δ֪,INSERT INTO NA_TrafficMemberOrderRelation (ApplySource<c/> OrderMode<c/> OrderSource<c/> OrderState<c/> ECCode<c/> ECPrdCode<c/> UserMobile<c/> MemberOrderNum<c/> MainProductCode<c/> TrafficPackageCode<c/> BillPrice<c/> FlowValue<c/> MemberApplyCode<c/> CrmApplyCode<c/> EffectTime<c/> ExprieTime<c/> UserId<c/> CreateTime<c/> OrderTime<c/> ModifyTime<c/> DiscountRate) VALUES (@p0<c/> @p1<c/> @p2<c/> @p3<c/> @p4<c/> @p5<c/> @p6<c/> @p7<c/> @p8<c/> @p9<c/> @p10<c/> @p11<c/> @p12<c/> @p13<c/> @p14<c/> @p15<c/> @p16<c/> @p17<c/> @p18<c/> @p19<c/> @p20);
04/29/2016 14:07:51,spid33s,δ֪,frame procname=adhoc line=1 stmtstart=626 stmtend=1494 sqlhandle=0x020000003e44d0396c586db1098e79558ceb28c112e75ea8
04/29/2016 14:07:51,spid33s,δ֪,executionStack
04/29/2016 14:07:51,spid33s,δ֪,process id=process71da6bb88 taskpriority=0 logused=2964 waitresource=PAGE: 5:1:22056721 waittime=3741 ownerId=43666495205 transactionname=user_transaction lasttranstarted=2016-04-29T14:07:45.920 XDES=0x3023d7950 lockMode=IX schedulerid=15 kpid=21872 status=suspended spid=518 sbid=0 ecid=0 priority=0 trancount=2 lastbatchstarted=2016-04-29T14:07:47.827 lastbatchcompleted=2016-04-29T14:07:47.823 clientapp=.Net SqlClient Data Provider hostname=ADC-INTFE hostpid=10192 loginname=daemon_adcuser isolationlevel=read committed (2) xactid=43666495205 currentdb=5 lockTimeout=4294967295 clientoption1=671088672 clientoption2=128056
04/29/2016 14:07:51,spid33s,δ֪,(@p0 int<c/>@p1 nvarchar(4000)<c/>@p2 nvarchar(4000)<c/>@p3 int<c/>@p4 nvarchar(4000))select TOP (@p0) trafficmem0_.TrafficMemberOrderRelationId as TrafficM1_433_<c/> trafficmem0_.ApplySource as ApplySou2_433_<c/> trafficmem0_.OrderMode as OrderMode433_<c/> trafficmem0_.OrderSource as OrderSou4_433_<c/> trafficmem0_.OrderState as OrderState433_<c/> trafficmem0_.ECCode as ECCode433_<c/> trafficmem0_.ECPrdCode as ECPrdCode433_<c/> trafficmem0_.UserMobile as UserMobile433_<c/> trafficmem0_.MemberOrderNum as MemberOr9_433_<c/> trafficmem0_.MainProductCode as MainPro10_433_<c/> trafficmem0_.TrafficPackageCode as Traffic11_433_<c/> trafficmem0_.BillPrice as BillPrice433_<c/> trafficmem0_.FlowValue as FlowValue433_<c/> trafficmem0_.MemberApplyCode as MemberA14_433_<c/> trafficmem0_.CrmApplyCode as CrmAppl15_433_<c/> trafficmem0_.EffectTime as EffectTime433_<c/> trafficmem0_.ExprieTime as ExprieTime433_<c/> trafficmem0_.UserId as UserId433_<c/> trafficmem0_.CreateTime as CreateTime433_<c/> trafficmem0_.OrderTime as OrderTime433_<c/> trafficmem0_.ModifyTime as ModifyTime433_<c/> trafficmem0
04/29/2016 14:07:51,spid33s,δ֪,inputbuf
04/29/2016 14:07:51,spid33s,δ֪,unknown
04/29/2016 14:07:51,spid33s,δ֪,frame procname=unknown line=1 sqlhandle=0x000000000000000000000000000000000000000000000000
04/29/2016 14:07:51,spid33s,δ֪,select TOP (@p0) trafficmem0_.TrafficMemberOrderRelationId as TrafficM1_433_<c/> trafficmem0_.ApplySource as ApplySou2_433_<c/> trafficmem0_.OrderMode as OrderMode433_<c/> trafficmem0_.OrderSource as OrderSou4_433_<c/> trafficmem0_.OrderState as OrderState433_<c/> trafficmem0_.ECCode as ECCode433_<c/> trafficmem0_.ECPrdCode as ECPrdCode433_<c/> trafficmem0_.UserMobile as UserMobile433_<c/> trafficmem0_.MemberOrderNum as MemberOr9_433_<c/> trafficmem0_.MainProductCode as MainPro10_433_<c/> trafficmem0_.TrafficPackageCode as Traffic11_433_<c/> trafficmem0_.BillPrice as BillPrice433_<c/> trafficmem0_.FlowValue as FlowValue433_<c/> trafficmem0_.MemberApplyCode as MemberA14_433_<c/> trafficmem0_.CrmApplyCode as CrmAppl15_433_<c/> trafficmem0_.EffectTime as EffectTime433_<c/> trafficmem0_.ExprieTime as ExprieTime433_<c/> trafficmem0_.UserId as UserId433_<c/> trafficmem0_.CreateTime as CreateTime433_<c/> trafficmem0_.OrderTime as OrderTime433_<c/> trafficmem0_.ModifyTime as ModifyTime433_<c/> trafficmem0_.DiscountRate as Discoun22_433_
from NA_TrafficMemberOrderRelation traffic
04/29/2016 14:07:51,spid33s,δ֪,frame procname=adhoc line=1 stmtstart=148 sqlhandle=0x0200000043e4d6202dd7a9505f76a2897ed4c6cada411021
04/29/2016 14:07:51,spid33s,δ֪,executionStack
04/29/2016 14:07:51,spid33s,δ֪,process id=process5b534c8 taskpriority=0 logused=0 waitresource=PAGE: 5:1:22075507 waittime=3659 ownerId=43666495407 transactionname=SELECT lasttranstarted=2016-04-29T14:07:46.007 XDES=0x58b16cc90 lockMode=S schedulerid=16 kpid=18860 status=suspended spid=653 sbid=0 ecid=0 priority=0 trancount=0 lastbatchstarted=2016-04-29T14:07:46.007 lastbatchcompleted=2016-04-29T14:07:46.007 clientapp=.Net SqlClient Data Provider hostname=ADC-INTFL hostpid=8820 loginname=daemon_adcuser isolationlevel=read committed (2) xactid=43666495407 currentdb=5 lockTimeout=4294967295 clientoption1=671088672 clientoption2=128056
04/29/2016 14:07:51,spid33s,δ֪,process-list
04/29/2016 14:07:51,spid33s,δ֪,deadlock victim=process5b534c8
04/29/2016 14:07:51,spid33s,δ֪,deadlock-list
从死锁日志的process-list部份分析,这里有两个进程 :
process id=process5b534c8正在执行对表NA_TrafficMemberOrderRelation的查询
process id=process71da6bb88正在执行对表NA_TrafficMemberOrderRelation的插入
由于死锁日志只记录当前线程开启事务正在执行的最后一句SQL,所以要想获取事务的完整SQL语句列表,必须通过Sql Sever Profiler跟踪或阅读程序代码。获得的答案是:
查询语句是单独的事务
select TOP (1) *
from NA_TrafficMemberOrderRelation trafficmem0_
where trafficmem0_.UserMobile=@p1 and trafficmem0_.TrafficPackageCode=@p2 and trafficmem0_.OrderState=@p3 and trafficmem0_.ECPrdCode=@p4
插入操作的事务大概是
begin tran
INSERT INTO NA_TrafficMemberOrderRelation values(...)
INSERT INTO NA_TrafficMemberOrderRelation values(...)
commit
接着往下看resource-list部份,死锁的过程大概是
1、插入线程process71da6bb88已经获得页面(Page)22075507上的意向排他锁IX(第一个插入语句执行成功后,在页面上加的IX锁),
2、查询线程process5b534c8获得页面22056721上的共享锁
3、插入线程process71da6bb88需要获得页面22056721上的IX以执行第二条插入语句,但查询进程process5b534c8已经获得了页面22056721上的S锁,而IX和S锁是不兼容的,所以插入线程在执行第二句插入语句时,被查询进程阻塞
4、查询线程process5b534c8需要获得页面22075507上的S以继续执行,但插入线程已经获得了该页面上的IX锁,此时结合第3步,两个进程进入死锁等待
那么问题来了,表NA_TrafficMemberOrderRelation在UserMobile字段是有索引的,按正常的索引查找,只需要在索引键上加S锁,所在的页面加IS锁就可以了,而IS锁和IX锁是兼容的,就不会产生死锁。而这里查询语句确在PAGE上加了锁,那只能说明这个查询使用了索引扫描而不是索引查找(select语句会申请哪些锁,请参考(点击打开链接).
于是通过dbcc page查看页面22056721的内容发现,些页面属于表NA_TrafficMemberOrderRelation上ECPrdCode字段的索引,因此断定这个查询语句的执行计划是通过扫描ECPrdCode索引来获取结果,执行计划是根据统计信息来生成的,所以事情的经过大概是这样的:
1、表NA_TrafficMemberOrderRelation插入了大量数据,但还没有达到自动更新统计信息的阙值(表NA_TrafficMemberOrderRelation现在6500万条记录).
2、Sql server查询优化器为查询语句选择了扫描ECPrdCode索引的执行计划(此处Sql Server应有BUG)
3、通过Sql Server Profiler跟踪该语句锁的申请和释放过程发现,是先申请第一个页面的S锁,再申请第二个页面的S锁,再释放第一个页面的S锁(不知微软是出于何考虑),如此循环,导致查询语句持有一个页面的S锁,又去申请下一个页面(上一个页面的S锁还没有释放)的S锁,最终形成死锁
知道了来龙去脉,解决起来就简单了,为查询语句加上ForceSeek提示:
select TOP (1) *
from NA_TrafficMemberOrderRelation trafficmem0_ with(forceseek)
where trafficmem0_.UserMobile=@p1 and trafficmem0_.TrafficPackageCode=@p2 and trafficmem0_.OrderState=@p3 and trafficmem0_.ECPrdCode=@p4