跨年夜问题：一句并不复杂的delete竟然在delete statement处cost飙升，在数据量上升的十万级就像进入了死循环，执行后久久没有结果

转载

mb5fe18e9fef50b 2020-01-25 09:17:00

文章标签 sql 数据库执行计划随机数 oracle 文章分类 代码人生

笔者使用的环境：

#	类别	版本
1	操作系统	Win10
2	数据库	Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
3	硬件环境	T440p
4	内存	8G

有这样一张表：

CREATE TABLE tb_sc
(
    id NUMBER not null primary key,
    studentid int not null,
    courseid int not null,
    score int not null
)

用下面sql为其充值：

Insert into tb_sc
select rownum,dbms_random.value(0,10000),dbms_random.value(1,5),dbms_random.value(0,150) from dual
connect by level<=10000
order by dbms_random.random

充值完了commit。

充值的目的是为了模拟学生的高考考分，但由于随机数的关系，会存在studentid和courseid相同，而score不同的记录，即同一考生同一科考了多次，这在现实中是不可能发生的，因此需要把多余记录剔除，只保留studentid和courseid相同，而score最高的那条记录。

而问题就在剔除过程中产生了！

通过下面sql能知道要剔除掉多少数据：

select count(*)  from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid);

在我这边得到998条：

SQL> select count(*)  from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid);

  COUNT(*)
----------
       998

然后把这句稍微改写下:

delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid);

然后执行发现，多余记录确实被删除了，但耗时有些不正常，万条记录居然花了一阵子！

再开执行计划看看：

SQL> set autotrace trace exp;
SQL> delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid);

已删除998行。


执行计划
----------------------------------------------------------
Plan hash value: 710125525

--------------------------------------------------------------------------------
------

| Id  | Operation                | Name      | Rows  | Bytes | Cost (%CPU)| Time
     |

--------------------------------------------------------------------------------
------

|   0 | DELETE STATEMENT         |           |   100K|  1464K|  8438K  (9)| 28:0
7:45 |

|   1 |  DELETE                  | TB_SC     |       |       |            |
     |

|*  2 |   FILTER                 |           |       |       |            |
     |

|   3 |    TABLE ACCESS FULL     | TB_SC     |   100K|  1464K|   104   (2)| 00:0
0:02 |

|*  4 |    FILTER                |           |       |       |            |
     |

|   5 |     HASH GROUP BY        |           |     1 |    11 |    89   (8)| 00:0
0:02 |

|   6 |      INDEX FAST FULL SCAN| IND_TB_SC |   100K|  1074K|    83   (2)| 00:0
0:01 |

--------------------------------------------------------------------------------
------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT 0 FROM "TB_SC" "TB_SC" GROUP BY
              "STUDENTID","COURSEID" HAVING LNNVL("STUDENTID"<>:B1) AND
              LNNVL("COURSEID"<>:B2) AND LNNVL(MAX("SCORE")<>:B3)))
   4 - filter(LNNVL("STUDENTID"<>:B1) AND LNNVL("COURSEID"<>:B2) AND
              LNNVL(MAX("SCORE")<>:B3))

这里面最让人震惊的就是在第0行 DELETE STATEMENT处，cost从104 一下子飙升到了8438K，涨了八万倍！！！

而更大的问题是，把tb_sc表的记录扩大，到十万级别，delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid);就跑不下来了，就那么僵着，一动不动，像进入死循环一样。大家可以拿下面语句充值再试试效果：

Insert into tb_sc
select rownum,dbms_random.value(0,100000),dbms_random.value(1,5),dbms_random.value(0,150) from dual
connect by level<=100000
order by dbms_random.random

我就不贴图了，反正是sql plus窗口就那么僵着，直到你主动关闭它！

目前替代这句sql的其它方案我有，都能正常运行，在工程上能把这个问题绕过去。

但自己这关绕不过去，我就想知道，为什么cost在delete statment处飙升，为什么在十万百万级别就会运行不下去，究竟是什么原因？

--2020.01.25--

2020.2.2 8：37补记

今天试了下面SQL：

SQL> delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid) and rownum<101;

已删除100行。

这个能执行，而且也没僵住。

看来，delete from tb_sc where (studentid,courseid,score) not in (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid) 这种语句执行导致卡死，是因为要一次性删除的内容过多，消耗资源过大，回滚段不堪重负导致的。我尝试了 delete from table where created_datetime>XX, delete from table where id between min and mx这样更简单的删除语句，如果删除的数据量多起来，也容易发生类似故障。

限制每次的删除份额，就能让语句从僵死状态变成通达，循环多次，也能达到目的了。如下面伪代码所描述的：

while(剩余数量>0){

删除符合条件的

}

另外，如果要用in，最好简短一些，拿主键去查找，如：

delete from tb_sc where id not in (select tb_sc.id from tb_sc,( select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid ) tb_sc2
where tb_sc.studentid=tb_sc2.studentid and tb_sc.courseid=tb_sc2.courseid and tb_sc.score=tb_sc2.score)

或是换exists：

delete from tb_sc where not exists (
select null from tb_sc a,
                 (select studentid,courseid,max(score) as score from tb_sc group by studentid,courseid) b
where a.studentid=b.studentid and a.courseid=b.courseid and a.score=b.score and tb_sc.id=a.id)

再配上rownum限制和循环删除，就可以达到顺畅删除又不给数据库太大压力了。

--2020年2月2日--

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。