在系统升级的过程中,准备了大量的脚本,分成几个窗口来分别执行。
在碰到问题的时候,一定要很细心和冷静,不经意的错误可以需要几倍,几十倍的努力来挽回。
准生产环境中有一个表。TREATMENT_ACTIVITY,现在需要从另外一个临时的schema中insert一部分数据。

Name Null? Type
----------------------------------------- -------- ----------------------------
ENTITY_KEY NOT NULL NUMBER(4)
ENTITY_ID NOT NULL NUMBER(9)
ENTITY_TYPE NOT NULL VARCHAR2(10)
ACTIVITY_CODE NOT NULL VARCHAR2(25)
ACTIVITY_DATE NOT NULL TIMESTAMP(6)
SYS_CREATION_DATE NOT NULL DATE
SYS_UPDATE_DATE DATE
OPERATOR_ID NUMBER(9)
APPLICATION_ID CHAR(6)
DL_SERVICE_CODE CHAR(5)
DL_UPDATE_STAMP NUMBER(4)
APPROVAL_INDICATOR CHAR(1)
MEMO_ID NOT NULL VARCHAR2(100)
REASON_CODE NOT NULL VARCHAR2(6)
STEP_ID VARCHAR2(31)
POLICY_CODE VARCHAR2(128)
PA_IND CHAR(1)

值得一提的是,这个表没有主键,只创建了一些相关的索引,但是执行的时候发现数据有很大的偏差。
插入数据前,有500多万,需要插入数据400多万,预计数据应该是1千万,但是实际却是1千五百万。
Initial data of this on before import : 5747945
Data imported from temp schema: 4662877
so we expected to see : 10410822
But the count of this table after imported is 15073699

最后经过测试发现,是在使用Nohup执行一个批量脚本后,发现有一个表执行的很慢,就手动kill掉了session.手工插入,结果这个插入操作还在后台运行,结果手工又执行了一遍,因为没有主键,所以没有发现任何异常,做最后的数据统计的时候才发现。
找到了问题的原因,想把多余的数据删除。
首先尝试使用如下的方式来查看数据条数。但是竟然给我返回了0条数据。
SQL> select count(*) from TREATMENT_ACTIVITY where (
ENTITY_KEY ,
ENTITY_ID ,
ENTITY_TYPE ,
ACTIVITY_CODE ,
ACTIVITY_DATE ,
SYS_CREATION_DATE ,
SYS_UPDATE_DATE ,
OPERATOR_ID ,
APPLICATION_ID ,
DL_SERVICE_CODE ,
DL_UPDATE_STAMP ,
APPROVAL_INDICATOR ,
MEMO_ID ,
REASON_CODE ,
STEP_ID ,
POLICY_CODE ,
PA_IND ) in (select
19 ENTITY_KEY ,
ENTITY_ID ,
ENTITY_TYPE ,
ACTIVITY_CODE ,
ACTIVITY_DATE ,
SYS_CREATION_DATE ,
SYS_UPDATE_DATE ,
OPERATOR_ID ,
APPLICATION_ID ,
DL_SERVICE_CODE ,
DL_UPDATE_STAMP ,
APPROVAL_INDICATOR ,
MEMO_ID ,
REASON_CODE ,
STEP_ID ,
POLICY_CODE ,
PA_IND from mig_tmp.TREATMENT_ACTIVITY_ext)
/

COUNT(*)
----------
0

我是明明知道数据有重复,但是查看却匹配不出来。那使用rowid来做一把测试。
select count(*) from dr_TREATMENT_ACTIVITY t1 where t1.rowid != (select max(rowid) from dr_TREATMENT_ACTIVITY t2
where t1.ENTITY_KEY = t2.ENTITY_KEY
and t1.ENTITY_ID = t2.ENTITY_ID
and t1.ENTITY_TYPE = t2.ENTITY_TYPE
and t1.ACTIVITY_CODE = t2.ACTIVITY_CODE
and t1.ACTIVITY_DATE = t2.ACTIVITY_DATE
and t1.SYS_CREATION_DATE = t2.SYS_CREATION_DATE
and t1.SYS_UPDATE_DATE = t2.SYS_UPDATE_DATE
and t1.OPERATOR_ID = t2.OPERATOR_ID
and t1.APPLICATION_ID = t2.APPLICATION_ID
and t1.DL_SERVICE_CODE = t2.DL_SERVICE_CODE
and t1.DL_UPDATE_STAMP = t2.DL_UPDATE_STAMP
and t1.APPROVAL_INDICATOR = t2.APPROVAL_INDICATOR
and t1.MEMO_ID = t2.MEMO_ID
and t1.REASON_CODE = t2.REASON_CODE
and t1.STEP_ID = t2.STEP_ID
and t1.POLICY_CODE = t2.POLICY_CODE
and t1.PA_IND = t2.PA_IND )


COUNT(*)
----------
0

最后灵机一动,拼成一个字符串来。
SQL> select count(*) from TREATMENT_ACTIVITY where (
2 ENTITY_KEY ||
3 ENTITY_ID ||
ENTITY_TYPE ||
ACTIVITY_CODE ||
ACTIVITY_DATE ||
SYS_CREATION_DATE ||
SYS_UPDATE_DATE ||
OPERATOR_ID ||
APPLICATION_ID ||
DL_SERVICE_CODE ||
DL_UPDATE_STAMP ||
APPROVAL_INDICATOR ||
MEMO_ID ||
REASON_CODE ||
STEP_ID ||
POLICY_CODE ||
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 PA_IND ) in (select
19 ENTITY_KEY ||
ENTITY_ID ||
ENTITY_TYPE ||
ACTIVITY_CODE ||
ACTIVITY_DATE ||
SYS_CREATION_DATE ||
SYS_UPDATE_DATE ||
OPERATOR_ID ||
APPLICATION_ID ||
DL_SERVICE_CODE ||
DL_UPDATE_STAMP ||
APPROVAL_INDICATOR ||
MEMO_ID ||
REASON_CODE ||
20 21 22 23 24 25 26 27 28 29 30 31 32 33 STEP_ID ||
34 POLICY_CODE ||
PA_IND from mig_tmp.TREATMENT_ACTIVITY_ext) 35
36 /

COUNT(*)
----------
9325754
Elapsed: 00:01:41.22

这样就可以直接使用delete来先删除重复插入的数据。然后重新插入。
delete from TREATMENT_ACTIVITY where (
ENTITY_KEY ||
ENTITY_ID ||
ENTITY_TYPE ||
ACTIVITY_CODE ||
ACTIVITY_DATE ||
SYS_CREATION_DATE ||
SYS_UPDATE_DATE ||
OPERATOR_ID ||
APPLICATION_ID ||
DL_SERVICE_CODE ||
DL_UPDATE_STAMP ||
APPROVAL_INDICATOR ||
MEMO_ID ||
REASON_CODE ||
STEP_ID ||
POLICY_CODE ||
PA_IND ) in (select
ENTITY_KEY ||
ENTITY_ID ||
ENTITY_TYPE ||
ACTIVITY_CODE ||
ACTIVITY_DATE ||
SYS_CREATION_DATE ||
SYS_UPDATE_DATE ||
OPERATOR_ID ||
APPLICATION_ID ||
DL_SERVICE_CODE ||
DL_UPDATE_STAMP ||
APPROVAL_INDICATOR ||
MEMO_ID ||
REASON_CODE ||
STEP_ID ||
POLICY_CODE ||
PA_IND from mig_tmp.TREATMENT_ACTIVITY_ext)

9325754 rows deleted.

Elapsed: 00:02:24.24
SQL> commit;
Commit complete.

SQL> insert into TREATMENT_ACTIVITY select *from mig_tmp.TREATMENT_ACTIVITY_ext;
4662877 rows created.
Elapsed: 00:00:18.44
SQL> commit;
Commit complete.

这样数据就能完全对上了。这个问题根本原因就是null导致的。可以使用如下的例子来简单演示一下。
SQL> create table test(id number,name varchar2(10));
insert into test values(1,null);
insert into test values(2,1);
insert into test values(3,1)


create table test2 (id number,name varchar2(10));
insert into test values(1,null);
insert into test values(2,1);
insert into test values(3,1)

Table created.
SQL>
1 row created.
SQL>
1 row created.
SQL> 2 SQL>
Table created.
SQL>
1 row created.
SQL>
1 row created.
SQL>

现在我们来使用相同的方式来查询。可以看到根本的原因就是null
SQL> select count(*)from test where (id,name) in (select id,name from test2)
2 /
COUNT(*)
----------
0