Bulk-syncing large volumes of data into a database with Java: avoiding duplicate rows in batch imports

Reposted

imking 2024-04-02 16:01:37

Tags: Java batch insert filtering duplicate data, data, sql, fetching data. Category: Java, backend development

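I am using a stored procedure to bulk-extract the rows of a view and insert them into a newly created table. The view holds 240 million rows. Yesterday the extraction reached 60 million rows and then hung, and I do not know why. I want to resume the stored procedure. What condition should I add so that rows that were already inserted are not inserted again?

The view has a unique column, XH.

The stored procedure is as follows: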

create or replace procedure up_table as
  -- collection type holding one batch of rows shaped like new_table
  type a is table of new_table%rowtype;
  in_data a;
  cursor c is select * from fcd_ci_gps@dblink;
begin
  open c;
  loop
    -- fetch up to 5000 rows per round trip
    fetch c bulk collect into in_data limit 5000;
    exit when in_data.count = 0;
    -- bulk-insert the batch, then commit it
    forall i in 1..in_data.count
      insert into new_table values in_data(i);
    commit;
  end loop;
  close c;
end;

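I recently handled a requirement much like the one you describe.

My business requirement was to pull data from an Oracle database and sync it into SQL Server.

The first step is to configure the connection between the two databases.

In my case SQL Server connects to Oracle, so configuring a linked server on the SQL Server side is enough.

The stored procedure then does the following:

1. Create a temporary table.

2. Over the remote connection, run an insert into the temporary table that selects from the remote table, so the data is pulled to the local side first.

3. Compare the temporary table's data with your local business table and pick out the rows that differ.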

-- (1) Read the NC demand plan remotely, group and aggregate it, and insert the result into temp table #tmp_pl_plan.

set @InsertStrSQL = @InsertStrSQL + @tmpStrSQl;
print(@InsertStrSQL);
exec(@InsertStrSQL);
select @tmpCont = count(1) from #tmp_pl_plan;

-- state: 0 = new, 1 = modified, 2 = deleted

-- (2) Compare local data with the temp table and update local rows whose planned quantity differs from the temp table.

update t set t.plnum = a.plnum, t.state = 1
from #tmp_pl_plan a, NC_PL_PLAN t
where a.factorycode = t.factorycode and a.weldingdate = t.weldingdate
and a.divisions = t.divisions and a.zzmadeline = t.zzmadeline
and a.zzweldingwayCode = t.zzweldingwayCode and a.zzmadelinetypeCode = t.zzmadelinetypeCode
and a.convertedcode = t.convertedcode and a.ncfprocode = t.ncfprocode
and t.plnum != a.plnum
and t.weldingdate >= @fbegdate and t.weldingdate <= @fenddate

-- (3) Find rows that exist in the local table but not in the temp table, then set their quantity to 0 and state = 2 (deleted).

update t set t.plnum = 0, t.state = 2
from NC_PL_PLAN t
where t.weldingdate between @fbegdate and @fenddate
and not exists (
select 1 from #tmp_pl_plan a where a.factorycode = t.factorycode and a.weldingdate = t.weldingdate
and a.divisions = t.divisions and a.zzmadeline = t.zzmadeline
and a.zzweldingwayCode = t.zzweldingwayCode and a.zzmadelinetypeCode = t.zzmadelinetypeCode
and a.convertedcode = t.convertedcode and a.ncfprocode = t.ncfprocode
);

-- (4) Insert the temp-table rows that do not yet exist in the local table.

--delete NC_PL_PLAN;

insert into NC_PL_PLAN
select * from #tmp_pl_plan t
where t.weldingdate between @fbegdate and @fenddate
and not exists (
select 1 from NC_PL_PLAN a where a.factorycode = t.factorycode and a.weldingdate = t.weldingdate
and a.divisions = t.divisions and a.zzmadeline = t.zzmadeline
and a.zzweldingwayCode = t.zzweldingwayCode and a.zzmadelinetypeCode = t.zzmadelinetypeCode
and a.convertedcode = t.convertedcode and a.ncfprocode = t.ncfprocode
and a.weldingdate >= @fbegdate and a.weldingdate <= @fenddate
)
order by t.weldingdate desc;

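First: merge into is about the fastest way to update a table. It skips rows that already exist in the target and inserts only the new ones, so I don't see why you wouldn't use it.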
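As a rough illustration of the merge into suggestion applied to the question's tables, here is a minimal sketch. It assumes XH is also a column of new_table, and col_a and col_b are hypothetical placeholders for the view's remaining columns, which the question does not list.

```sql
-- Sketch only: col_a and col_b are hypothetical column names,
-- standing in for the real column list of the view.
merge into new_table t
using (select xh, col_a, col_b from fcd_ci_gps@dblink) s
on (t.xh = s.xh)
-- rows whose XH is already in new_table are left untouched;
-- only rows with a new XH are inserted
when not matched then
  insert (xh, col_a, col_b)
  values (s.xh, s.col_a, s.col_b);

commit;
```

A single statement over 240 million rows runs as one large transaction, so whether it is practical here depends on the available undo space and on how well the database link performs; splitting the source by a range on XH may be needed.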

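Second: if you don't trust merge into, you can create an index on the unique identifier column of the table being updated. With that index in place, using a cursor to update one table from another becomes much faster.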
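One way the index suggestion could be combined with the question's procedure is sketched below, assuming XH is also a column of new_table. The index name ux_new_table_xh is made up for the example, and the create index statement will fail if new_table already contains duplicate XH values. The cursor gains a not exists filter so that a rerun skips rows already copied.

```sql
-- Hypothetical index name; lets the NOT EXISTS probe use an index lookup
create unique index ux_new_table_xh on new_table (xh);

create or replace procedure up_table as
  type a is table of new_table%rowtype;
  in_data a;
  -- select only source rows whose XH is not yet in new_table
  cursor c is
    select s.*
      from fcd_ci_gps@dblink s
     where not exists (select 1 from new_table t where t.xh = s.xh);
begin
  open c;
  loop
    fetch c bulk collect into in_data limit 5000;
    exit when in_data.count = 0;
    forall i in 1 .. in_data.count
      insert into new_table values in_data(i);
    commit;
  end loop;
  close c;
end;
/
```

Because the cursor's result set is fixed when it is opened, rows inserted during the loop do not change what it returns; the filter only matters when the procedure is started again after an interruption.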
