One of their developers wrote the following SQL to delete duplicate rows while keeping the row with the smallest id in each group:
- delete from jd_chapter a where a.`id` in
- (select `id` from jd_chapter group by book_id,chapter_id having count(*)>1)
- and a.`id` not in
- (select min(`id`) from jd_chapter group by book_id,chapter_id having count(*)>1);
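Because the table is large (on the order of tens of millions of rows) and the statement relies on two subqueries, it ran for a long time and never finished.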
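Before choosing an approach, it can help to know how many rows are actually involved. The query below is a minimal sketch, assuming it is run on a replica rather than the master; the aliases duplicate_groups, rows_to_delete, cnt and g are illustrative and not part of the original schema.

```sql
-- Count the duplicated (book_id, chapter_id) groups and the surplus rows,
-- i.e. every row of a group except the one that will be kept.
SELECT COUNT(*)     AS duplicate_groups,
       SUM(cnt - 1) AS rows_to_delete
FROM (
    SELECT COUNT(*) AS cnt
    FROM jd_chapter
    GROUP BY book_id, chapter_id
    HAVING COUNT(*) > 1
) AS g;
```

If rows_to_delete is small compared with the table size, a small lookup table of ids is a natural fit for the cleanup.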
--------------------------Approach----------------------------
Use a temporary table and a join instead. The steps are:
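1. On the slave, export the ids of the rows to delete, so the heavy GROUP BY does not put pressure on the master. A plain `select id ... group by book_id,chapter_id` returns only one arbitrary id per group, so the ids are picked by joining against the smallest id of each duplicated group; the file then lists exactly the rows that must go:
- select a.id from jd_chapter a join
- (select book_id,chapter_id,min(id) as min_id from jd_chapter group by book_id,chapter_id having count(*)>1) d
- on a.book_id=d.book_id and a.chapter_id=d.chapter_id and a.id>d.min_id order by a.id asc
- into outfile '/tmp/jd_chapter.sql' FIELDS TERMINATED BY ',';
2. Copy the exported file to the /tmp/ directory on the master.
3. On the master, create a temporary table with a primary key. A temporary table is visible only to the session that created it, so run the remaining steps in the same session:
- mysql> create TEMPORARY table tmp(id int,primary key(id));
- Query OK, 0 rows affected (0.07 sec)
4. On the master, import the file into the temporary table with LOAD DATA:
- load data infile '/tmp/jd_chapter.sql' into table tmp FIELDS TERMINATED BY ',';
5. On the master, join against the temporary table to delete the duplicate rows from jd_chapter. Every id in tmp is a surplus row, so nothing has to be removed from tmp first:
- delete a from jd_chapter a join tmp b on a.id=b.id;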
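On a table of this size, the final delete can also be split into batches to keep each transaction short. The following is only a sketch: the batch width of 100000 ids and the literal range bounds are assumptions to be tuned, and the statements must run in the session that created tmp. MySQL's multi-table DELETE does not accept LIMIT, which is why the batches are bounded by id range.

```sql
-- Assumption: batches of 100000 ids; adjust to the workload.
DELETE a FROM jd_chapter a JOIN tmp b ON a.id = b.id
WHERE b.id BETWEEN 1 AND 100000;

DELETE a FROM jd_chapter a JOIN tmp b ON a.id = b.id
WHERE b.id BETWEEN 100001 AND 200000;
-- ...continue until the upper bound passes SELECT MAX(id) FROM tmp.

-- Check: once every batch has run, no (book_id, chapter_id) pair
-- should appear more than once, so this should return no rows.
SELECT book_id, chapter_id, COUNT(*) AS cnt
FROM jd_chapter
GROUP BY book_id, chapter_id
HAVING COUNT(*) > 1
LIMIT 10;
```

The check scans the whole table, so it is probably better run on the slave once replication has caught up.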
This article comes from the blog “贺春旸的技术专栏”. Please keep this attribution when reposting: http://hcymysql.blog.51cto.com/5223301/1129629