Preface:
No matter what the architecture looks like, accidents caused by human error are unavoidable. This article tests ways to recover from such mistaken operations; these are solutions I came up with myself and then verified by experiment.
Architecture: Replica set (1 Primary + 1 Secondary + 1 slaveDelay)
Delay: 600 seconds
Primary: 192.168.1.100:27017
Secondary: 192.168.1.100:27018
SlaveDelay: 192.168.1.100:27019  # delayed member
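For reference, a delayed member like the one above is normally configured as a hidden, priority-0 member. A minimal sketch in the mongo shell (assuming members[2] is the 27019 node; slaveDelay is the field name used by MongoDB 3.x):
trs1:PRIMARY>cfg = rs.conf()
trs1:PRIMARY>cfg.members[2].priority = 0      // a delayed member must never become primary
trs1:PRIMARY>cfg.members[2].hidden = true     // keep it out of normal client reads
trs1:PRIMARY>cfg.members[2].slaveDelay = 600  // stay 600 seconds behind the primary
trs1:PRIMARY>rs.reconfig(cfg)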
Procedure:
I. Accidentally deleting some of a collection's documents
As I see it there are two cases: the deleted data still exists on the delayed node, or it does not.
The deleted data exists on the delayed node
In this case, simply export the accidentally deleted data from the delayed node and import it back into the Primary.
1. Current data on the Primary:
trs1:PRIMARY>use tt
trs1:PRIMARY>db.t1.find()
{"_id" : ObjectId("583667ab6268d1913b424a9a"), "a": 1 }
{"_id" : ObjectId("583667ab6268d1913b424a9b"), "a": 2 }
{"_id" : ObjectId("583667ab6268d1913b424a9c"), "a": 3 }
{"_id" : ObjectId("583667ab6268d1913b424a9d"), "a": 4 }
{"_id" : ObjectId("583667ab6268d1913b424a9e"), "a": 5 }
{"_id" : ObjectId("583667ab6268d1913b424a9f"), "a": 6 }
{"_id" : ObjectId("583667ab6268d1913b424aa0"), "a": 7 }
{"_id" : ObjectId("583667ab6268d1913b424aa1"), "a": 8 }
{"_id" : ObjectId("583667ab6268d1913b424aa2"), "a": 9 }
{"_id" : ObjectId("583667ab6268d1913b424aa3"), "a": 10 }
The delayed node holds the same data.
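Before relying on the delayed node, it can be worth confirming that it really still holds these documents. A small sketch (rs.slaveOk() is required before reading from a secondary/delayed member in this shell version):
[mongo@localhost ~]$ mongo 10.25.161.15:27019/tt
trs1:SECONDARY>rs.slaveOk()   // allow reads on this secondary/delayed member
trs1:SECONDARY>db.t1.count()
10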
2. Delete some of the documents:
trs1:PRIMARY>db.t1.remove({a:{$gte:4,$lte:7}})
WriteResult({"nRemoved" : 4 })
trs1:PRIMARY>db.t1.find()
{"_id" : ObjectId("583667ab6268d1913b424a9a"), "a": 1 }
{"_id" : ObjectId("583667ab6268d1913b424a9b"), "a": 2 }
{"_id" : ObjectId("583667ab6268d1913b424a9c"), "a": 3 }
{"_id" : ObjectId("583667ab6268d1913b424aa1"), "a": 8 }
{"_id" : ObjectId("583667ab6268d1913b424aa2"), "a": 9 }
{"_id" : ObjectId("583667ab6268d1913b424aa3"), "a": 10 }
3. Export the deleted data from the delayed node:
[mongo@localhost ~]$ mongoexport --host 10.25.161.15:27019 -d tt -c t1 -q '{a:{$gte:4,$lte:7}}' --out ~/backups/myRecords.json
2016-11-24T12:24:17.643+0800 connected to: 10.25.161.15:27019
2016-11-24T12:24:17.644+0800 exported 4 records
4. Import the data back into the Primary's collection:
[mongo@localhost ~]$ mongoimport --host 10.25.161.15:27017 -d tt -c t1 --file ~/backups/myRecords.json
2016-11-24T12:25:09.962+0800 connected to: 10.25.161.15:27017
2016-11-24T12:25:09.972+0800 imported 4 documents
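One thing to keep in mind: mongoimport inserts by default, so if any of the documents in myRecords.json still exist on the Primary, those rows will fail with duplicate _id errors. A hedged variant that updates-or-inserts by _id instead:
[mongo@localhost ~]$ mongoimport --host 10.25.161.15:27017 -d tt -c t1 --upsert --file ~/backups/myRecords.json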
5. Check the recovered data:
trs1:PRIMARY>db.t1.find().sort({a:1})
{"_id" : ObjectId("583667ab6268d1913b424a9a"), "a": 1 }
{"_id" : ObjectId("583667ab6268d1913b424a9b"), "a": 2 }
{"_id" : ObjectId("583667ab6268d1913b424a9c"), "a": 3 }
{"_id" : ObjectId("583667ab6268d1913b424a9d"), "a": 4 }
{"_id" : ObjectId("583667ab6268d1913b424a9e"), "a": 5 }
{"_id" : ObjectId("583667ab6268d1913b424a9f"), "a": 6 }
{"_id" : ObjectId("583667ab6268d1913b424aa0"), "a": 7 }
{"_id" : ObjectId("583667ab6268d1913b424aa1"), "a": 8 }
{"_id" : ObjectId("583667ab6268d1913b424aa2"), "a": 9 }
{"_id" : ObjectId("583667ab6268d1913b424aa3"), "a": 10 }
The data is back, but this was under ideal conditions: no business writes were coming in while I worked, and the volume of data exported and imported was small, so whether this goes as smoothly in production still needs further testing.
The deleted data does not exist on the delayed node
In this case the oplog is required: the accidentally deleted data has to be recovered by replaying oplog entries.
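Before going down this road it is worth checking that the operations you care about are still inside the oplog window; if the oplog has already rolled over, replay-based recovery is no longer possible. A quick check on the Primary:
trs1:PRIMARY>rs.printReplicationInfo()   // shows the configured oplog size and the time range it currently covers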
1. Check the data on the Primary
trs1:PRIMARY>db.t1.find().sort({a:1})
{"_id" : ObjectId("583667ab6268d1913b424a9a"), "a": 1 }
{"_id" : ObjectId("583667ab6268d1913b424a9b"), "a": 2 }
{"_id" : ObjectId("583667ab6268d1913b424a9c"), "a": 3 }
{"_id" : ObjectId("583667ab6268d1913b424a9d"), "a": 4 }
{"_id" : ObjectId("583667ab6268d1913b424a9e"), "a": 5 }
{"_id" : ObjectId("583667ab6268d1913b424a9f"), "a": 6 }
{"_id" : ObjectId("583667ab6268d1913b424aa0"), "a": 7 }
{"_id" : ObjectId("583667ab6268d1913b424aa1"), "a": 8 }
{"_id" : ObjectId("583667ab6268d1913b424aa2"), "a": 9 }
{"_id" : ObjectId("583667ab6268d1913b424aa3"), "a": 10 }
2. Record timestamp s1
trs1:PRIMARY>rs.status().members[0].optime.ts
Timestamp(1479961509,4)
3. Insert 10 documents
trs1:PRIMARY>for(var i=1;i<11;i++){
...db.t1.insert({b:i})
...}
WriteResult({"nInserted" : 1 })
4. Record timestamp s2
trs1:PRIMARY>rs.status().members[0].optime.ts
Timestamp(1479972707,10)
5. Delete 5 documents (simulating the accidental delete)
trs1:PRIMARY>db.t1.remove({b:{$gte:6}})
WriteResult({"nRemoved" : 5 })
6. Insert one more document (simulating business writes that continue after the mistake)
trs1:PRIMARY>db.t1.insert({c:1})
WriteResult({"nInserted" : 1 })
7. Use mongobackup to dump the oplog generated after timestamp s1
mongobackup -h 10.25.161.15 --port 27017 --backup -s 1479961509,4
8. Use mongobackup to replay the oplog entries between s1 and s2
mongobackup -h 10.25.161.15 --port 27017 --recovery -s 1479961509,4 -t 1479972707,10
connected to: 10.25.161.15:27017
Thu Nov 24 15:36:34.005 Replaying file oplog.bson
Thu Nov 24 15:36:34.006 Only applying oplog entries matching this criteria: {"ts" : { "$gte" : { "$timestamp" : {"t" : 1479961509, "i" : 4 } }, "$lte" : {"$timestamp" : { "t" : 1479972707, "i" : 10 } } }}
16 objects found
Thu Nov 24 15:36:34.006 Successfully Recovered.
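mongobackup is a third-party tool. If it is not available, the same replay can be done with the stock tools by dumping the relevant slice of local.oplog.rs and feeding it to mongorestore --oplogReplay; oplog application is idempotent, so re-applying entries already reflected in the data is safe. A rough sketch using the same s1/s2 values (paths are illustrative):
[mongo@localhost ~]$ mongodump --host 10.25.161.15:27017 -d local -c oplog.rs -o ~/oplogdump \
    -q '{"ts":{"$gte":{"$timestamp":{"t":1479961509,"i":4}},"$lte":{"$timestamp":{"t":1479972707,"i":10}}}}'
# mongorestore --oplogReplay replays a file named oplog.bson at the top of the dump directory
[mongo@localhost ~]$ mkdir ~/replay && cp ~/oplogdump/local/oplog.rs.bson ~/replay/oplog.bson
[mongo@localhost ~]$ mongorestore --host 10.25.161.15:27017 --oplogReplay ~/replay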
9. Check the recovered data
trs1:PRIMARY>db.t1.find({b:10})
{"_id" : ObjectId("58369763c95ee8a72cfab654"), "b": 10 }
trs1:PRIMARY>db.t1.find({b:{$gte:1,$lte:10}})
{"_id" : ObjectId("58369763c95ee8a72cfab64b"), "b": 1 }
{"_id" : ObjectId("58369763c95ee8a72cfab64c"), "b": 2 }
{"_id" : ObjectId("58369763c95ee8a72cfab64d"), "b": 3 }
{"_id" : ObjectId("58369763c95ee8a72cfab64e"), "b": 4 }
{"_id" : ObjectId("58369763c95ee8a72cfab64f"), "b": 5 }
{"_id" : ObjectId("58369763c95ee8a72cfab650"), "b": 6 }
{"_id" : ObjectId("58369763c95ee8a72cfab651"), "b": 7 }
{"_id" : ObjectId("58369763c95ee8a72cfab652"), "b": 8 }
{"_id" : ObjectId("58369763c95ee8a72cfab653"), "b": 9 }
{"_id" : ObjectId("58369763c95ee8a72cfab654"), "b": 10 }
trs1:PRIMARY>db.t1.find({c:1})
{"_id" : ObjectId("583697d1c95ee8a72cfab655"), "c": 1 }
The queries show that the deleted documents {b:6} through {b:10} have been recovered, and the {c:1} document was not wiped out, which is exactly the result we expected. In practice the hard part of the whole procedure is pinning down the timestamps; I ran this experiment under ideal conditions first and will add notes later on identifying the timestamps in a real incident.
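On the timestamp question, the usual way to pin down the moment just before the mistake, after the fact, is to look the offending operations up in local.oplog.rs itself. A sketch for the delete case above (op:"d" marks delete entries; the Timestamp in the second query is only a placeholder for whatever the first query returns):
trs1:PRIMARY>use local
trs1:PRIMARY>db.oplog.rs.find({ns:"tt.t1", op:"d"}).sort({ts:1}).limit(1)   // first errant delete and its ts
trs1:PRIMARY>db.oplog.rs.find({ts:{$lt:Timestamp(1479972710,1)}}).sort({ts:-1}).limit(1)   // placeholder ts: use the value returned above; this gives the last good entry, i.e. s2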
II. Accidentally dropping a collection
If an entire collection is dropped by mistake, the delayed node alone is not enough, because it is missing whatever the Primary wrote after the node's last sync; this is where the oplog comes in.
As far as I can tell, there are two recovery methods: delayed-node collection + oplog recovery, and mongodump backup + oplog recovery.
Delayed-node collection + oplog recovery:
1. Check the current data in the collection
trs1:PRIMARY>db.t1.find()
{"_id" : ObjectId("5836a4c7c95ee8a72cfab656"), "a": 1 }
{"_id" : ObjectId("5836a4c7c95ee8a72cfab657"), "a": 2 }
{"_id" : ObjectId("5836a4c7c95ee8a72cfab658"), "a": 3 }
{"_id" : ObjectId("5836a4c7c95ee8a72cfab659"), "a": 4 }
{"_id" : ObjectId("5836a4c7c95ee8a72cfab65a"), "a": 5 }
2. Insert a new document (simulating data the delayed node has not received yet)
trs1:PRIMARY>db.t1.insert({a:6})
WriteResult({"nInserted" : 1 })
3. Drop the collection
trs1:PRIMARY>db.t1.drop()
true
4. Check the delayed node's last synced timestamp s1
trs1:SECONDARY>rs.status().members[2].optime.ts
Timestamp(1479976135, 6)
5. Use mongoexport to dump all of the collection's data from the delayed node (the documents below went to stdout; in practice save them to myRecords.json, e.g. with --out, so they can be imported in step 7)
[mongo@localhost backups]$ mongoexport --host 10.25.161.15:27019 -d tt -c t1
2016-11-24T17:31:24.970+0800 connected to: 10.25.161.15:27019
{"_id":{"$oid":"5836a4c7c95ee8a72cfab656"},"a":1.0}
{"_id":{"$oid":"5836a4c7c95ee8a72cfab657"},"a":2.0}
{"_id":{"$oid":"5836a4c7c95ee8a72cfab658"},"a":3.0}
{"_id":{"$oid":"5836a4c7c95ee8a72cfab659"},"a":4.0}
{"_id":{"$oid":"5836a4c7c95ee8a72cfab65a"},"a":5.0}
2016-11-24T17:31:24.971+0800 exported 5 records
6. Use mongobackup to dump the oplog generated after timestamp s1
[mongo@localhost backups]$ mongobackup -h 10.25.161.15 --port 27017 --backup -s 1479976135,6
connected to: 10.25.161.15:27017
Thu Nov 24 17:32:14.457 local.oplog.rs to backup/oplog.bson
Thu Nov 24 17:32:14.458 2 objects
7. Use mongoimport to load the JSON data exported in step 5
[mongo@localhost backups]$ mongoimport --host 10.25.161.15:27017 -d tt -c t1 --file ./myRecords.json
2016-11-24T17:34:55.200+0800 connected to: 10.25.161.15:27017
2016-11-24T17:34:55.229+0800 imported 5 documents
8. Find the last timestamp s2 before the drop
If you pick the drop's own timestamp, mongobackup will still replay the drop during recovery (I verified this; note that step 9 uses -t 1479979785,1, the timestamp of the entry just before the drop), so s2 must be the timestamp of the last entry before the drop.
[mongo@localhost backups]$ bsondump backup/oplog.bson | grep -B 1 '"drop":"t1"'
......
{"ts":{"$timestamp":{"t":1479979785,"i":1}},"t":{"$numberLong":"4"},"h":{"$numberLong":"-1488414770385429463"},"v":2,"op":"i","ns":"tt.t1","o":{"_id":{"$oid":"5836b309c95ee8a72cfab65b"},"a":6.0}}
{"ts":{"$timestamp":{"t":1479979814,"i":1}},"t":{"$numberLong":"4"},"h":{"$numberLong":"-1645421116110840065"},"v":2,"op":"c","ns":"tt.$cmd","o":{"drop":"t1"}}
9. Use mongobackup to restore the collection to its state just before the drop
mongobackup -h 10.25.161.15 --port 27017 --recovery -s 1479976135,6 -t 1479979785,1
connected to: 10.25.161.15:27017
Thu Nov 24 17:39:49.289 Replaying file oplog.bson
Thu Nov 24 17:39:49.289 Only applying oplog entries matching this criteria: {"ts" : { "$gte" : { "$timestamp" : { "t" : 1479976135, "i" : 6 } }, "$lte" : { "$timestamp" : { "t" : 1479979785, "i" : 1 } } } }
2 objects found
Thu Nov 24 17:39:49.290 Successfully Recovered.
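If mongobackup is not at hand, the same "stop just before the drop" effect can be achieved with the stock tools: mongorestore's --oplogLimit excludes oplog entries with a timestamp at or after the given <seconds>[:ordinal], so passing the drop's own timestamp stops the replay right before it. A sketch against the backup/ directory produced in step 6 (which already contains oplog.bson):
[mongo@localhost backups]$ mongorestore --host 10.25.161.15:27017 --oplogReplay --oplogLimit 1479979814:1 ./backup/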
10. Verify the result
trs1:PRIMARY>db.t1.find()
{"_id" : ObjectId("5836a4c7c95ee8a72cfab658"), "a": 3 }
{"_id" : ObjectId("5836a4c7c95ee8a72cfab656"), "a": 1 }
{"_id" : ObjectId("5836a4c7c95ee8a72cfab657"), "a": 2 }
{"_id" : ObjectId("5836a4c7c95ee8a72cfab65a"), "a": 5 }
{"_id" : ObjectId("5836a4c7c95ee8a72cfab659"), "a": 4 }
{"_id" : ObjectId("5836b309c95ee8a72cfab65b"), "a": 6 }
mongodump backup + oplog recovery:
1. Start a continuous (streaming) oplog backup
[mongo@localhost backups]$ pwd
/home/mongo/backups
[mongo@localhost backups]$ mongobackup -h 10.25.161.15 --port 27017 --backup --stream
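While the stream is running, it can be sanity-checked by peeking at the segment file it writes (the backup/oplog000000.bson name matches the file read in step 8; I am assuming it is the first segment the tool produces):
[mongo@localhost backups]$ bsondump backup/oplog000000.bson | head -1   # first captured entry; its ts should be close to when --stream was started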
2. Insert 100,000 documents (simulating new data being generated while the dump runs)
trs1:PRIMARY>for(var i=0;i<100000;i++){
...db.t1.insert({a:i})}
WriteResult({"nInserted" : 1 })
3. Take a full backup with mongodump
[mongo@localhost backups]$ pwd
/home/mongo/backups
[mongo@localhost backups]$ mongodump --host 10.25.161.15:27017 --oplog
2016-11-25T11:24:56.731+0800 writing tt.t1 to
2016-11-25T11:24:56.762+0800 done dumping tt.t1 (4239 documents)
2016-11-25T11:24:56.763+0800 writing captured oplog to
2016-11-25T11:24:56.769+0800 dumped 36 oplog entries
4. Once the inserts have finished, drop the collection (simulating the accidental drop)
trs1:PRIMARY>db.t1.count()
100000
trs1:PRIMARY>db.t1.drop()
true
5. Stop the streaming mongobackup oplog backup
Ctrl+C is enough to stop it.
6. Find timestamp s1 of the last oplog entry in the backup from step 3
[mongo@localhost backups]$ pwd
/home/mongo/backups
[mongo@localhost backups]$ ll dump/
total 4
-rw-rw-r--. 1 mongo mongo 3816 Nov 25 11:24 oplog.bson
drwxrwxr-x. 2 mongo mongo   43 Nov 25 11:24 tt
[mongo@localhost backups]$ bsondump dump/oplog.bson | tail -1
2016-11-25T11:35:26.735+0800 36 objects found
{"ts":{"$timestamp":{"t":1480044296,"i":776}},"t":{"$numberLong":"4"},"h":{"$numberLong":"4007452313334291247"},"v":2,"op":"i","ns":"tt.t1","o":{"_id":{"$oid":"5837af0823358cbda370c7b0"},"a":4267.0}}
Note: because the data keeps changing while mongodump runs, the --oplog option is used to keep the backup consistent; it captures the oplog entries generated during the dump, so the timestamp of the last entry in oplog.bson is the timestamp at which mongodump finished.
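If you would rather not eyeball the JSON, the same timestamp can be pulled out with a short pipeline (a sketch assuming jq is installed; bsondump prints one extended-JSON document per line, and its "N objects found" summary goes to stderr, so it does not disturb the pipe):
[mongo@localhost backups]$ bsondump dump/oplog.bson | tail -1 | jq -c '.ts["$timestamp"]'
{"t":1480044296,"i":776}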
7. First restore the full backup
[mongo@localhost backups]$ mongorestore --host 10.25.161.15:27017 --oplogReplay ./dump
2016-11-25T16:15:15.149+0800 building a list of dbs and collections to restore from dump dir
2016-11-25T16:15:15.151+0800 reading metadata for tt.t1 from dump/tt/t1.metadata.json
2016-11-25T16:15:15.167+0800 restoring tt.t1 from dump/tt/t1.bson
2016-11-25T16:15:15.368+0800 restoring indexes for collection tt.t1 from metadata
2016-11-25T16:15:15.369+0800 finished restoring tt.t1 (4239 documents)
2016-11-25T16:15:15.369+0800 replaying oplog
2016-11-25T16:15:15.387+0800 done
trs1:PRIMARY>db.t1.count()
4268
Note: the 4268 documents are the 4239 captured in the dump plus the inserts recorded in the oplog while the dump was running, which --oplogReplay applied.
8. Use bsondump to find the last timestamp s2 before the collection was dropped
$ bsondump backup/oplog000000.bson | grep -B 1 '"drop":"t1"' | sort
{"ts":{"$timestamp":{"t":1480044395,"i":807}},"t":{"$numberLong":"4"},"h":{"$numberLong":"-7281969221530393253"},"v":2,"op":"i","ns":"tt.t1","o":{"_id":{"$oid":"5837af6b23358cbda3723da4"},"a":99999.0}}
{"ts":{"$timestamp":{"t":1480044421,"i":1}},"t":{"$numberLong":"4"},"h":{"$numberLong":"8681873880357004177"},"v":2,"op":"c","ns":"tt.$cmd","o":{"drop":"t1"}}
Note: if there are multiple drop entries, sort the output and take the timestamp of the entry just before the last drop (i.e. 1480044395,807 in the output above).
9. Use mongobackup to replay the data between timestamps s1 and s2
[mongo@localhost backups]$ mongobackup -h 10.25.161.15 --port 27017 --recovery -s 1480044296,776 -t 1480044395,807 ./backup/
connected to: 10.25.161.15:27017
Fri Nov 25 16:17:35.343 Replaying file oplog.bson
Fri Nov 25 16:17:35.343 Only applying oplog entries matching this criteria: {"ts" : { "$gte" : { "$timestamp" : {"t" : 1480044296, "i" : 776 } }, "$lte" : {"$timestamp" : { "t" : 1480044395, "i" : 807 } } } }
100103 objects found
Fri Nov 25 16:17:35.962 Successfully Recovered.
10. Verify the recovery
trs1:PRIMARY>db.t1.count()
100000