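A core trading database raised an alert: IOwait above 30%. It looked like a routine alert, but there was more to it. After logging in to the host, I saw a large number of RMAN backup scripts running.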

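A job that normally finishes in under an hour had been running for more than 6 hours. The backup writes to a directory mounted over NFS, so I suspected an NFS problem.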

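Sure enough, inside the mount directory even ll would not return any output and simply hung. I then went to the backup server, which is the NFS server, to check for anything abnormal. A monitoring script deployed there earlier showed that load and IO on that server are normally very low, which did not quite match what it recorded during today's incident window.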
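To check a suspect mount without hanging the interactive shell, the probe can be wrapped in a time limit. The following is a minimal sketch, assuming GNU coreutils `timeout` is available; `probe_mount` and the `MOUNT_DIR` variable are hypothetical names, not part of the original setup. A process blocked on a hard NFS mount may not react to SIGTERM, so the sketch follows up with SIGKILL and treats either outcome as a hang.

```shell
#!/usr/bin/env bash
# Probe a directory with a time limit so a hung NFS mount cannot block the shell.
# Usage: probe_mount <dir> [seconds]   (hypothetical helper)
probe_mount() {
    local dir="$1" limit="${2:-5}" rc
    # Send SIGTERM after $limit seconds, then SIGKILL 2 seconds later if needed.
    timeout -k 2 "$limit" stat "$dir" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0)       echo "ok" ;;     # stat returned normally
        124|137) echo "hung" ;;   # 124: time limit hit; 137: had to be SIGKILLed
        *)       echo "error" ;;  # e.g. the directory does not exist
    esac
}

# MOUNT_DIR is an assumed variable; point it at the real NFS mount directory.
probe_mount "${MOUNT_DIR:-/tmp}" 5
```

Run from cron or a monitoring script, a "hung" result would flag the condition long before a backup job has been stuck for hours.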

Back on the backup client, ps -ef |grep nfs showed a large number of cat processes:

    [root@trandb1 log]# ps -ef |grep nfs
    root 9700 2 0 2017 ? 00:00:00 [nfsv4.0-svc]
    oracle 88889 88888 0 10:05 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9868_1
    oracle 90224 90223 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9888_1
    oracle 90566 90565 0 10:06 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190619_9872_1
    oracle 90571 90570 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9869_1
    oracle 90576 90575 0 10:06 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190619_9872_1
    oracle 90584 90583 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9868_1
    oracle 90588 90587 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9884_1
    oracle 90593 90592 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9885_1
    oracle 90597 90596 0 10:06 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190619_9865_1
    oracle 90606 90605 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9881_1
    oracle 90616 90615 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9871_1
    oracle 90626 90625 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9887_1
    oracle 90631 90630 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9888_1
    oracle 90641 90640 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9871_1
    oracle 90645 90644 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9880_1
    oracle 91999 91998 0 10:06 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190620_9883_1
    oracle 92488 92487 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9880_1
    oracle 93837 93836 0 10:07 ? 00:00:00 cat ./nfs/arch_TRANDB_20190620_9890_1
    oracle 94011 94010 0 10:07 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190620_9886_1
    oracle 94238 94237 0 10:07 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9865_1
    root 98024 17863 0 10:09 pts/7 00:00:00 grep nfs
    root 130976 2 0 2017 ? 00:00:00 [nfsiod]
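Several backup pieces appear more than once in this listing, so tallying the cat processes per file makes the duplication easier to see. The sketch below is only an illustration: `count_cat_by_file` is a hypothetical helper, it assumes the `ps -ef` column layout shown above (command in column 8, first argument in column 9), and it is fed four lines copied from the listing rather than live output.

```shell
#!/usr/bin/env bash
# Count `cat` processes per target file from `ps -ef` output on stdin.
# Assumes column 8 is the command and column 9 its first argument.
count_cat_by_file() {
    awk '$8 == "cat" { n[$9]++ } END { for (f in n) print n[f], f }' |
        sort -k1,1nr -k2
}

# Sample input: four lines taken from the listing above.
count_cat_by_file <<'EOF'
oracle 88889 88888 0 10:05 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9868_1
oracle 90584 90583 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9868_1
oracle 93837 93836 0 10:07 ? 00:00:00 cat ./nfs/arch_TRANDB_20190620_9890_1
root 98024 17863 0 10:09 pts/7 00:00:00 grep nfs
EOF
```

On a live host the same function could be used as `ps -ef | count_cat_by_file`.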

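Killing these PIDs at the OS level did not help: new ones were spawned immediately. After the directory was unmounted with umount, they were gone. I have not found the root cause yet, so I am recording the incident here for now.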
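Since killed processes came back at once, one possible next step is to walk up the parent chain of a cat process and see what keeps launching it. In the listing each cat has a PPID exactly one below its own PID, which hints at a short-lived wrapper per process, though that is not confirmed. A minimal sketch, assuming Linux with /proc mounted; `ancestry` is a hypothetical helper, not something from the original investigation.

```shell
#!/usr/bin/env bash
# Print "PID PPID STATE COMMAND" for a PID and each of its ancestors,
# reading /proc directly (Linux only). Hypothetical helper.
ancestry() {
    local pid="$1" ppid state cmd
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        ppid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")
        state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
        # cmdline is NUL-separated; it is empty for kernel threads and zombies.
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
        printf '%s %s %s %s\n' "$pid" "$ppid" "$state" "${cmd:-[no cmdline]}"
        pid="$ppid"
    done
}

# Example: trace the current shell. On the affected host, pass one of the
# cat PIDs from the listing instead.
ancestry "$$"
```

A state of D in the third column would mean the process is in uninterruptible sleep, which is typical for tasks blocked on an unresponsive NFS server.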