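Our company's server went down. It hosts the replica of our MySQL replication setup: the primary is compiled from source, while the replica runs in Docker. This post records a startup problem on that replica. The problem is solved, but not in what I would call the correct way, so I'd welcome discussion in the comments.
1. Discovering the problem
As soon as I noticed the server was down I rebooted it, and everything appeared to work again. The problem showed up when I tested the MySQL replica: a GUI client failed to connect and could not reach the MySQL service. On the server, Docker showed the container as started successfully, so I went into the container to take a look.
Inside the container, connecting with the mysql command-line client failed with: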
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysql/mysql.sock' (111)
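This socket file is created automatically once the database starts, so I suspected that MySQL had failed to start. Docker nevertheless reported the container as up, and every time I entered the container I was kicked out again shortly afterwards. At first I thought a colleague was working on it; later I realized that the MySQL service inside kept trying to start and failing, which made the container restart over and over.
The database cluster had come through several power outages without trouble, but this time a spot check really did turn up a problem.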
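A restart loop like this can be confirmed from the host before digging into logs. The following is a minimal sketch, not what was run at the time: `restart_loop_suspected` is a hypothetical helper name, the threshold of 3 is an arbitrary assumption, and it assumes the container has a restart policy (the post implies one but does not state it), since `RestartCount` only counts restarts performed by Docker itself.

```shell
# Reads one line "STATUS COUNT" on stdin, as printed by:
#   docker inspect -f '{{.State.Status}} {{.RestartCount}}' <container>
# Succeeds if the container is currently restarting, or has been restarted
# at least $1 times (default 3, an arbitrary threshold chosen for this sketch).
restart_loop_suspected() {
    read -r status count || return 1
    [ "$status" = "restarting" ] || [ "$count" -ge "${1:-3}" ]
}

# Usage ("name" is the container name placeholder used in this post):
#   docker inspect -f '{{.State.Status}} {{.RestartCount}}' name | restart_loop_suspected \
#       && echo "restart loop suspected"
```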
docker logs name
It showed the following (only the error portion is included):
2022-06-06T04:21:07.672350Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2022-06-06T04:21:07.672375Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2022-06-06T04:21:07.672377Z 0 [Note] InnoDB: Retrying to lock the first data file
2022-06-06T04:21:08.672532Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2022-06-06T04:21:08.672577Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2022-06-06T04:21:09.672975Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2022-06-06T04:22:47.706756Z 0 [Note] InnoDB: Unable to open the first data file
2022-06-06T04:22:47.706778Z 0 [ERROR] InnoDB: Operating system error number 11 in a file operation.
2022-06-06T04:22:47.706803Z 0 [ERROR] InnoDB: Error number 11 means 'Resource temporarily unavailable'
2022-06-06T04:22:47.706811Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2022-06-06T04:22:47.706823Z 0 [ERROR] InnoDB: Cannot open datafile './ibdata1'
2022-06-06T04:22:47.706857Z 0 [ERROR] InnoDB: Could not open or create the system tablespace. If you tried to add new data files to the system tablespace, and it failed here, you should now edit innodb_data_file_path in my.cnf back to what it was, and remove the new ibdata files InnoDB created in this failed attempt. InnoDB only wrote those files full of zeros, but did not yet use them in any way. But be careful: do not remove old data files which contain your precious data!
2022-06-06T04:22:47.706868Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Cannot open a file
2022-06-06T04:22:48.308050Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-06-06T04:22:48.308098Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2022-06-06T04:22:48.308110Z 0 [ERROR] Failed to initialize builtin plugins.
2022-06-06T04:22:48.308115Z 0 [ERROR] Aborting
2022-06-06T04:22:48.308124Z 0 [Note] Binlog end
2022-06-06T04:22:48.308238Z 0 [Note] Shutting down plugin 'CSV'
2022-06-06T04:22:48.311617Z 0 [Note] mysqld: Shutdown complete
At this point startup had failed and mysqld shut down completely.
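The log repeats the same lock error about once per second before giving up, which makes it hard to scan. As a rough sketch, a filter like the one below collapses the output to the distinct `[ERROR]` messages with a count for each. `summarize_errors` is a hypothetical name, and the timestamp stripping assumes the MySQL 5.7 log layout shown above (timestamp, thread id, then the level).

```shell
# Reads mysqld log lines on stdin and prints each distinct [ERROR] message
# once, prefixed by how many times it occurred, most frequent first.
summarize_errors() {
    grep -F '[ERROR]' |
        sed 's/^[^ ]* [0-9]* //' |   # drop the leading timestamp and thread id
        sort | uniq -c | sort -rn
}

# Usage (mysqld writes its log to stderr, hence 2>&1):
#   docker logs name 2>&1 | summarize_errors
```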
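2. Solving the problem
Searching online turned up three suggested fixes:
- Delete the affected file
- Kill the process competing for the file
- Move the affected file
Since MySQL runs in Docker here, I assumed nothing else could be competing for the file and ruled that option out straight away.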
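Ruling out contention by reasoning alone is a bit risky: if the data directory is a bind mount or volume, a process outside the container can hold the file too. On Linux the kernel's lock table in `/proc/locks` can be checked directly. The sketch below was not part of the original troubleshooting; `lock_holders` is a hypothetical helper, the data directory path is a placeholder, and it assumes the usual `/proc/locks` layout in which the fifth field is the PID and the sixth is `MAJOR:MINOR:INODE`.

```shell
# stdin: contents of /proc/locks; $1: inode number (decimal).
# Prints the PID of every process that currently holds a lock on that inode.
lock_holders() {
    awk -v ino="$1" '
        /->/ { next }                    # "->" marks a blocked waiter, not a holder
        {
            n = split($6, dev, ":")      # $6 looks like MAJOR:MINOR:INODE
            if (n == 3 && dev[3] == ino) print $5
        }'
}

# Usage on the Docker host (/path/to/datadir is a placeholder):
#   ino=$(ls -i /path/to/datadir/ibdata1 | awk '{print $1}')
#   lock_holders "$ino" < /proc/locks
```

Empty output would support the assumption that no live process holds the lock; a PID in the output would be worth looking up with ps before touching any files.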
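In the end I moved the affected file with mv, which amounts to deleting it while keeping a backup, and then copied it back with cp -a, which forcibly cleared the lock.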
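The mv and cp -a steps can be sketched as the small function below. `recreate_on_new_inode` is a hypothetical name and the `.bak` suffix is an assumption, since the post does not say where the file was moved to. One plausible explanation for why this clears the lock, which the post does not confirm, is that file locks are attached to the inode rather than the path: mv within the same filesystem keeps the old inode under the new name, while cp creates a fresh inode that carries no lock.

```shell
# Replace a file with a byte-identical copy that lives on a new inode.
# The original is kept next to it as <file>.bak, so nothing is deleted.
recreate_on_new_inode() {
    f=$1
    mv -- "$f" "$f.bak" || return 1      # rename only: the old inode moves to .bak
    cp -a -- "$f.bak" "$f" || return 1   # new file on a new inode; -a keeps owner, mode, timestamps
}

# Usage (/path/to/datadir is a placeholder; stop the container first):
#   docker stop name
#   recreate_on_new_inode /path/to/datadir/ibdata1
#   docker start name
```

Stopping the container first is a precaution added in this sketch: copying an InnoDB data file while mysqld is writing to it can produce an inconsistent copy.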
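Strictly speaking, the right approach would be to track down where the lock came from. One article I read attributed it to a problem on the primary; I considered restarting the primary, but that was not an option.
Once the file had been copied back, the replica could be connected to and queried normally.
show slave status\G
Replication status was back to normal as well.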
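Reading the full status output by eye is easy to get wrong, so the check can be scripted. This is a minimal sketch: `replication_healthy` is a hypothetical helper, and it treats replication as healthy only when both `Slave_IO_Running` and `Slave_SQL_Running` report Yes, which is a common rule of thumb rather than a complete health check.

```shell
# stdin: output of SHOW SLAVE STATUS\G
# Exit status 0 only if both replication threads report "Yes".
replication_healthy() {
    awk -F': *' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }   # the $ anchor skips Slave_SQL_Running_State
        END { exit !(io == "Yes" && sql == "Yes") }'
}

# Usage (credentials omitted; supply them however the container is configured):
#   docker exec name mysql -e 'SHOW SLAVE STATUS\G' | replication_healthy \
#       && echo "replication OK"
```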