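In brief:
Step 1: The machine crashed unexpectedly, for no apparent reason.
Step 2: After the reboot, the machine would not come back up.
Step 3: Hardware was swapped out; once the CPU had been replaced, the machine came up.
Step 4: The author then happily started the database and logged in. All good, nothing wrong.
Step 5: This database had been the master, and when the machine went down an automatic master-slave failover took place. So the next task was to verify the data: whether the new master and the old master were consistent, and whether the new master had lost any data. The check showed that the new master had lost nothing when it took over. All good.
Step 6: Since no data was missing or extra, the plan was to attach this database directly as a slave of the new master, so the change master command was run.
Step 7: The start slave command was run, and mysqld instantly died and restarted itself. Even for a veteran DBA with many years of front-line MySQL operations (grunt work), this is a rare sight: a machine crash that knocks out the MySQL database itself.
Next, a look at the mysqld error log.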
2018-09-06T18:33:47.475065+08:00 5 [Warning] Slave SQL for channel '': If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2018-09-06T18:33:47.475172+08:00 5 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000005' at position 1063042206, relay log '/mysqldata/myinst1/binlog/relay-log.000002' position: 425121
2018-09-06T18:33:47.478127+08:00 5 [Note] Slave for channel '': MTS Recovery has completed at relay log /mysqldata/myinst1/binlog/relay-log.000002, position 473555 master log mysql-bin.000005, position 1063090640.
2018-09-06T18:33:52.516543+08:00 6 [Warning] Timeout waiting for reply of binlog (file: mysql-bin.000015, pos: 7595), semi-sync up to file , position 0.
2018-09-06T18:33:52.516580+08:00 6 [Note] Semi-sync replication switched OFF.
2018-09-06 18:33:52 0x7fbce99d4700 InnoDB: Assertion failure in thread 140449349977856 in file fut0lst.ic line 85
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
10:33:52 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
key_buffer_size=268435456
read_buffer_size=8388608
max_used_connections=1
max_threads=2000
thread_count=23
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 20768831 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
The log above is long, but only two lines carry useful information:
InnoDB: Assertion failure in thread 140449349977856 in file fut0lst.ic line 85
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
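When the error log is long, the two lines picked out above can also be located mechanically. The following is a minimal C++ sketch of that filtering, not part of MySQL: the function name extract_assertion_lines is hypothetical, and the two marker strings are simply the phrases that appear in this particular log excerpt.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of MySQL): returns the lines of an error log
// that contain one of the two InnoDB assertion phrases seen in the log above.
std::vector<std::string> extract_assertion_lines(const std::string& log_text) {
    std::vector<std::string> hits;
    std::istringstream in(log_text);
    std::string line;
    while (std::getline(in, line)) {
        // Marker strings are assumptions taken from this log excerpt.
        if (line.find("Assertion failure") != std::string::npos ||
            line.find("Failing assertion") != std::string::npos) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

Fed the log text shown earlier, this would keep the line naming fut0lst.ic line 85 and the line with the failing condition, and drop the rest.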
Following "in file fut0lst.ic line 85" leads to the function below:
/********************************************************************//**
Reads a file address.
@return file address */
UNIV_INLINE
fil_addr_t
flst_read_addr(
/*===========*/
	const fil_faddr_t*	faddr,	/*!< in: pointer to file faddress */
	mtr_t*			mtr)	/*!< in: mini-transaction handle */
{
	fil_addr_t	addr;

	ut_ad(faddr && mtr);

	addr.page = mtr_read_ulint(faddr + FIL_ADDR_PAGE, MLOG_4BYTES,
				   mtr);
	addr.boffset = mtr_read_ulint(faddr + FIL_ADDR_BYTE,
				      MLOG_2BYTES,
				      mtr);
	ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
	ut_a(ut_align_offset(faddr, UNIV_PAGE_SIZE) >= FIL_PAGE_DATA);
	return(addr);
}
The problem is at "ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);": the addr that was read does not satisfy this condition.
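To make the failing condition concrete, here is a minimal standalone sketch of the check asserted in flst_read_addr. The struct FileAddr and the function is_valid_file_addr are simplified stand-ins invented for this illustration, not InnoDB types; the two constant values are the ones defined in the MySQL 5.7 source as far as I know, so treat them as assumptions if you are on another version.

```cpp
#include <cstdint>

// Constant values as in the MySQL 5.7 InnoDB headers (assumed here).
constexpr std::uint32_t FIL_NULL = 0xFFFFFFFFu;  // "no page" marker
constexpr std::uint32_t FIL_PAGE_DATA = 38;      // first byte after the page header

// Simplified stand-in for fil_addr_t (hypothetical, for illustration only).
struct FileAddr {
    std::uint32_t page;     // page number within the tablespace
    std::uint32_t boffset;  // byte offset within that page
};

// Mirrors the condition checked by ut_a() in flst_read_addr():
// either the address is null, or its offset points past the page header.
bool is_valid_file_addr(const FileAddr& addr) {
    return addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA;
}
```

Under these assumptions, an address such as page 5 with offset 0 fails the check because offset 0 lies inside the page header, while a null address (page equal to FIL_NULL) passes regardless of its offset.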
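Why does the file address that was read differ from what is required? The server crash may have broken this consistency, but where exactly?
Let's keep tracing the code.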
/********************************************************************//**
Writes a file address. */
UNIV_INLINE
void
flst_write_addr(
/*============*/
	fil_faddr_t*	faddr,	/*!< in: pointer to file faddress */
	fil_addr_t	addr,	/*!< in: file address */
	mtr_t*		mtr)	/*!< in: mini-transaction handle */
{
	ut_ad(faddr && mtr);
	ut_ad(mtr_memo_contains_page_flagged(mtr, faddr,
					     MTR_MEMO_PAGE_X_FIX
					     | MTR_MEMO_PAGE_SX_FIX));

	ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
	ut_a(ut_align_offset(faddr, UNIV_PAGE_SIZE) >= FIL_PAGE_DATA);

	mlog_write_ulint(faddr + FIL_ADDR_PAGE, addr.page, MLOG_4BYTES,
			 mtr);
	mlog_write_ulint(faddr + FIL_ADDR_BYTE, addr.boffset,
			 MLOG_2BYTES, mtr);
}
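The problem lies in the last two statements of the function above. If the server goes down after
mlog_write_ulint(faddr + FIL_ADDR_PAGE, addr.page, MLOG_4BYTES,
mtr);
has executed but before the next statement has run, the information recorded at faddr is incomplete. The check ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA); then fails, and mysqld crashes.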
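The reasoning above can be played through with a toy model. The sketch below is an illustration of the hypothesis only: it models the 6-byte address field as a plain byte array and ignores InnoDB's redo log and mini-transaction machinery entirely, so it shows why a half-written address would fail the check, not that this is what happened on disk. All helper names (RawAddr, simulate_write, passes_read_check and so on) are hypothetical; the constant values are assumed from the MySQL 5.7 source.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Constant values assumed from the MySQL 5.7 InnoDB headers.
constexpr std::uint32_t FIL_NULL = 0xFFFFFFFFu;
constexpr std::uint32_t FIL_PAGE_DATA = 38;
constexpr std::size_t FIL_ADDR_PAGE = 0;  // offset of the 4-byte page number
constexpr std::size_t FIL_ADDR_BYTE = 4;  // offset of the 2-byte byte offset

// Toy stand-in for the 6-byte on-page address field (hypothetical type).
using RawAddr = std::array<std::uint8_t, 6>;

// A null address: page = FIL_NULL, byte offset = 0.
RawAddr null_address() {
    return RawAddr{0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00};
}

// Store the page number big-endian in bytes 0..3.
void write_page(RawAddr& raw, std::uint32_t page) {
    for (std::size_t i = 0; i < 4; ++i) {
        raw[FIL_ADDR_PAGE + i] =
            static_cast<std::uint8_t>((page >> (8 * (3 - i))) & 0xFF);
    }
}

// Store the byte offset big-endian in bytes 4..5.
void write_boffset(RawAddr& raw, std::uint16_t boffset) {
    raw[FIL_ADDR_BYTE] = static_cast<std::uint8_t>(boffset >> 8);
    raw[FIL_ADDR_BYTE + 1] = static_cast<std::uint8_t>(boffset & 0xFF);
}

std::uint32_t read_page(const RawAddr& raw) {
    std::uint32_t page = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        page = (page << 8) | raw[FIL_ADDR_PAGE + i];
    }
    return page;
}

std::uint16_t read_boffset(const RawAddr& raw) {
    return static_cast<std::uint16_t>((raw[FIL_ADDR_BYTE] << 8) |
                                      raw[FIL_ADDR_BYTE + 1]);
}

// Same condition as the ut_a() check in flst_read_addr().
bool passes_read_check(const RawAddr& raw) {
    return read_page(raw) == FIL_NULL || read_boffset(raw) >= FIL_PAGE_DATA;
}

// Writes a new address over `raw`. With crash_between = true, only the page
// number is written, modelling a crash between the two write statements.
RawAddr simulate_write(RawAddr raw, std::uint32_t page, std::uint16_t boffset,
                       bool crash_between) {
    write_page(raw, page);
    if (!crash_between) {
        write_boffset(raw, boffset);
    }
    return raw;
}
```

In this model, overwriting a null address with (page 7, offset 120) and stopping after the first write leaves page 7 paired with the stale offset 0. That combination is neither null nor past the page header, which is exactly the state the assertion rejects.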
If there were no slave and no backup, how would you handle this? Share your thoughts!