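In MySQL replication the slave runs two key threads, and replication stops if either one dies. Their state appears in SHOW SLAVE STATUS as Slave_IO_Running and Slave_SQL_Running. The I/O thread handles communication with the master; the SQL thread applies the received events on the slave's own MySQL instance. Below are notes on how to recover when either of them shows No.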
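If Slave_IO_Running is No, I have seen three kinds of cause. The first is a network problem where the slave simply cannot reach the master; I once built replication on virtual machines using NAT networking and it would not connect no matter what. The second is a problem in my.cnf; I won't go into how to write the configuration, since there is plenty about that online. The last is privileges: the replication account must have REPLICATION SLAVE (I grant FILE as well, though only very old servers should need it). If you like living dangerously, just grant ALL.

Once the I/O thread shows No, check the error log first to see what error it reports. It is most likely the network, but it can also be a packet that is too large to receive, in which case raise max_allowed_packet on both master and slave.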
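A minimal sketch of checking the two server-side causes above, privileges and packet size. The account 'repl'@'192.168.1.%' and the 64 MB value are assumptions made up for illustration, not values from this setup, and the GRANT assumes the account already exists.

```sql
-- On the master: see what the replication account is allowed to do.
-- 'repl'@'192.168.1.%' is a hypothetical account; substitute your own.
SHOW GRANTS FOR 'repl'@'192.168.1.%';

-- Add the replication privilege if it is missing.
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.1.%';

-- On both master and slave: check the current packet limit (in bytes).
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it at runtime; 67108864 bytes (64 MB) is an arbitrary example value.
SET GLOBAL max_allowed_packet = 67108864;
```

The runtime change applies only to new connections and does not survive a server restart, so the same value would also need to go under [mysqld] in my.cnf.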
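If Slave_SQL_Running is No, there are two possibilities. One is that something else wrote to the table on the slave, meaning the application wrote to it directly; that will cause trouble. I tried to reproduce this today, but it only breaks some of the time and I do not fully understand it yet, so I will update this later. The other, which accounts for the vast majority of cases, is that the slave was restarted and a transaction was rolled back. This is also a kind of self-protection on MySQL's part, much like going read-only at a critical moment.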
To recover in this case, stop the slave, run SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; and start the slave again. Setting this global variable to N means:
This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.
This statement is valid only when the slave thread is not running. Otherwise, it produces an error.
Heh, that explains it more clearly than I can.
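The recovery steps described above, laid out as a sequence to run on the slave. This is only a sketch of the usual pattern; skipping a single event (N = 1) is the value used in this post, not a general recommendation.

```sql
-- Run on the slave.
-- The counter can only be set while the slave SQL thread is stopped.
STOP SLAVE;

-- Skip the next 1 event received from the master.
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

START SLAVE;

-- Both Slave_IO_Running and Slave_SQL_Running should now read Yes.
SHOW SLAVE STATUS\G
```

The skipped event is never applied on the slave, so it is worth checking afterwards that the affected table still matches the master.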
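Recovering a MySQL mirror server that stopped on an error
This afternoon the master server crashed for some reason, and after the reboot I found that the slave's data had not caught up. I had only set up MySQL master/slave a few days earlier and had little experience, so hitting this for the first time made me a bit anxious. Still, I tried a few things myself and managed to fix it :)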
On the slave: Master_Log_File: mysqlhxmaster.000007 Read_Master_Log_Pos: 84285377
On the master: mysqlhxmaster.000007 | 84450528 | The master is already well past that position, so the slave really has fallen behind.
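The comparison above comes down to two statements, one on each server. A rough sketch; Exec_Master_Log_Pos is not quoted in this post, but it is a standard SHOW SLAVE STATUS field and is included here as an extra thing to look at.

```sql
-- On the master: current binlog file and write position.
SHOW MASTER STATUS;

-- On the slave: compare these fields with the master's File / Position.
--   Master_Log_File / Read_Master_Log_Pos       : how far the I/O thread has read
--   Relay_Master_Log_File / Exec_Master_Log_Pos : how far the SQL thread has applied
SHOW SLAVE STATUS\G
```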
show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: No
Something is wrong: Slave_SQL_Running should be Yes.
Further down in the output there is an error message:
Last_Errno: 1053
Last_Error: Query partially completed on the master (error on master: 1053) and was aborted. There is a chance that your master is inconsistent at this point. If you are sure that your master is ok, run this query manually on the slave and then restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; . Query: 'INSERT INTO hxstatrecord ......(an SQL statement)'
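The message itself says what to do :)
First run stop slave, then execute the statement quoted in the error message by hand on the slave, then run SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;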
show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
That did it. The slave worked through the backlog of logs within a few minutes, and the two sides were in sync again :)
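Fixing Slave_IO_Running: No on a MySQL slave, part 2
This morning an unexpected power cut in the server room left the MySQL slave out of sync. The fix I had used earlier for Slave_SQL_Running: No did not help; replication still would not resume.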
Check the status with show slave status:
Master_Log_File: mysqlmaster.000079
Read_Master_Log_Pos: 183913228
Relay_Log_File: hx-relay-bin.002934
Relay_Log_Pos: 183913371
Relay_Master_Log_File: mysqlmaster.000079
Slave_IO_Running: No
Slave_SQL_Running: Yes
On the master, show master status\G:
File: mysqlmaster.000080
Position: 13818288
Binlog_Do_DB:
Binlog_Ignore_DB: mysql,test
MySQL error log:
100512 9:13:17 [Note] Slave SQL thread initialized, starting replication in log 'mysqlmaster.000079' at position 183913228, relay log './hx-relay-bin.002934' position: 183913371
100512 9:13:17 [Note] Slave I/O thread: connected to master 'replicuser@192.168.1.21:3306', replication started in log 'mysqlmaster.000079' at position 183913228
100512 9:13:17 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
100512 9:13:17 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
100512 9:13:17 [Note] Slave I/O thread exiting, read up to log 'mysqlmaster.000079', position 183913228
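This time it is Slave_IO_Running that is No. According to the log, the slave hit an error when it tried to read position 183913228 in mysqlmaster.000079: that position does not exist on the master, so replication cannot continue.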
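One way to check from the master whether the requested position really lies past the end of the file. This is a sketch under the assumption that the server is new enough for SHOW BINARY LOGS to report file sizes; the file names are the ones quoted above.

```sql
-- On the master: list the binlog files with their sizes in bytes.
-- If File_size for mysqlmaster.000079 is smaller than 183913228,
-- the slave is asking for a position beyond the end of that file.
SHOW BINARY LOGS;

-- Look at the first few events of the next log to find a valid starting position.
SHOW BINLOG EVENTS IN 'mysqlmaster.000080' LIMIT 5;
```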
Take a look at the last few lines of this log:
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
# at 4
#100511 9:35:15 server id 1 end_log_pos 98 Start: binlog v 4, server v 5.0.27-standard-log created 100511 9:35:15
# Warning: this binlog was not closed properly. Most probably mysqld crashed writing it.
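I first tried starting from a position just before the damaged one:
STOP SLAVE; CHANGE MASTER TO MASTER_LOG_FILE='mysqlcncnmaster.000079', MASTER_LOG_POS=183913220; START SLAVE;
No effect! The only option left was to start from the new log:
STOP SLAVE; CHANGE MASTER TO MASTER_LOG_FILE='mysqlcncnmaster.000080', MASTER_LOG_POS=0; START SLAVE;
At this point Slave_IO_Running went back to Yes and replication was running again! I watched it for a while and saw no sign of errors, so the problem was solved.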
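The second attempt above, written out as a sequence for the slave. The file name is copied from the text; the position 4 is an assumption on my part, chosen because the first event in the mysqlbinlog output sits at offset 4, whereas the original command passed 0.

```sql
-- Run on the slave.
STOP SLAVE;

-- Point the slave at the start of the master's next binlog file.
-- 4 is the offset of the first event in a binlog file (see "# at 4" above).
CHANGE MASTER TO
    MASTER_LOG_FILE = 'mysqlcncnmaster.000080',
    MASTER_LOG_POS  = 4;

START SLAVE;

-- Slave_IO_Running should go back to Yes.
SHOW SLAVE STATUS\G
```

Jumping to the next file means that anything the master wrote at the end of the old file, but the slave never received, is not replicated, so comparing recently changed tables on both sides afterwards is probably a good idea.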
Another cause of Slave_IO_Running: No is that the slave does not have permission to read data from the master.
Reposted from: http://www.jb51.net/article/27220.htm