Preface:
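While building MongoDB clusters, I ran into several small issues that made replica set initialization fail. This post summarizes the pitfalls I have hit since I started learning MongoDB.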
1. An incorrect IP causes MongoDB replica set initialization to fail
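This error is covered in a separate post, so I won't repeat it here. For details, see the blog post: IP Error Causes MongoDB Replica Set Initialization to Fail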
2. The mongodb-keyfile contents differ between the PRIMARY and SECONDARY hosts, so adding a member on the PRIMARY fails
Problem description:
While setting up another MongoDB replica set, the hosts and roles were assigned as follows:
Host IP | Role | OS |
---|---|---|
131.10.11.106 | PRIMARY | centos7 |
131.10.11.111 | SECONDARY | centos7 |
131.10.11.114 | SECONDARY | centos7 |
MongoDB server version: 3.4.10.1
When adding the SECONDARY host 131.10.11.111 from the PRIMARY, the following error appeared:
mongotest:PRIMARY> rs.add("131.10.11.111:27017")
{
    "ok" : 0,
    "errmsg" : "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 131.10.11.106:27017; the following nodes did not respond affirmatively: 131.10.11.111:27017 failed with Authentication failed.",
    "code" : 74,
    "codeName" : "NodeNotFound"
}
Cause analysis:
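After some troubleshooting, I found that the mongodb-keyfile on 131.10.11.111 differed from the one on the primary, and that the configuration file mongo.conf on 131.10.11.111 had no security (authentication) settings. The primary therefore could not authenticate to the new member ("Authentication failed"), and adding it failed.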
Solution:
1. Copy the mongodb-keyfile from the PRIMARY node to the secondary 131.10.11.111 and set its permissions to 400.
2. Edit the configuration file /etc/mongodb/mongo.conf as follows:
[root@mongodb111 mongodb]# cat mongo.conf
systemLog:
  destination: file
  path: "/opt/mongodbdata/mongod.log"
  logAppend: true
storage:
  journal:
    enabled: true
  dbPath: /opt/mongodbdata
setParameter:
  enableLocalhostAuthBypass: true
processManagement:
  fork: true
  pidFilePath: "/opt/mongodbdata/mongod.pid"
replication:
  replSetName: mongotest
# Add the following lines:
security:
  authorization: enabled
  keyFile: "/etc/mongodb/mongodb-keyfile"
[root@mongodb111 mongodb]#
Restart MongoDB on 131.10.11.111, then run rs.add("131.10.11.111:27017") again on the PRIMARY; this time it succeeds.
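A quick manual way to confirm step 1 is to run `sha256sum /etc/mongodb/mongodb-keyfile` on every member and compare the hashes. The Python sketch below does the same comparison, assuming copies of each host's keyfile have been gathered locally; the helper names are hypothetical, not part of MongoDB. `keyfile_mode_ok` is meant to be run on the member itself, since a copied file does not keep the original permissions. A byte-for-byte match is the simplest check: identical files certainly satisfy the requirement.

```python
import hashlib
import os
import stat
from typing import Iterable


def keyfile_digest(path: str) -> str:
    """SHA-256 hex digest of the keyfile's raw bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def keyfiles_match(paths: Iterable[str]) -> bool:
    """True if all given keyfiles are byte-for-byte identical."""
    digests = {keyfile_digest(p) for p in paths}
    return len(digests) == 1


def keyfile_mode_ok(path: str) -> bool:
    """True if the permission bits are exactly 400 (owner read-only), as in step 1."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o400
```

For example, `keyfiles_match(["keyfile-106", "keyfile-111"])` (hypothetical local copies) returns False in the situation described above, and True once the PRIMARY's keyfile has been copied over.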
3. The secondary's configuration file has no replSet setting, so adding it to the replica set fails
Problem description:
This problem occurred in the same environment as problem 2. When adding host 114 from host 106, the following error was reported:
mongotest:PRIMARY> rs.add("131.10.11.114:27017")
{
    "ok" : 0,
    "errmsg" : "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 131.10.11.106:27017; the following nodes did not respond affirmatively: 131.10.11.114:27017 failed with not running with --replSet",
    "code" : 74,
    "codeName" : "NodeNotFound"
}
Cause analysis:
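The message says "the following nodes did not respond affirmatively: 131.10.11.114:27017 failed with not running with --replSet". Checking mongo.conf on host 114 showed that the secondary's configuration file did not configure a replica set at all, so the node could not be added.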
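Reading the clause after "did not respond affirmatively:" is what pinpointed the cause here, and the same clause names the cause in problems 2 and 4 as well. As a minimal sketch (Python, standard library only), the helpers below pull the member and its failure reason out of an rs.add() errmsg and map the three reasons seen in this post to the causes found for them. The function names and the cause table are illustrative assumptions, and the regex only handles a single failing member, as in these examples.

```python
import re
from typing import Optional, Tuple

# Failure reasons seen in this post, mapped to the cause that was found for each.
LIKELY_CAUSES = {
    "Authentication failed": "keyfile differs or security is not configured on the member (problem 2)",
    "not running with --replSet": "replication.replSetName is missing on the member (problem 3)",
    "Connection refused": "member is unreachable, e.g. mongod bound to 127.0.0.1 only (problem 4)",
}

# Captures "host:port" and everything after "failed with" for one failing member.
_FAILED_NODE = re.compile(r"did not respond affirmatively: (\S+:\d+) failed with (.+)")


def parse_failed_node(errmsg: str) -> Optional[Tuple[str, str]]:
    """Return (host:port, reason) for the member that failed the quorum check."""
    m = _FAILED_NODE.search(errmsg)
    if m is None:
        return None
    return m.group(1), m.group(2).rstrip(".")


def likely_cause(reason: str) -> str:
    """Map a failure reason to a likely cause using LIKELY_CAUSES."""
    for needle, cause in LIKELY_CAUSES.items():
        if needle in reason:
            return cause
    return "unrecognised reason: check the member's mongod log"
```

Feeding it the errmsg above yields ("131.10.11.114:27017", "not running with --replSet"), which `likely_cause` maps to the missing replSetName.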
Solution:
Edit /etc/mongodb/mongo.conf on the secondary and add the replica set configuration:
[root@mongodb114 mongodb]# cat mongo.conf
systemLog:
  destination: file
  path: "/opt/mongodbdata/mongod.log"
  logAppend: true
storage:
  journal:
    enabled: true
  dbPath: /opt/mongodbdata
setParameter:
  enableLocalhostAuthBypass: true
processManagement:
  fork: true
  pidFilePath: "/opt/mongodbdata/mongod.pid"
security:
  authorization: enabled
  keyFile: "/etc/mongodb/mongodb-keyfile"
replication:                 # add the replica set configuration
  replSetName: mongotest     # the name must match the one on the primary
[root@mongodb114 mongodb]#
Restart MongoDB on 131.10.11.114, then run rs.add("131.10.11.114:27017") again on the PRIMARY; it succeeds.
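Because replSetName must be identical on every member, comparing the config files before calling rs.add() can catch this mistake early. The sketch below is a hypothetical, line-based lookup (not a real YAML parser) that assumes the mongo.conf text from each host has been collected locally; it ignores trailing `#` comments and surrounding quotes, and reports a mismatch when either file lacks the setting.

```python
import re
from typing import Optional

# Hypothetical line-based lookup, not a YAML parser: it finds the first
# uncommented "replSetName:" line and stops the value at whitespace or "#".
_REPL_SET_NAME = re.compile(r"^\s*replSetName:\s*([^\s#]+)", re.MULTILINE)


def repl_set_name(conf_text: str) -> Optional[str]:
    """Return replication.replSetName from mongo.conf text, or None if absent."""
    m = _REPL_SET_NAME.search(conf_text)
    return m.group(1).strip("\"'") if m else None


def same_repl_set(primary_conf: str, member_conf: str) -> bool:
    """True if both files set a replSetName and the names are identical."""
    name = repl_set_name(primary_conf)
    return name is not None and name == repl_set_name(member_conf)
```

Run against host 114's original file this reports no replSetName; against the corrected file it finds mongotest, matching the primary.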
4. bindIp defaults to 127.0.0.1, causing MongoDB replica set initialization to fail
Problem description:
Another time I was setting up a MongoDB replica set with the following hosts and roles:
Host IP | Role | OS |
---|---|---|
10.0.0.101 | PRIMARY | centos7 |
10.0.0.102 | SECONDARY | centos7 |
10.0.0.103 | SECONDARY | centos7 |
MongoDB server version: 4.0.2
Adding the SECONDARY host 10.0.0.102 from the PRIMARY host 10.0.0.101 failed with this error:
CrystalTest:PRIMARY> rs.add("10.0.0.102:27017")
{
    "operationTime" : Timestamp(1539054715, 1),
    "ok" : 0,
    "errmsg" : "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 10.0.0.101:27017; the following nodes did not respond affirmatively: 10.0.0.102:27017 failed with Error connecting to 10.0.0.102:27017 :: caused by :: Connection refused",
    "code" : 74,
    "codeName" : "NodeNotFound",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1539054715, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
Cause analysis:
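The message "failed with Error connecting to 10.0.0.102:27017 :: caused by :: Connection refused" was puzzling at first: port 27017 on 10.0.0.102 was up, the service worked normally, and the firewall was disabled. Trying telnet from the PRIMARY host 10.0.0.101, however, showed that the port could not be reached: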
[root@test101 ~]# telnet 10.0.0.102 27017
Trying 10.0.0.102...
telnet: connect to address 10.0.0.102: Connection refused
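When several members need checking, the telnet test can be scripted. The Python sketch below (the function name is hypothetical) just attempts a plain TCP connection, which is all telnet does here; any OSError, such as the "Connection refused" above or a timeout, is treated as unreachable.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a TCP connect, like `telnet host port`; True if the handshake succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeouts, unreachable host, ...
        return False
```

Run on 10.0.0.101 before the fix, `port_reachable("10.0.0.102", 27017)` would report False, matching the telnet result.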
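Checking the listening port on host 102 showed that mongod was bound to 127.0.0.1, and that was the problem: mongod was listening only on the loopback interface, so host 10.0.0.101 could not connect to it. (Starting with MongoDB 3.6, mongod binds to localhost by default when bindIp is not set.)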
[root@test102 ~]# netstat -tlunp|grep mongo
tcp 0 0 127.0.0.1:27017 0.0.0.0:* LISTEN 1065/mongod   # listening on 127.0.0.1:27017 only
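The giveaway in this output is the local address column: 127.0.0.1:27017 means mongod accepts connections only on the loopback interface. A small sketch that applies the same reading to `netstat -tlunp` output, assuming the common Linux column layout (proto, Recv-Q, Send-Q, local address, foreign address, state, PID/program name); the helper names are hypothetical:

```python
import ipaddress
from typing import List


def listen_addresses(netstat_output: str, port: int) -> List[str]:
    """Return the local IPs that are LISTENing on `port` in `netstat -tlunp` output."""
    addrs = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local foreign state pid/program
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addrs.append(host)
    return addrs


def loopback_only(netstat_output: str, port: int) -> bool:
    """True if something listens on `port`, but only on loopback addresses."""
    addrs = listen_addresses(netstat_output, port)
    return bool(addrs) and all(ipaddress.ip_address(a).is_loopback for a in addrs)
```

Applied to the line above it flags the loopback-only binding; applied to the same line after the bindIp change (0.0.0.0:27017) it does not.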
Solution:
Add "bindIp: 0.0.0.0" to the net section of mongo.conf on host 102, then restart MongoDB on host 102:
[root@test102 bin]# cat /etc/mongodb/mongo.conf
systemLog:
  destination: file
  path: "/opt/mongodbdata/mongod.log"
  logAppend: true
storage:
  journal:
    enabled: true
  dbPath: /opt/mongodbdata
setParameter:
  enableLocalhostAuthBypass: true
processManagement:
  fork: true
  pidFilePath: "/opt/mongodbdata/mongod.pid"
replication:
  replSetName: CrystalTest
security:
  authorization: enabled
  keyFile: "/etc/mongodb/mongodb-keyfile"
net:
  port: 27017
  bindIp: 0.0.0.0   # add this line
Check the port again:
[root@test102 mongodb]# netstat -tlunp|grep 27017
tcp 0 0 0.0.0.0:27017 0.0.0.0:* LISTEN 3433/mongod   # now 0.0.0.0:27017
[root@test102 mongodb]#
telnet from host 101 now connects:
[root@test101 ~]# telnet 10.0.0.102 27017
Trying 10.0.0.102...
Connected to 10.0.0.102.
Escape character is '^]'.
^C^C
Connection closed by foreign host.
[root@test101 ~]#
Adding host 102 again from the PRIMARY host 10.0.0.101 now succeeds:
CrystalTest:PRIMARY> rs.add("10.0.0.102:27017")
{
    "ok" : 1,
    "operationTime" : Timestamp(1539056959, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1539056959, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}