Common crm Operations (Part 1)

冰冻火焰 2015-05-09 17:37:05 Category: Clustering

Tags: crm, corosync, pacemaker. Category: Digital Transformation

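© Copyright belongs to the author. This is an original work by 冰冻火焰 on the 51CTO blog; contact the author for permission before reposting, otherwise legal action may be taken.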

I) Backing up the currently active configuration (CIB)
   Method 1:
       crm(live)cib# new livebak # saves the current configuration directly into livebak
           INFO: cib.new: livebak shadow CIB created
   Method 2:
       Create an empty shadow CIB, then copy the current configuration into it
       crm(live)cib# new livebak2 empty
       crm(livebak2)cib# end
       crm(livebak2)# configure show # no data at all
       crm(livebak2)# cib   # copy the live configuration straight into livebak2
       crm(livebak2)cib# reset livebak2
       INFO: cib.reset: copied live CIB to livebak2
       crm(livebak2)# configure show   # shows that this approach works too
       node vm_test1 \
           attributes standby=off
       node vm_test2 \
           attributes standby=off
       primitive drbd_fs Filesystem \
           params device="/dev/drbd0" directory="/data/mysql" fstype=ext4 \
           op monitor interval=30s timeout=40s \
           op start timeout=60 interval=0 \
           op stop timeout=60 on-fail......
   Method 3:
       Copy the configuration file directly
       [root@vm_test1 cib]# cp -a cib.xml shadow.livebak3
       crm(livebak)cib# use livebak3 # this works as well
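The three methods above keep the backup inside crm's shadow CIB area. As a complementary sketch, the live CIB can also be dumped to an ordinary XML file with cibadmin --query, which prints the current CIB to standard output. The helper names cib_backup_name and backup_live_cib, the /root default directory, and the file-name pattern are assumptions made for illustration; they do not come from the original article.

```shell
# Hypothetical helper: build a timestamped backup file name.
# $1 = target directory, $2 = timestamp string
cib_backup_name() {
    printf '%s/cib-%s.xml\n' "$1" "$2"
}

# Hypothetical helper: dump the live CIB as XML into the given directory.
# cibadmin --query writes the current CIB to stdout.
backup_live_cib() {
    out=$(cib_backup_name "${1:-/root}" "$(date +%Y%m%d-%H%M%S)")
    cibadmin --query > "$out" && echo "saved $out"
}
```

Calling backup_live_cib /root on a cluster node would then leave a file such as /root/cib-<timestamp>.xml next to the shadow copies.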
II) Creating an empty shadow CIB
   Now create a new, empty CIB file and start from scratch.
   crm(live)cib# new study_test empty # an empty configuration
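III) Switching the CIB
   Now back up the current CIB configuration as drbd_mysql_ip, then replace the live CIB with a new one
   crm(live)cib# new drbd_mysql_ip
   INFO: cib.new: drbd_mysql_ip shadow CIB created

   Before switching the live CIB, be sure to stop the resources that Pacemaker is currently running; otherwise they become orphaned resources (resources that nothing manages).
   Stop all running resources first, then switch.
   crm(live)# resource   # note: resources must be started and stopped in a specific order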
   crm(live)resource# stop mysql_ip
   crm(live)resource# stop mysqld
   crm(live)resource# stop drbd_fs
   crm(live)resource# stop ms_mysql_drbd
   crm(live)resource# stop mysql_drbd
   crm(live)resource# end
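The manual stops above follow a fixed order. A minimal sketch of scripting that order, assuming the non-interactive form crm resource stop <name>; the helper stop_in_order and its DRY_RUN switch are hypothetical and not part of the original walkthrough. The sketch only issues the stop requests one after another and does not wait for each resource to finish stopping.

```shell
# Hypothetical helper: stop the named resources in the order given.
# With DRY_RUN=1 the commands are printed instead of executed.
stop_in_order() {
    for rsc in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "crm resource stop $rsc"
        else
            # Abort on the first failure so later resources are not touched
            crm resource stop "$rsc" || return 1
        fi
    done
}
```

For the resources in this article, the call would be stop_in_order mysql_ip mysqld drbd_fs ms_mysql_drbd mysql_drbd; prefixing it with DRY_RUN=1 shows the commands first.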
   crm(live)# cib
   crm(live)cib# use study_test
   crm(study_test)# configure show
   crm(study_test)cib# commit # at this point the CIB defines no nodes and no resources
   INFO: cib.commit: committed 'study_test' shadow CIB to the cluster
   crm(study_test)cib# use live
   crm(live)cib# end
IV) Disabling STONITH first
   Disable STONITH. Pacemaker enables STONITH by default; with no STONITH device present, configuring resources without disabling it first makes verify report errors and commit fail.
   crm(live)configure# property stonith-enabled=false
   crm(live)configure# verify
   crm(live)configure# commit

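V) Defining nodes
   crm(live)# configure
   crm(live)configure# edit # Nodes can also be generated automatically by restarting the corosync service (if service corosync restart hangs indefinitely, you can kill the process with kill -9 and then start it again). Here I define them manually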
   node vm_test1
   node vm_test2
   crm(live)configure# verify
   crm(live)configure# commit
   crm(live)configure# end
   crm(live)# status
   Last updated: Thu May 7 21:14:04 2015
   Last change: Thu May 7 21:07:39 2015
   Current DC: vm_test2 - partition with quorum
   2 Nodes configured
   0 Resources configured

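   Online: [ vm_test1 vm_test2 ] # the nodes have been added

   # If a block like the following appears, the resources from the earlier drbd_mysql_ip configuration have become orphans with no configuration managing them (meaning the resources of the old CIB were not stopped before the CIB was switched)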
   mysql_drbd   (ocf::linbit:drbd):   ORPHANED Master [ vm_test2 vm_test1 ]
   drbd_fs   (ocf::heartbeat:Filesystem):   ORPHANED Started vm_test2
   mysqld   (lsb:mysql):   ORPHANED Started vm_test2
   mysql_ip   (ocf::heartbeat:IPaddr):   ORPHANED Started vm_test2


VI) Adding resources
   Add a webip resource:
   The command crm(live)ra# meta ocf:heartbeat:IPaddr shows the configuration syntax
   crm(live)ra# end
   crm(live)# configure
   crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.215
   crm(live)configure# verify
   crm(live)configure# commit
   Then add a webserver resource:
   crm(live)configure# primitive webserver lsb:httpd
   crm(live)configure# verify
   crm(live)configure# commit
VII) Defining resource constraints
   [root@vm_test1 ~]# crm status
   ......

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
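   webserver   (lsb:httpd):   Started vm_test2   # by default, resources are spread evenly across the nodes

   1) Group constraint
   Resources can be defined as a group so that all members of the group run on the same node. For example, put webip and webserver into web_server_ip_group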
   crm(live)configure# group web_server_ip_group webip webserver
   crm(live)configure# verify
   crm(live)configure# commit
   crm(live)configure#
   [3]+ Stopped                 crm
   [root@vm_test1 ~]# crm status
   ......
   Online: [ vm_test1 vm_test2 ]

   Resource Group: web_server_ip_group
       webip   (ocf::heartbeat:IPaddr):   Started vm_test1
       webserver   (lsb:httpd):   Started vm_test1 # the resources immediately end up on the same node

   crm(live)# node standby vm_test1 # put vm_test1 into standby
   crm(live)#
   crm(live)#
   crm(live)# status
   ..............

   Node vm_test1: standby
   Online: [ vm_test2 ]

   Resource Group: web_server_ip_group
       webip   (ocf::heartbeat:IPaddr):   Started vm_test2
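       webserver   (lsb:httpd):   Started vm_test2 # all resources have moved to vm_test2

   crm(live)# node online vm_test1 # bring the node back online

   2) Colocation constraint: defines whether resources may or may not run on the same node
   Delete the group first
   crm(live)resource# stop web_server_ip_group # stop the group first
   crm(live)# configure
   crm(live)configure# delete web_server_ip_group # delete the group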
   crm(live)configure# commit
   crm(live)configure# end
   # Keep webserver and webip together at all times
   crm(live)# configure
   crm(live)configure# colocation webserver_with_webip inf: webserver webip # inf means positive infinity
   crm(live)configure# commit
   crm(live)configure# end
   crm(live)# status
   .......

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1 # both resources are running on the same node again

   3) Order constraint
   This is the order in which resources start.
   Here webip starts first, then webserver.
   crm(live)# configure
   crm(live)configure# order webip_before_webserver mandatory: webip webserver
   crm(live)configure# commit


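   4) Location constraint
   Before any location constraint is defined, putting vm_test2 into standby and then bringing it back online shows that the resources do not move back to vm_test2.
   [root@vm_test1 ~]# crm status
   ........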
   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2
   [root@vm_test1 ~]# crm node standby vm_test2
   [root@vm_test1 ~]# crm status
   ......
   Node vm_test2: standby
   Online: [ vm_test1 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1

   [root@vm_test1 ~]# crm node online vm_test2
   ..........

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1

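   # In other words, when both nodes are available the resources prefer to stay on the current node; they did not move back just because vm_test2 came online.
   # But suppose the two servers differ in performance, with vm_test2 being the faster one, and the resources should move back to vm_test2 whenever it comes online. This is where location constraints come in (by default, every resource has a location score of 0 on both servers).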
   crm(live)configure# location webip_on_vm_test2 webip 200: vm_test2
   crm(live)configure# verify
   crm(live)configure# commit
   crm(live)configure#
   [1]+ Stopped                 crm
   [root@vm_test1 ~]# crm status
   ......

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2
   [root@vm_test1 ~]# crm node standby vm_test2
   [root@vm_test1 ~]# crm status
   .......

   Node vm_test2: standby
   Online: [ vm_test1 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1
   [root@vm_test1 ~]# crm node online vm_test2
   [root@vm_test1 ~]# crm status
   .........

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2

   # Now vm_test2 takes the resources over as soon as it comes online

   # What happens if we now give webserver a location score of 300 on vm_test1? Let's see
   crm(live)configure# location webserver_on_vm_test1 webserver 300: vm_test1
   crm(live)configure# commit
   [root@vm_test1 ~]# crm status
   .........

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1 # webserver and webip must run on the same node, so because their combined location score on vm_test1 (300) exceeds their combined location score on vm_test2 (200), the resources run on vm_test1
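The comparison in the comment above can be written out as a small calculation. This is only a sketch of the article's simplified reasoning (sum the location scores of the colocated resources per node and pick the larger total); Pacemaker's real placement logic considers more than this. The function name preferred_node and the input format of one "resource node score" triple per line are assumptions for illustration.

```shell
# Hypothetical helper: read "resource node score" lines on stdin,
# add up the scores per node, and print the node with the highest total.
# Ties are not handled in any defined way by this sketch.
preferred_node() {
    awk '{ total[$2] += $3 } END { for (n in total) print n, total[n] }' |
        sort -k2,2nr | head -n 1 | cut -d' ' -f1
}

# The two location constraints defined in this section
printf '%s\n' 'webip vm_test2 200' 'webserver vm_test1 300' | preferred_node
```

With the two constraints from this section, vm_test1 has a total of 300 against 200 for vm_test2, which matches the status output shown above.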


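   5) Resource stickiness
   Resource stickiness applies to the node a resource is currently running on. It is not defined for any particular node; it takes effect wherever the resource happens to be running.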
   crm(live)configure# rsc_defaults resource-stickiness=350
   crm(live)configure# commit
   [root@vm_test1 ~]# crm status
   ......

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1
   [root@vm_test1 ~]# crm node standby vm_test1
   [root@vm_test1 ~]# crm status
   ......

   Node vm_test1: standby
   Online: [ vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2
   [root@vm_test1 ~]# crm node online vm_test1
   [root@vm_test1 ~]# crm status
   ......

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2

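   # Because the resource stickiness (350*2) is greater than the combined location score on vm_test1 (300), the resources have no incentive to move back after switching to vm_test2. So they keep running on vm_test2 after vm_test1 comes back online

   # Now consider what happens if resource stickiness is set to 150. Let's see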
   crm(live)configure# rsc_defaults resource-stickiness=150
   crm(live)configure# commit
   [root@vm_test1 ~]# crm status
   .......

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1
   [root@vm_test1 ~]# crm node standby vm_test1
   [root@vm_test1 ~]# crm status
   .......

   Node vm_test1: standby
   Online: [ vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2

   [root@vm_test1 ~]# crm node online vm_test1
   [root@vm_test1 ~]# crm status
   .......

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2

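   # As long as the resource stickiness on the current node (150*2 = 300, for two resources) plus the combined location score of all resources on that node (200), 500 in total, is greater than the location score on the other node (300), the resources do not move either.
   # To verify this, set resource stickiness to 20: 20*2 + 200 = 240, and 240 < 300, so in theory the resources should switch back to vm_test1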

   crm(live)configure# rsc_defaults resource-stickiness=20
   crm(live)configure# commit

   [root@vm_test1 ~]# crm node standby vm_test1
   [root@vm_test1 ~]# crm status
   ............

   Node vm_test1: standby
   Online: [ vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2
   [root@vm_test1 ~]# crm node online vm_test1
   [root@vm_test1 ~]# crm status
   .............

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1   # the conclusion above holds
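The rule of thumb tested in this section (stickiness times the number of resources, plus the location score on the current node, compared with the location score on the other node) can be checked with a few lines of shell arithmetic. This is a sketch of the article's simplified model only, not of Pacemaker's actual scoring; the function name will_stay and its argument order are assumptions, and the case where both sides are equal is not covered by the article.

```shell
# Hypothetical helper implementing the article's rule of thumb.
# $1 = resource-stickiness, $2 = number of resources,
# $3 = location score on the current node, $4 = location score on the other node
will_stay() {
    stay=$(( $1 * $2 + $3 ))
    if [ "$stay" -gt "$4" ]; then
        echo "stay"
    else
        echo "move"
    fi
}

will_stay 350 2 200 300   # 350*2 + 200 = 900 > 300
will_stay 150 2 200 300   # 150*2 + 200 = 500 > 300
will_stay 20 2 200 300    # 20*2 + 200 = 240 < 300
```

The three calls correspond to the three stickiness values tried above, with the resources sitting on vm_test2 (location score 200) while vm_test1 (location score 300) comes back online.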
VIII) Resource monitoring
   Stop the httpd service manually
   [root@vm_test2 ~]# service httpd stop
   Stopping httpd:                                            [ OK ]
   [root@vm_test2 ~]# crm status
   ........

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test2
   webserver   (lsb:httpd):   Started vm_test2
   [root@vm_test2 ~]# service httpd status
   httpd is stopped
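   # httpd was stopped directly, yet the resource status still shows it as running. The node itself has not failed, so the resources do not move. By default Pacemaker does not monitor any resource, so even if a resource is shut down it will not be moved as long as the node is healthy; to get failover, a monitor operation must be defined.

   # Now monitor the webserver resource
   crm(live)# resource
   crm(live)resource# status # webserver still shows as Started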
   webip   (ocf::heartbeat:IPaddr):   Started
   webserver   (lsb:httpd):   Started
   crm(live)resource# stop webserver # stop the resource
   crm(live)resource# stop webip
   crm(live)resource# status
   webip   (ocf::heartbeat:IPaddr):   Stopped
   webserver   (lsb:httpd):   Stopped
   crm(live)resource# cleanup webserver # clean up the resource
   Cleaning up webserver on vm_test1
   Cleaning up webserver on vm_test2
   Waiting for 2 replies from the CRMd.. OK
   crm(live)resource# cleanup webip
   Cleaning up webip on vm_test1
   Cleaning up webip on vm_test2
   Waiting for 2 replies from the CRMd.. OK
   crm(live)resource# end
   crm(live)# configure
   crm(live)configure# monitor webserver 20s:15s # define monitoring for webserver, in the form interval:timeout
   crm(live)configure# verify
   crm(live)configure# commit
   crm(live)configure# cd
   crm(live)# resource
   crm(live)resource# start webip
   crm(live)resource# start webserver
   crm(live)resource# end
   crm(live)# status
   Last updated: Sat May 9 15:47:50 2015
   Last change: Sat May 9 15:47:45 2015
   Stack: classic openais (with plugin)
   Current DC: vm_test1 - partition with quorum
   Version: 1.1.11-97629de
   2 Nodes configured, 2 expected votes
   2 Resources configured

   Online: [ vm_test1 vm_test2 ]

   webip   (ocf::heartbeat:IPaddr):   Started vm_test1
   webserver   (lsb:httpd):   Started vm_test1 # both resources are now running on vm_test1

   [root@vm_test1 ~]# service httpd stop
   Stopping httpd:                                            [ OK ]
   [root@vm_test1 ~]# service httpd status # httpd is now in the stopped state
   httpd is stopped

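   # Check again about 15s later
   [root@vm_test1 ~]# service httpd status # the service is running on the current node again
   httpd (pid 32207) is running...
   Purpose of monitoring: as soon as the service is found not running, Pacemaker tries to restart it on the current host.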
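Instead of checking by hand after a fixed delay, the recovery can be polled. A minimal sketch, assuming the init script's status action exits with 0 while the service is running (as LSB init scripts are expected to); the helper name wait_for and the 30-second limit in the usage note are hypothetical.

```shell
# Hypothetical helper: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND roughly once per second until it succeeds (returns 0)
# or about TIMEOUT_SECONDS seconds have passed (returns 1).
wait_for() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}
```

After stopping httpd as shown above, wait_for 30 service httpd status && echo "httpd recovered" would report once the monitor operation has restarted the service.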

   Monitoring can of course be defined at the same time as the resource itself
   For example: primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.218 op monitor interval=30s timeout=15s op start timeout=20s op stop timeout=20s