1. Lab overview:

node1 hostname: node1.abc.com, IP 192.168.1.10; node2 hostname: node2.abc.com, IP 192.168.1.20; VIP: 192.168.1.50; OS: Red Hat Enterprise Linux 5.4; kernel version: 2.6.18-164.el5
Packages used in this lab:
The DRBD kernel module code was only merged into the mainline Linux kernel as of 2.6.33, so we have to install both the kernel module and the userspace management tools (a quick kernel check follows the package list below):
drbd83-8.3.8-1.el5.centos.i386.rpm        # DRBD userspace management tools
kmod-drbd83-8.3.8-1.el5.centos.i686.rpm   # DRBD kernel module

cluster-glue-1.0.6-1.6.el5.i386.rpm       # cluster glue layer (components shared by the messaging layer and the resource manager)
cluster-glue-libs-1.0.6-1.6.el5.i386.rpm  # cluster-glue libraries
corosync-1.2.7-1.1.el5.i386.rpm           # the main corosync package
corosynclib-1.2.7-1.1.el5.i386.rpm        # corosync libraries
heartbeat-3.0.3-2.3.el5.i386.rpm          # heartbeat components (resource agents used by the stack)
heartbeat-libs-3.0.3-2.3.el5.i386.rpm     # heartbeat libraries
ldirectord-1.0.1-1.el5.i386.rpm           # probes the back-end real servers in an HA/LVS cluster
libesmtp-1.0.4-5.el5.i386.rpm
openais-1.1.3-1.6.el5.i386.rpm            # OpenAIS services that complement pacemaker
openaislib-1.1.3-1.6.el5.i386.rpm         # openais libraries
pacemaker-1.1.5-1.1.el5.i386.rpm          # the main pacemaker package
pacemaker-libs-1.1.5-1.1.el5.i386.rpm     # pacemaker libraries
pacemaker-cts-1.1.5-1.1.el5.i386.rpm
perl-TimeDate-1.16-5.el5.noarch.rpm
resource-agents-1.0.4-1.1.el5.i386.rpm    # resource agents
mysql-5.5.15-linux2.6-i686.tar.gz         # MySQL binary (tarball) distribution
Download location for all of the above: http://down.51cto.com/data/402802
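Before installing anything, it is worth confirming that the running kernel roughly matches the el5 kernel series the kmod-drbd83 package targets; this lab assumes the stock RHEL 5.4 kernel listed above. A minimal sanity check:

[root@node1 ~]# uname -r       # should report 2.6.18-164.el5, the kernel version given in the lab description
[root@node1 ~]# rpm -q kernel  # the installed kernel package(s), for comparison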
 
2. Lab procedure:
1. Synchronize the time:
node1
[root@node1 ~]# hwclock -s
node2
[root@node2 ~]# hwclock -s
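As an optional sanity check, make sure the two nodes now agree on the time; a large clock skew between cluster nodes causes confusing log timestamps and can upset the cluster stack later:

[root@node1 ~]# date; hwclock --show   # system clock and hardware clock on node1
[root@node2 ~]# date; hwclock --show   # compare with node2; the difference should be no more than a second or two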
 
2. Edit the hosts file:
[root@node1 ~]# vim /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.1.10    node1.abc.com   node1
192.168.1.20    node2.abc.com   node2
 
Copy the hosts file to node2:
[root@node1 ~]# scp /etc/hosts 192.168.1.20:/etc/hosts
The authenticity of host '192.168.1.20 (192.168.1.20)' can't be established.
RSA key fingerprint is d4:f1:06:3b:a0:81:fd:85:65:20:9e:a1:ee:46:a6:8b.
Are you sure you want to continue connecting (yes/no)? yes   # type yes to accept the host key
Warning: Permanently added '192.168.1.20' (RSA) to the list of known hosts.
root@192.168.1.20's password:   # enter node2's root password
hosts        
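With the hosts file in place on both nodes, a quick ping confirms that the names resolve to the right addresses (an optional check):

[root@node1 ~]# ping -c 1 node2.abc.com   # should resolve to 192.168.1.20 and get a reply
[root@node2 ~]# ping -c 1 node1.abc.com   # should resolve to 192.168.1.10 and get a reply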
 
3. Generate keys on both nodes so they can communicate without passwords:
node1
[root@node1 ~]# ssh-keygen -t rsa   # generate an RSA key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):   # accept the default, press Enter
Enter passphrase (empty for no passphrase):   # accept the default, press Enter
Enter same passphrase again:   # accept the default, press Enter
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
61:b1:4a:c8:88:19:31:5d:cb:8f:91:0c:fe:38:bd:c3 root@node1.abc.com
 
[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2.abc.com   # copy the public key to node2
The authenticity of host 'node2.abc.com (192.168.1.20)' can't be established.
RSA key fingerprint is d4:f1:06:3b:a0:81:fd:85:65:20:9e:a1:ee:46:a6:8b.
Are you sure you want to continue connecting (yes/no)? yes   # type yes
Warning: Permanently added 'node2.abc.com' (RSA) to the list of known hosts.
root@node2.abc.com's password:   # enter node2's root password
node2
 
[root@node2 ~]# ssh-keygen -t rsa   # generate an RSA key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):   # accept the default, press Enter
Enter passphrase (empty for no passphrase):   # accept the default, press Enter
Enter same passphrase again:   # accept the default, press Enter
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
3f:0b:27:14:8a:ba:b1:c6:4d:02:2b:22:86:a3:46:0a root@node2.abc.com
[root@node2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node1.abc.com   # copy the public key to node1
The authenticity of host 'node1.abc.com (192.168.1.10)' can't be established.
RSA key fingerprint is d4:f1:06:3b:a0:81:fd:85:65:20:9e:a1:ee:46:a6:8b.
Are you sure you want to continue connecting (yes/no)? yes   # type yes
Warning: Permanently added 'node1.abc.com,192.168.1.10' (RSA) to the list of known hosts.
root@node1.abc.com's password:   # enter node1's root password
At this point the two nodes can communicate with each other without passwords.
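The key exchange can be verified with a non-interactive command in each direction; if no password prompt appears, passwordless SSH is working (an optional check):

[root@node1 ~]# ssh node2.abc.com 'hostname'   # should print node2.abc.com without asking for a password
[root@node2 ~]# ssh node1.abc.com 'hostname'   # should print node1.abc.com without asking for a password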
4. Configure the yum client
node1
[root@node1 ~]# mkdir /mnt/cdrom
[root@node1 ~]# mount /dev/cdrom /mnt/cdrom/
[root@node1 ~]# vim /etc/yum.repos.d/rhel-debuginfo.repo
[rhel-server]
name=Red Hat Enterprise Linux server
baseurl=file:///mnt/cdrom/Server
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
[rhel-cluster]
name=Red Hat Enterprise Linux cluster
baseurl=file:///mnt/cdrom/Cluster
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
[rhel-clusterstorage]
name=Red Hat Enterprise Linux clusterstorage
baseurl=file:///mnt/cdrom/ClusterStorage
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
[rhel-vt]
name=Red Hat Enterprise Linux vt
baseurl=file:///mnt/cdrom/VT
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
 
Copy the yum repo file to the /etc/yum.repos.d/ directory on node2:
[root@node1 ~]# scp /etc/yum.repos.d/rhel-debuginfo.repo node2.abc.com:/etc/yum.repos.d/
 
node2
[root@node2 ~]# mkdir /mnt/cdrom
[root@node2 ~]# mount /dev/cdrom /mnt/cdrom/
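Once the CD-ROM is mounted, the repositories defined above can be checked quickly with yum (an optional step; the repo ids correspond to the ones in rhel-debuginfo.repo):

[root@node1 ~]# yum clean all   # discard any stale metadata
[root@node1 ~]# yum repolist    # rhel-server, rhel-cluster, rhel-clusterstorage and rhel-vt should all be listed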
 
5. Upload the downloaded rpm packages to each node.
Install DRBD:
node1
[root@node1 ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm -y --nogpgcheck
node2:
[root@node2 ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm -y --nogpgcheck
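To confirm that both DRBD packages installed cleanly and that the kernel module is available, a quick check can be run on each node (optional):

[root@node1 ~]# rpm -qa | grep drbd   # both drbd83 and kmod-drbd83 should appear
[root@node1 ~]# modinfo drbd          # should show version 8.3.8; the module itself is loaded later by the drbd init script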
6. On each node, create a partition of the same size and type to serve as the DRBD device (sda4):
 
node1
 
[root@node1 ~]# fdisk /dev/sda
The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
 
Command (m for help): n   # add a new partition
Command action
   e   extended
   p   primary partition (1-4)
p    # primary partition
Selected partition 4
First cylinder (1580-2610, default 1580):    # accept the default, press Enter
Using default value 1580
Last cylinder or +size or +sizeM or +sizeK (1580-2610, default 2610): +1G   # 1 GB in size
 
Command (m for help): w   # save and exit
The partition table has been altered!
 
Calling ioctl() to re-read partition table.
 
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
 
[root@node1 ~]# partprobe /dev/sda   # have the kernel re-read the partition table
[root@node1 ~]# cat /proc/partitions
major minor #blocks name
 
   8     0   20971520 sda
   8     1     104391 sda1
   8     2   10482412 sda2
   8     3    2096482 sda3
   8     4     987997 sda4
node2
 
[root@node2 ~]# fdisk /dev/sda
 
The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
 
Command (m for help): n   # add a new partition
Command action
   e   extended
   p   primary partition (1-4)
p    # primary partition
Selected partition 4
First cylinder (1580-2610, default 1580):    # accept the default, press Enter
Using default value 1580
Last cylinder or +size or +sizeM or +sizeK (1580-2610, default 2610): +1G   # 1 GB in size
 
Command (m for help): w   # save and exit
The partition table has been altered!
 
Calling ioctl() to re-read partition table.
 
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
[root@node2 ~]# partprobe /dev/sda   # have the kernel re-read the partition table
[root@node2 ~]# cat /proc/partitions
major minor #blocks name
 
   8     0   20971520 sda
   8     1     104391 sda1
   8     2   10482412 sda2
   8     3    2096482 sda3
   8     4     987997 sda4
 
7. Configure DRBD:
node1
Copy the sample configuration file drbd.conf:
[root@node1 ~]# cp /usr/share/doc/drbd83-8.3.8/drbd.conf /etc/
Back up global_common.conf:
[root@node1 ~]# cd /etc/drbd.d/
[root@node1 drbd.d]# cp global_common.conf global_common.conf.bak
Edit global_common.conf:
[root@node1 drbd.d]# vim global_common.conf
 
global {
        usage-count no;    # do not report usage statistics
}
 
common {
        protocol C;
 
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        }
 
        startup {
                 wfc-timeout 120;       # timeout for waiting for peer connections
                 degr-wfc-timeout 100;  # timeout for waiting for a degraded peer to connect
        }
 
        disk {
                 on-io-error detach;   # on an I/O error, detach the DRBD backing device
        }
 
        net {
                cram-hmac-alg "sha1";      # use the SHA1 algorithm for peer authentication
                shared-secret "mydrbdlab"; # shared secret; must be identical on both nodes
        }
        syncer {
                rate 100M;   # data synchronization rate
        }
}
 
Define the mysql resource:
[root@node1 drbd.d]# vim mysql.res
resource mysql {
      
        on node1.abc.com {                    
                device /dev/drbd0;           
                disk /dev/sda4;               
                address 192.168.1.10:7898;    
                meta-disk internal;           
        }
        on node2.abc.com {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.1.20:7898;
                meta-disk internal;
        }
}
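Before copying the files to node2, the configuration can be checked for syntax errors by letting drbdadm parse and dump it back; if the files are valid, the mysql resource definition is printed (an optional sanity check):

[root@node1 drbd.d]# drbdadm dump mysql   # parses /etc/drbd.conf and the files under /etc/drbd.d/ and prints the mysql resource if the syntax is valid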
 
8. Copy global_common.conf, mysql.res and drbd.conf to node2:
[root@node1 drbd.d]# scp global_common.conf node2.abc.com:/etc/drbd.d/
[root@node1 drbd.d]# scp mysql.res node2.abc.com:/etc/drbd.d/
[root@node1 drbd.d]# scp /etc/drbd.conf node2.abc.com:/etc/
 
9. Initialize the mysql resource on both node1 and node2 and start the service:
node1
[root@node1 drbd.d]# drbdadm create-md mysql
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
[root@node1 drbd.d]# service drbd start
 
node2
[root@node2 drbd.d]# drbdadm create-md mysql
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
[root@node2 drbd.d]# service drbd start
 
Check the DRBD status:
 
node1
[root@node1 drbd.d]# drbd-overview
 0:mysql Connected Secondary/Secondary Inconsistent/Inconsistent C r----
node2
[root@node2 drbd.d]# drbd-overview
 0:mysql Connected Secondary/Secondary Inconsistent/Inconsistent C r----
 
Both nodes are currently in the Secondary state.
 
Now make node1 the primary node; run on node1:
node1
[root@node1 drbd.d]# drbdadm -- --overwrite-data-of-peer primary mysql
 
Check again:
node1
[root@node1 drbd.d]# drbd-overview
 0:mysql SyncSource Primary/Secondary UpToDate/Inconsistent C r----
   [========>...........] sync'ed: 45.9% (538072/987928)K delay_probe: 40
 
node2
[root@node2 drbd.d]# drbd-overview
 0:mysql Connected Secondary/Primary UpToDate/UpToDate C r----
node1 is now the primary node and node2 the secondary.
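Depending on the partition size and the configured syncer rate, the initial synchronization can take a while. Its progress can be followed live from either node, for example (optional):

[root@node1 drbd.d]# watch -n 1 'cat /proc/drbd'   # refreshes the sync progress every second; press Ctrl+C to stop watching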
 
10. Create the filesystem and the mount point (on the primary node only, i.e. node1):
 
[root@node1 ~]# mkfs -t ext3 /dev/drbd0
[root@node1 ~]# mkdir /mysqldata
[root@node1 ~]# mount /dev/drbd0 /mysqldata/
[root@node1 ~]# cd /mysqldata
[root@node1 mysql]# touch node1   # create a file named node1
[root@node1 mysql]# ll
total 16
drwx------ 2 root root 16384 Mar 14 19:10 lost+found
-rw-r--r-- 1 root root     0 Mar 14 19:19 node1
 
11. Unmount the DRBD device
node1
[root@node1 mysql]# cd
[root@node1 ~]# umount /mysqldata
Demote node1 to the secondary role:
[root@node1 ~]# drbdadm secondary mysql
[root@node1 ~]# drbd-overview
 0:mysql Connected Secondary/Secondary UpToDate/UpToDate C r----
 
12. Promote node2 to the primary role
node2
[root@node2 drbd.d]# cd
[root@node2 ~]# drbdadm primary mysql
[root@node2 ~]# drbd-overview
 0:mysql Connected Primary/Secondary UpToDate/UpToDate C r----
[root@node2 ~]# mkdir /mysqldata
[root@node2 ~]# mount /dev/drbd0 /mysqldata
[root@node2 ~]# cd /mysqldata
[root@node2 mysql]# ll
total 16
drwx------ 2 root root 16384 Mar 14 19:10 lost+found
-rw-r--r-- 1 root root     0 Mar 14 19:19 node1
Unmount:
[root@node2 mysql]# cd
[root@node2 ~]# umount /mysqldata
/dev/drbd0 is now synchronized and DRBD is installed and working correctly.
 
13. Install and configure MySQL:
node1
Add the user and group:
[root@node1 ~]# groupadd -r mysql
[root@node1 ~]# useradd -g mysql -r mysql
Since only the primary node can read, write and mount the device, make node1 the primary and node2 the secondary:
 
node2
[root@node2 ~]# drbdadm secondary mysql
node1
[root@node1 ~]# drbdadm primary mysql
 
Mount the DRBD device:
[root@node1 ~]# mount /dev/drbd0 /mysqldata
[root@node1 ~]# mkdir /mysqldata/data
The data directory will hold the MySQL data, so change its owner and group:
[root@node1 ~]# chown -R mysql.mysql /mysqldata/data/
 
Install MySQL:
[root@node1 ~]# tar -zxvf mysql-5.5.15-linux2.6-i686.tar.gz -C /usr/local/
[root@node1 ~]# cd /usr/local/
[root@node1 local]# ln -sv mysql-5.5.15-linux2.6-i686 mysql
[root@node1 local]# cd mysql
[root@node1 mysql]# chown -R mysql:mysql .   # change ownership of everything in the current directory
Initialize the MySQL database:
[root@node1 mysql]# scripts/mysql_install_db --user=mysql --datadir=/mysqldata/data
[root@node1 mysql]# chown -R root .
Provide a main configuration file for MySQL:
[root@node1 mysql]# cp support-files/my-large.cnf /etc/my.cnf
Edit my.cnf:
[root@node1 mysql]# vim /etc/my.cnf
Line 39:  thread_concurrency = 2
Add the following line to tell MySQL where to store its data files:
datadir = /mysqldata/data
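After these edits, the relevant part of /etc/my.cnf looks roughly like the excerpt below (based on the my-large.cnf template copied above; only the lines that matter here are shown):

[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
datadir         = /mysqldata/data   # added: keep the data files on the DRBD-backed filesystem
thread_concurrency = 2              # line 39, set to 2 as above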
 
Provide a SysV service script for MySQL so it can be managed with the service command:
[root@node1 mysql]# cp support-files/mysql.server /etc/rc.d/init.d/mysqld
 
The configuration file and SysV service script for node2 are identical, so simply copy them over:
[root@node1 mysql]# scp /etc/my.cnf node2.abc.com:/etc/
[root@node1 mysql]# scp /etc/rc.d/init.d/mysqld node2.abc.com:/etc/rc.d/init.d/
 
Add the service to the service list:
[root@node1 mysql]# chkconfig --add mysqld
Make sure it does not start automatically at boot; the CRM will control it:
[root@node1 mysql]# chkconfig mysqld off
Start the service:
[root@node1 mysql]# service mysqld start
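Before shutting it down again, a short connection test confirms that mysqld came up with its data directory on /mysqldata/data (the client is called by its full path because PATH has not been extended yet; an optional check):

[root@node1 mysql]# /usr/local/mysql/bin/mysql -e 'SHOW DATABASES;'   # should list at least the mysql and test databases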
 
After testing, the service will be stopped again.

Check whether the data directory contains files:
[root@node1 mysql]# ll /mysqldata/data/
total 29756
-rw-rw---- 1 mysql mysql 5242880 Mar 14 20:17 ib_logfile0
-rw-rw---- 1 mysql mysql 5242880 Mar 14 20:17 ib_logfile1
-rw-rw---- 1 mysql mysql 18874368 Mar 14 20:17 ibdata1
drwx------ 2 mysql root      4096 Mar 14 19:52 mysql
-rw-rw---- 1 mysql mysql    27017 Mar 14 20:16 mysql-bin.000001
-rw-rw---- 1 mysql mysql   996460 Mar 14 20:16 mysql-bin.000002
-rw-rw---- 1 mysql mysql      107 Mar 14 20:17 mysql-bin.000003
-rw-rw---- 1 mysql mysql       57 Mar 14 20:17 mysql-bin.index
-rw-rw---- 1 mysql root      1699 Mar 14 20:17 node1.abc.com.err
-rw-rw---- 1 mysql mysql        5 Mar 14 20:17 node1.abc.com.pid
drwx------ 2 mysql mysql     4096 Mar 14 20:16 performance_schema
drwx------ 2 mysql root      4096 Mar 14 19:51 test
 
[root@node1 mysql]# service mysqld stop
To make this MySQL installation conform to standard system layout conventions and to export its development components to the system, a few more steps are needed.
Export MySQL's man pages to the man command's search path:
[root@node1 mysql]# vim /etc/man.config
Line 48:  MANPATH /usr/local/mysql/man
 
Export MySQL's header files to the system include path /usr/include; a simple symlink is enough:
[root@node1 mysql]# ln -sv /usr/local/mysql/include/ /usr/include/mysql
 
Export MySQL's libraries to the system library search path (any file under /etc/ld.so.conf.d/ with a .conf suffix will do):
[root@node1 mysql]# echo '/usr/local/mysql/lib/' > /etc/ld.so.conf.d/mysql.conf
 
Then have the system reload its library cache:
[root@node1 mysql]# ldconfig
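Whether the new library path was actually picked up can be confirmed by querying the linker cache (optional):

[root@node1 mysql]# ldconfig -p | grep mysql   # libmysqlclient entries from /usr/local/mysql/lib should now be listed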
 
Modify the PATH environment variable so that every user can run the MySQL commands directly:
[root@node1 mysql]# vim /etc/profile
PATH=$PATH:/usr/local/mysql/bin
 
Re-read the environment variables:
[root@node1 mysql]# . /etc/profile
[root@node1 mysql]# echo $PATH
/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/mysql/bin
 
Unmount the DRBD device:
[root@node1 mysql]# umount /mysqldata
 
 
node2
Add the user and group:
[root@node2 ~]# groupadd -r mysql
[root@node2 ~]# useradd -g mysql -r mysql
Since only the primary node can read, write and mount the device, now make node2 the primary and node1 the secondary:
On node1:
[root@node1 mysql]# drbdadm secondary mysql
 
On node2:
[root@node2 ~]# drbdadm primary mysql
 
Mount the DRBD device:
[root@node2 ~]# mount /dev/drbd0 /mysqldata
 
Check:
[root@node2 ~]# ll /mysqldata/data/
total 29752
-rw-rw---- 1 mysql mysql 5242880 Mar 14 20:20 ib_logfile0
-rw-rw---- 1 mysql mysql 5242880 Mar 14 20:17 ib_logfile1
-rw-rw---- 1 mysql mysql 18874368 Mar 14 20:20 ibdata1
drwx------ 2 mysql root      4096 Mar 14 19:52 mysql
-rw-rw---- 1 mysql mysql    27017 Mar 14 20:16 mysql-bin.000001
-rw-rw---- 1 mysql mysql   996460 Mar 14 20:16 mysql-bin.000002
-rw-rw---- 1 mysql mysql      126 Mar 14 20:20 mysql-bin.000003
-rw-rw---- 1 mysql mysql       57 Mar 14 20:17 mysql-bin.index
-rw-rw---- 1 mysql root      2116 Mar 14 20:20 node1.abc.com.err
drwx------ 2 mysql mysql     4096 Mar 14 20:16 performance_schema
drwx------ 2 mysql root      4096 Mar 14 19:51 test
 
Install MySQL:
[root@node2 ~]# tar -zxvf mysql-5.5.15-linux2.6-i686.tar.gz -C /usr/local/
[root@node2 ~]# cd /usr/local/
[root@node2 local]# ln -sv mysql-5.5.15-linux2.6-i686 mysql
[root@node2 local]# cd mysql
 
Do NOT initialize the database here, because it was already initialized on node1:
[root@node2 mysql]# chown -R root:mysql .
 
The MySQL main configuration file and the SysV service script were already copied over from node1, so there is no need to add them again.
Add the service to the service list:
[root@node2 mysql]# chkconfig --add mysqld

Make sure it does not start automatically at boot; the CRM will control it:
[root@node2 mysql]# chkconfig mysqld off
Now the service can be started for testing (make sure the MySQL service on node1 is stopped):
[root@node2 mysql]# service mysqld start
After testing, the service will be stopped again.
Check whether the data directory contains files:
[root@node2 mysql]# ll /mysqldata/data/
total 29764
-rw-rw---- 1 mysql mysql 5242880 Mar 14 20:48 ib_logfile0
-rw-rw---- 1 mysql mysql 5242880 Mar 14 20:17 ib_logfile1
-rw-rw---- 1 mysql mysql 18874368 Mar 14 20:20 ibdata1
drwx------ 2 mysql root      4096 Mar 14 19:52 mysql
-rw-rw---- 1 mysql mysql    27017 Mar 14 20:16 mysql-bin.000001
-rw-rw---- 1 mysql mysql   996460 Mar 14 20:16 mysql-bin.000002
-rw-rw---- 1 mysql mysql      126 Mar 14 20:20 mysql-bin.000003
-rw-rw---- 1 mysql mysql      107 Mar 14 20:48 mysql-bin.000004
-rw-rw---- 1 mysql mysql       76 Mar 14 20:48 mysql-bin.index
-rw-rw---- 1 mysql root      2116 Mar 14 20:20 node1.abc.com.err
-rw-rw---- 1 mysql root       937 Mar 14 20:48 node2.abc.com.err
-rw-rw---- 1 mysql mysql        5 Mar 14 20:48 node2.abc.com.pid
drwx------ 2 mysql mysql     4096 Mar 14 20:16 performance_schema
drwx------ 2 mysql root      4096 Mar 14 19:51 test
 
[root@node2 mysql]# service mysqld stop
 
To make the MySQL installation conform to system conventions and export its development components, repeat the same steps as on node1; they are not repeated here.
Unmount the device:
[root@node2 mysql]# umount /dev/drbd0
 
 
14. Install and configure corosync + pacemaker
node1
[root@node1 ~]# yum localinstall *.rpm -y --nogpgcheck
ldirectord does not need to be installed here.
node2
[root@node2 ~]# yum localinstall *.rpm -y --nogpgcheck
ldirectord does not need to be installed here.

Configure each node accordingly:
 
node1
[root@node1 ~]# cd /etc/corosync/
[root@node1 corosync]# cp corosync.conf.example corosync.conf
[root@node1 corosync]# vim corosync.conf
 
# Please read the corosync.conf.5 manual page
compatibility: whitetank
 
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0        # the only line that needs to be changed (see the note after this listing)
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}
 
logging {
        fileline: off
        to_stderr: no   # whether to log to standard error
        to_logfile: yes
        to_syslog: yes    # log to syslog (it is advisable to disable one of the two log targets, since logging to both hurts performance)
        logfile: /var/log/cluster/corosync.log # the cluster directory must be created manually
        debug: off # turn on when troubleshooting
        timestamp: on   # whether to timestamp log entries
The subsection below belongs to openais:
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
amf {
        mode: disabled
        }
Append the following; the settings above only configure the messaging layer, and since pacemaker will be used it has to be declared as a service:
service {
        ver: 0
        name: pacemaker
        use_mgmtd: yes
        }
Although openais itself will not be used directly, some of its sub-options are still required:
aisexec {
        user: root
        group: root
        }
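Note that bindnetaddr must be the network address of the subnet corosync binds to, not a host address. If in doubt, it can be derived from the node's IP and netmask, for example with the ipcalc utility shipped with RHEL (an optional helper; a /24 netmask is assumed here, as in this lab):

[root@node1 corosync]# ipcalc -n 192.168.1.10 255.255.255.0   # prints NETWORK=192.168.1.0, the value used for bindnetaddr above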
 
Create the cluster log directory:
[root@node1 corosync]# mkdir /var/log/cluster
To keep other (unauthorized) hosts from joining the cluster, authentication is required, so generate an authkey:
 
[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
[root@node1 corosync]# ll
total 28
-rw-r--r-- 1 root root 5384 Jul 28 2010 amf.conf.example
-r-------- 1 root root 128 Mar 14 21:13 authkey
-rw-r--r-- 1 root root 563 Mar 14 21:08 corosync.conf
-rw-r--r-- 1 root root 436 Jul 28 2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28 2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28 2010 uidgid.d
[root@node1 corosync]# ssh node2.abc.com 'mkdir /var/log/cluster'
 
Copy the files from node1 to node2 (remember to use -p to preserve permissions):
[root@node1 corosync]# scp -p authkey corosync.conf node2.abc.com:/etc/corosync/
 
Start the corosync service on node1 and node2:
node1
[root@node1 corosync]# service corosync start
node2
[root@node2 corosync]# service corosync start
 
Verify that the corosync engine started correctly:
node1 (note that node1 is the standby node at this point):
[root@node1 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages
Mar 14 21:16:54 node1 corosync[6081]:  [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Mar 14 21:16:54 node1 corosync[6081]:   [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
node2
[root@node2 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages
Mar 14 21:17:03 node2 corosync[5876]:   [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Mar 14 21:17:03 node2 corosync[5876]:   [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 14 21:17:03 node2 corosync[5876]:   [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Mar 14 21:17:53 node2 corosync[5913]:   [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Mar 14 21:17:53 node2 corosync[5913]:   [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 14 21:17:53 node2 corosync[5913]:   [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Mar 14 21:19:53 node2 corosync[5978]:   [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Mar 14 21:19:53 node2 corosync[5978]:   [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
 
Check whether the initial membership notifications were sent:
node1
[root@node1 corosync]# grep -i totem /var/log/messages
Apr 3 14:13:16 node1 corosync[387]:   [TOTEM ] Initializing transport (UDP/IP).
Apr 3 14:13:16 node1 corosync[387]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 3 14:13:16 node1 corosync[387]:   [TOTEM ] The network interface [192.168.1.10] is now up.
Apr 3 14:13:17 node1 corosync[387]:   [TOTEM ] Process pause detected for 565 ms, flushing membership messages.
Apr 3 14:13:17 node1 corosync[387]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 3 14:13:19 node1 corosync[387]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
node2
[root@node2 ~]# grep -i totem /var/log/messages
Apr 3 14:13:19 node2 corosync[32438]:   [TOTEM ] Initializing transport (UDP/IP).
Apr 3 14:13:19 node2 corosync[32438]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr  3 14:13:19 node2 corosync[32438]:   [TOTEM ] The network interface [192.168.1.20] is now up.
Apr 3 14:13:21 node2 corosync[32438]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup:
[root@node1 ~]# grep -i error: /var/log/messages |grep -v unpack_resources
[root@node2 ~]# grep -i error: /var/log/messages |grep -v unpack_resources
 
Check whether pacemaker has started:
node1
[root@node1 ~]# grep -i pcmk_startup /var/log/messages
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] Logging: Initialized pcmk_startup
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] info: pcmk_startup: Service: 9
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] info: pcmk_startup: Local hostname: node1.abc.com
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] Logging: Initialized pcmk_startup
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] info: pcmk_startup: Service: 9
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] info: pcmk_startup: Local hostname: node1.abc.com
 
node2
[root@node2 ~]# grep -i pcmk_startup /var/log/messages
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] Logging: Initialized pcmk_startup
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] info: pcmk_startup: Service: 9
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] info: pcmk_startup: Local hostname: node2.abc.com
Mar 14 22:13:20 node2 corosync[3174]:   [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 14 22:13:20 node2 corosync[3174]:   [pcmk ] Logging: Initialized pcmk_startup
Mar 14 22:13:20 node2 corosync[3174]:   [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Mar 14 22:13:20 node2 corosync[3174]:   [pcmk ] info: pcmk_startup: Service: 9
Mar 14 22:13:21 node2 corosync[3174]:   [pcmk ] info: pcmk_startup: Local hostname: node2.abc.com
 
Check the cluster status on node2 (the primary node):
 
[root@node2 corosync]# crm status
============
Last updated: Tue Apr 3 15:26:56 2012
Stack: openais
Current DC: node1.abc.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.abc.com node2.abc.com ]
 
15. Configure the cluster's working properties
corosync enables STONITH by default, but this cluster has no STONITH device, so the default configuration is not yet usable. Disable STONITH first with the following command:
node1
[root@node1 ~]# crm configure property stonith-enabled=false
node2
[root@node2 ~]# crm configure property stonith-enabled=false
 
For a two-node cluster we also need to configure it to ignore quorum, so that the vote count does not matter and a single surviving node can still run the resources:
node1
[root@node1 ~]# crm configure property no-quorum-policy=ignore
node2
[root@node2 ~]# crm configure property no-quorum-policy=ignore
 
Define a resource stickiness value so that resources do not move between nodes arbitrarily, which would waste system resources.
Resource stickiness value ranges and their effect:
0: the default. The resource is placed at the most suitable location in the system, which means it is moved when a "better" or a worse node becomes available. This is almost equivalent to automatic failback, except that the resource may move to a node other than the one it was previously active on.
Greater than 0: the resource prefers to stay where it is, but will move if a more suitable node becomes available. The higher the value, the stronger the preference to stay put.
Less than 0: the resource prefers to move away from its current location. The higher the absolute value, the stronger the preference to leave.
INFINITY: unless the resource is forced off (node shutdown, node standby, migration-threshold reached, or a configuration change), it always stays where it is. This is almost equivalent to completely disabling automatic failback.
-INFINITY: the resource always moves away from its current location.

Here we assign a default stickiness value to resources as follows:
node1
[root@node1 ~]# crm configure rsc_defaults resource-stickiness=100
node2
[root@node2 ~]# crm configure rsc_defaults resource-stickiness=100
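The three settings made so far (stonith-enabled, no-quorum-policy and the default resource stickiness) can be reviewed from either node, and the live configuration can be re-verified now that STONITH is disabled (an optional check):

[root@node1 ~]# crm configure show   # the property and rsc_defaults sections should reflect the values just set
[root@node1 ~]# crm_verify -L -V     # verifies the live CIB; it should no longer complain about missing STONITH resources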
 
Because of length limits this article is split into two parts; please continue with Part 2.