Greenplum 2+4测试集群搭建
概述:
本文档用于Centos7.6系统下搭建Grennplum 5.25版本“2+4节点”的测试集群。集群规划如下:
主机名 | 角色 | 硬件配置 | 网卡配置 |
master | master节点 | 4core+8G+2*30G | 192.168.0.137;192.168.0.138 |
standby | standby节点 | 4core+8G+2*30G | 192.168.0.139;192.168.0.140 |
seg-1 | 计算节点1 | 4core+8G+2*30G | 192.168.0.142;192.168.0.143 |
seg-2 | 计算节点2 | 4core+8G+2*30G | 192.168.0.145;192.168.0.146 |
seg-3 | 计算节点3 | 4core+8G+2*30G | 192.168.0.151;192.168.0.152 |
seg-4 | 计算节点4 | 4core+8G+2*30G | 192.168.0.153;192.168.0.154 |
磁盘空间划分:
名称 | 磁盘 | 大小 |
/boot | sda | 200M |
swap | sda | 8G |
/ | sda | 21.8G |
/data | sdb | 30G |
一、Linux服务调整(所有服务器)
1、关闭防火墙
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
2、关闭Selinux
setenforce 0
vim /etc/selinux/config
3、关闭不必要的服务
系统服务名 | 摘要 | 建议 |
abrt-oops | 内核严重异常信息收集工具 | off |
abrt-ccpp | C/C++程序问题收集工具 | off |
lvm2-monitor | lvm的mirror监控进程 | off |
NetworkManager | 网络自动管理进程 | off |
Postfix | 电子邮件服务 | off |
Rhnsd | 操作系统自动更新进程 | off |
Rhsmcertd | 在线更新订阅状态信息 | off |
Rpcbind | RPC服务相关 | off |
Rpcgssd | 针对nfsv4客户端管理RPCSEC GSS contexts进程 | off |
Rpcidmapd | RPC名字与UID/GID映射进程 | off |
4、修改sshd服务的参数
修改/etc/ssh/sshd_config文件:
vim /etc/ssh/sshd_config
#MaxStartups 10:30:100
MaxStartups 5000:30:10000
二、系统参数配置
1、修改内核参数
vim /etc/sysctl.conf
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 500 1024000 200 4096
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1 net.ipv4.conf.default.accept_source_route = 0 net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.conf.all.arp_filter = 1 net.ipv4.ip_local_port_range = 1025 65535 net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152 net.core.wmem_max = 2097152
vm.overcommit_memory = 2
net.ipv4.ip_forward = 0
sysctl -p //使修改生效
2、配置用户资源限额
vim /etc/security/limits.conf
* soft nofile 1048576
* hard nofile 1048576
* soft nproc 131072
* hard nproc 131072
vim /etc/security/limits.d/20-nproc.conf
* soft nofile 1048576
* hard nofile 1048576
* soft nproc 131072
* hard nproc 131072
3、配置数据盘挂载参数
vim /etc/fstab
将defaults替换为‘nodev,noatime,nobarrier,inode64,allocsize=16m’
4、配置磁盘预读取
echo "blockdev --setra 16384 /dev/sda" >> /etc/rc.d/rc.local
echo "blockdev --setra 16384 /dev/sdb" >> /etc/rc.d/rc.local
chmod +X /etc/rc.d/rc.local //为文件添加执行权限
5、修改磁盘调度算法(elevator=deadline)和磁盘IO参数(hugepage)
cat /sys/block/sda/queue/scheduler
grubby --update-kernel=ALL --args="elevator=deadline"
grubby --update-kernel=ALL --args="transparent_hugepage=never"
重启生效
reboot
检查相关配置是否生效
blockdev --getra /dev/sda
blockdev --getra /dev/sdb
ulimit -n
ulimit -u
cat /sys/block/*/queue/scheduler
cat /sys/kernel/mm/transparent_hugepage/enabled
getenforce
6、配置hosts解析
vim /etc/hosts
##
#ens160:
192.168.0.137 master
192.168.0.139 standby
192.168.0.142 seg-1
192.168.0.145 seg-2
192.168.0.151 seg-3
192.168.0.153 seg-4
##
#ens192:
192.168.0.138 eth2-master
192.168.0.140 eth2-standby
192.168.0.143 eth2-seg-1
192.168.0.146 eth2-seg-2
192.168.0.152 eth2-seg-3
192.168.0.154 eth2-seg-4
将/etc/hosts文件scp到其他主机上:
scp /etc/hosts ip:/etc/hosts
三、Greenplum软件安装
1、Greenplum软件下载与解压
[root@master ~]# ll
total 276820
-rw-------. 1 root root 2095 Mar 30 14:53 anaconda-ks.cfg
-rwxr-xr-x 1 root root 283451930 Mar 31 15:43 greenplum-db-5.25.0-rhel7-x86_64.bin
-rw-r--r--. 1 root root 2126 Mar 30 16:14 initial-setup-ks.cfg
2、依赖检查
rpm -qa | grep zlib
rpm -qa | grep bash
rpm -qa | grep zip
rpm -qa | grep tar
rpm -qa | grep sed
rpm -qa | grep perl
3、安装
./greenplum-db-5.25.0-rhel7-x86_64.bin
[root@master ~]# ll /usr/local/
total 0
drwxr-xr-x. 2 root root 6 Apr 11 2018 bin
drwxr-xr-x. 2 root root 6 Apr 11 2018 etc
drwxr-xr-x. 2 root root 6 Apr 11 2018 games
lrwxrwxrwx 1 root root 21 Mar 31 16:07 greenplum-db -> ./greenplum-db-5.25.0
drwxr-xr-x 11 root root 138 Mar 31 16:07 greenplum-db-5.25.0
drwxr-xr-x. 2 root root 6 Apr 11 2018 include
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib64
drwxr-xr-x. 2 root root 6 Apr 11 2018 libexec
drwxr-xr-x. 2 root root 6 Apr 11 2018 sbin
drwxr-xr-x. 5 root root 49 Mar 30 14:40 share
drwxr-xr-x. 2 root root 6 Apr 11 2018 src
使GP的环境变量生效
source /usr/local/greenplum-db/greenplum_path.sh
切换窗口或用户会使环境变量失效,需要重新source。
4、配置主机互信
1、创建host file文件
[root@master greenplum-db]# echo "master
> standby
> seg-1
> seg-2
> seg-3
> seg-4
> eth2-master
> eth2-standby
> eth2-seg-1
> eth2-seg-2
> eth2-seg-3
> eth2-seg-4">hostfile_exkeys
2、通过gpssh-exkeys程序配置主机间互信
gpssh-exkeys -f hostfile_exkeys
5、对其他主机安装Greenplum软件
在master使用gpseginstall程序对所有主机安装Greenplum软件,软件安装过程中会自动创建gpadmin用户组与用户。
这一步一定要在GP安装目录下完成
gpseginstall -f hostfile_exkeys
6、创建数据目录并授权
创建master、standby的数据目录并赋权
touch hostfile_masters //创建只包含master、standby的hostfile
echo "master>standby">hostfile_masters
gpssh -f hostfile_masters -e 'mkdir /data/master'
gpssh -f hostfile_masters -e 'chown -R gpadmin:gpadmin /data/master'
创建segments的数据目录并赋权
[root@master greenplum-db]# echo "seg-1
> seg-2
> seg-3
> seg-4">hostfile_segments
[root@master greenplum-db]# gpssh -f hostfile_segments -e 'mkdir /data/primary'
[root@master greenplum-db]# gpssh -f hostfile_segments -e 'mkdir /data/mirror'
[root@master greenplum-db]# gpssh -f hostfile_segments -e 'chown -R gpadmin:gpadmin /data/mirror'
[root@master greenplum-db]# gpssh -f hostfile_segments -e 'chown -R gpadmin:gpadmin /data/primary'
7、配置时间同步服务
启动ntpd服务
gpssh -f hostfile_exkeys
systemctl start ntpd
systemctl enable ntpd
所有主机设置定时计划,同步时间
crontab -e
输入:
0 1 * * * /usr/sbin/ntpdate -u ntp.api.bz
crantab -l //查看定时计划
四、系统测试
1、网络测试
使用gpcheckperf测试
创建只包含第一个网卡对应主机名的hostfile_net1文件和只包含第二张网卡对应主机名的hostfile_net2文件:
[root@master greenplum-db]# cat hostfile_net1
master
standby
seg-1
seg-2
seg-3
seg-4
[root@master greenplum-db]# cat hostfile_net2
eth2-master
eth2-standby
eth2-seg-1
eth2-seg-2
eth2-seg-3
eth2-seg-4
测试第一个网卡所在网段的网络情况
[root@master greenplum-db]# gpcheckperf -f hostfile_net1 -r N -d /tmp > net1.out
[root@master greenplum-db]# cat net1.out
/usr/local/greenplum-db/./bin/gpcheckperf -f hostfile_net1 -r N -d /tmp
-------------------
-- NETPERF TEST
-------------------
[Warning] retrying with port 23012
====================
== RESULT
====================
Netperf bisection bandwidth test
master -> standby = 920.450000
seg-1 -> seg-2 = 837.850000
seg-3 -> seg-4 = 738.410000
standby -> master = 912.270000
seg-2 -> seg-1 = 900.320000
seg-4 -> seg-3 = 730.650000
Summary:
sum = 5039.95 MB/sec
min = 730.65 MB/sec
max = 920.45 MB/sec
avg = 839.99 MB/sec
median = 900.32 MB/sec
[Warning] connection between seg-3 and seg-4 is no good
[Warning] connection between seg-4 and seg-3 is no good
测试第二个网卡所在网段的网络情况
gpcheckperf -f hostfile_net2 -r N -d /tmp > net2.out
注意:生产环境下要求网络速率在 1GM 以上,速率越低性能越低。
2、I/O测试
使用gpcheckperf测试集群主机的I/O性能。
注意:生产环境要求每台机器的 I/O 性能均达到 1500M/S 以上的速率。
gpcheckperf -f hostfile_exkeys -r ds -D -d /data > diskio.out
五、创建Greenplum数据
1、创建数据库初始化所需的hostfile
[root@master greenplum-db]# su - gpadmin
Last login: Tue Mar 31 17:33:38 CST 2020 on pts/0
[gpadmin@master ~]$ mkdir /home/gpadmin/gpconfigs
[gpadmin@master ~]$ cp /usr/local/greenplum-db/hostfile_* /home/gpadmin/gpconfigs/
[gpadmin@master ~]$ cd ./gpconfigs/
[gpadmin@master gpconfigs]$ ll
total 20
-rw-r--r-- 1 gpadmin gpadmin 108 Apr 1 13:41 hostfile_exkeys
-rw-r--r-- 1 gpadmin gpadmin 15 Apr 1 13:41 hostfile_masters
-rw-r--r-- 1 gpadmin gpadmin 39 Apr 1 13:41 hostfile_net1
-rw-r--r-- 1 gpadmin gpadmin 69 Apr 1 13:41 hostfile_net2
-rw-r--r-- 1 gpadmin gpadmin 24 Apr 1 13:41 hostfile_segments
[gpadmin@master gpconfigs]$ mv hostfile_exkeys hostfile_allnode
[gpadmin@master gpconfigs]$ ll
total 20
-rw-r--r-- 1 gpadmin gpadmin 108 Apr 1 13:41 hostfile_allnode
-rw-r--r-- 1 gpadmin gpadmin 15 Apr 1 13:41 hostfile_masters
-rw-r--r-- 1 gpadmin gpadmin 39 Apr 1 13:41 hostfile_net1
-rw-r--r-- 1 gpadmin gpadmin 69 Apr 1 13:41 hostfile_net2
-rw-r--r-- 1 gpadmin gpadmin 24 Apr 1 13:41 hostfile_segments
2、创建Greenplum数据库配置文件
[gpadmin@master gpconfigs]$ source /usr/local/greenplum-db/greenplum_path.sh
[gpadmin@master gpconfigs]$ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfigs/
[gpadmin@master gpconfigs]$ ll
total 24
-rwxr-xr-x 1 gpadmin gpadmin 2663 Apr 1 13:48 gpinitsystem_config
-rw-r--r-- 1 gpadmin gpadmin 108 Apr 1 13:41 hostfile_allnode
-rw-r--r-- 1 gpadmin gpadmin 15 Apr 1 13:41 hostfile_masters
-rw-r--r-- 1 gpadmin gpadmin 39 Apr 1 13:41 hostfile_net1
-rw-r--r-- 1 gpadmin gpadmin 69 Apr 1 13:41 hostfile_net2
-rw-r--r-- 1 gpadmin gpadmin 24 Apr 1 13:41 hostfile_segments
配置gpinitsystem_config文件内容
[gpadmin@master gpconfigs]$ cat gpinitsystem_config
# FILE NAME: gpinitsystem_config
# Configuration file needed by the gpinitsystem
################################################
#### REQUIRED PARAMETERS
################################################
#### Name of this Greenplum system enclosed in quotes.
ARRAY_NAME="Greenplum Data Platform"
#### Naming convention for utility-generated data directories.
SEG_PREFIX=gpseg
#### Base number by which primary segment port numbers
#### are calculated.
PORT_BASE=6001
#### File system location(s) where primary segment data directories
#### will be created. The number of locations in the list dictate
#### the number of primary segments that will get created per
#### physical host (if multiple addresses for a host are listed in
#### the hostfile, the number of segments will be spread evenly across
#### the specified interface addresses).
declare -a DATA_DIRECTORY=(/data/primary /data/primary)
#### OS-configured hostname or IP address of the master host.
MASTER_HOSTNAME=master
#### File system location where the master data directory
#### will be created.
MASTER_DIRECTORY=/data/master
#### Port number for the master instance.
MASTER_PORT=5432
#### Shell utility used to connect to remote hosts.
TRUSTED_SHELL=ssh
#### Maximum log file segments between automatic WAL checkpoints.
CHECK_POINT_SEGMENTS=8
#### Default server-side character set encoding.
ENCODING=UNICODE
################################################
#### OPTIONAL MIRROR PARAMETERS
################################################
#### Base number by which mirror segment port numbers
#### are calculated.
MIRROR_PORT_BASE=21050
#### Base number by which primary file replication port
#### numbers are calculated.
REPLICATION_PORT_BASE=22050
#### Base number by which mirror file replication port
#### numbers are calculated.
MIRROR_REPLICATION_PORT_BASE=23050
#### File system location(s) where mirror segment data directories
#### will be created. The number of mirror locations must equal the
#### number of primary locations as specified in the
#### DATA_DIRECTORY parameter.
declare -a MIRROR_DATA_DIRECTORY=(/data/mirror /data/mirror)
################################################
#### OTHER OPTIONAL PARAMETERS
################################################
#### Create a database of this name after initialization.
DATABASE_NAME=mdb
#### Specify the location of the host address file here instead of
#### with the the -h option of gpinitsystem.
#MACHINE_LIST_FILE=/home/gpadmin/gpconfigs/hostfile_gpinitsystem
3、gp初始化
[gpadmin@master gpconfigs]$ gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config -h /home/gpadmin/gpconfigs/hostfile_segments -s standby -S
初始化完成后设置gp环境变量
master、standby
[gpadmin@master ~]$ cat /home/gpadmin/.bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/master/gpseg-1
segments:
echo "source /usr/local/greenplum-db/greenplum_path.sh">> /home/gpadmin/.bashrc
4、注册PL/python
操作系统默认已安装了 Python,只需要在 gpadmin 用户下执行以下命令,就可以在指 定的数据库中使用 SQL 命令注册 PL/Python 语言。
[gpadmin@master ~]$ createlang plpythonu -d mdb
如果要卸载已注册的 PL/Python 语言,则执行下面的命令:
[gpadmin@master ~]$ droplang plpythonu -d mdb
5、安装Oracle兼容函数
在 gpadmin 用户下执行以下命令:
psql -d mdb -f /usr/local/greenplum-db/share/postgresql/contrib/orafunc.sql
如果要卸载该函数,则执行下面的命令:
psql -d mdb -f /usr/local/greenplum-db/share/postgresql/contrib/uninstall_orafunc.sql