Greenplum 2+4测试集群搭建

概述:

本文档用于Centos7.6系统下搭建Grennplum 5.25版本“2+4节点”的测试集群。集群规划如下:

主机名

角色

硬件配置

网卡配置

master

master节点

4core+8G+2*30G

192.168.0.137;192.168.0.138

standby

standby节点

4core+8G+2*30G

192.168.0.139;192.168.0.140

seg-1

计算节点1

4core+8G+2*30G

192.168.0.142;192.168.0.143

seg-2

计算节点2

4core+8G+2*30G

192.168.0.145;192.168.0.146

seg-3

计算节点3

4core+8G+2*30G

192.168.0.151;192.168.0.152

seg-4

计算节点4

4core+8G+2*30G

192.168.0.153;192.168.0.154

磁盘空间划分:

名称

磁盘

大小

/boot

sda

200M

swap

sda

8G

/

sda

21.8G

/data

sdb

30G

一、Linux服务调整(所有服务器)

1、关闭防火墙
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
2、关闭Selinux
setenforce 0
vim /etc/selinux/config
3、关闭不必要的服务

系统服务名

摘要

建议

abrt-oops

内核严重异常信息收集工具

off

abrt-ccpp

C/C++程序问题收集工具

off

lvm2-monitor

lvm的mirror监控进程

off

NetworkManager

网络自动管理进程

off

Postfix

电子邮件服务

off

Rhnsd

操作系统自动更新进程

off

Rhsmcertd

在线更新订阅状态信息

off

Rpcbind

RPC服务相关

off

Rpcgssd

针对nfsv4客户端管理RPCSEC GSS contexts进程

off

Rpcidmapd

RPC名字与UID/GID映射进程

off

4、修改sshd服务的参数

修改/etc/ssh/sshd_config文件:

vim /etc/ssh/sshd_config
#MaxStartups 10:30:100
MaxStartups 5000:30:10000

二、系统参数配置

1、修改内核参数
vim /etc/sysctl.conf

kernel.shmmax = 500000000 
kernel.shmmni = 4096 
kernel.shmall = 4000000000 
kernel.sem = 500 1024000 200 4096 
kernel.sysrq = 1 
kernel.core_uses_pid = 1 
kernel.msgmnb = 65536 
kernel.msgmax = 65536 
kernel.msgmni = 2048 
net.ipv4.tcp_syncookies = 1 net.ipv4.conf.default.accept_source_route = 0 net.ipv4.tcp_tw_recycle = 1 
net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.conf.all.arp_filter = 1 net.ipv4.ip_local_port_range = 1025 65535 net.core.netdev_max_backlog = 10000 
net.core.rmem_max = 2097152 net.core.wmem_max = 2097152 
vm.overcommit_memory = 2 
net.ipv4.ip_forward = 0 

sysctl -p			//使修改生效
2、配置用户资源限额
vim /etc/security/limits.conf

* soft nofile 1048576 
* hard nofile 1048576 
* soft nproc 131072 
* hard nproc 131072

vim /etc/security/limits.d/20-nproc.conf

* soft nofile 1048576 
* hard nofile 1048576 
* soft nproc 131072 
* hard nproc 131072
3、配置数据盘挂载参数
vim /etc/fstab

将defaults替换为‘nodev,noatime,nobarrier,inode64,allocsize=16m’
4、配置磁盘预读取
echo "blockdev --setra 16384 /dev/sda" >> /etc/rc.d/rc.local 

echo "blockdev --setra 16384 /dev/sdb" >> /etc/rc.d/rc.local

chmod +X /etc/rc.d/rc.local  //为文件添加执行权限
5、修改磁盘调度算法(elevator=deadline)和磁盘IO参数(hugepage)
cat /sys/block/sda/queue/scheduler

grubby --update-kernel=ALL --args="elevator=deadline" 

grubby --update-kernel=ALL --args="transparent_hugepage=never"

重启生效

reboot

检查相关配置是否生效

blockdev --getra /dev/sda
blockdev --getra /dev/sdb
ulimit -n
ulimit -u
cat /sys/block/*/queue/scheduler
cat /sys/kernel/mm/transparent_hugepage/enabled
getenforce
6、配置hosts解析
vim /etc/hosts

##
#ens160:
192.168.0.137 master
192.168.0.139 standby
192.168.0.142 seg-1
192.168.0.145 seg-2
192.168.0.151 seg-3
192.168.0.153 seg-4
##
#ens192:
192.168.0.138 eth2-master
192.168.0.140 eth2-standby
192.168.0.143 eth2-seg-1
192.168.0.146 eth2-seg-2
192.168.0.152 eth2-seg-3
192.168.0.154 eth2-seg-4

将/etc/hosts文件scp到其他主机上:

scp /etc/hosts ip:/etc/hosts

三、Greenplum软件安装

1、Greenplum软件下载与解压
[root@master ~]# ll
total 276820
-rw-------. 1 root root      2095 Mar 30 14:53 anaconda-ks.cfg
-rwxr-xr-x  1 root root 283451930 Mar 31 15:43 greenplum-db-5.25.0-rhel7-x86_64.bin
-rw-r--r--. 1 root root      2126 Mar 30 16:14 initial-setup-ks.cfg
2、依赖检查
rpm -qa | grep zlib
rpm -qa | grep bash
rpm -qa | grep zip
rpm -qa | grep tar
rpm -qa | grep sed
rpm -qa | grep perl
3、安装

./greenplum-db-5.25.0-rhel7-x86_64.bin

[root@master ~]# ll /usr/local/
total 0
drwxr-xr-x.  2 root root   6 Apr 11  2018 bin
drwxr-xr-x.  2 root root   6 Apr 11  2018 etc
drwxr-xr-x.  2 root root   6 Apr 11  2018 games
lrwxrwxrwx   1 root root  21 Mar 31 16:07 greenplum-db -> ./greenplum-db-5.25.0
drwxr-xr-x  11 root root 138 Mar 31 16:07 greenplum-db-5.25.0
drwxr-xr-x.  2 root root   6 Apr 11  2018 include
drwxr-xr-x.  2 root root   6 Apr 11  2018 lib
drwxr-xr-x.  2 root root   6 Apr 11  2018 lib64
drwxr-xr-x.  2 root root   6 Apr 11  2018 libexec
drwxr-xr-x.  2 root root   6 Apr 11  2018 sbin
drwxr-xr-x.  5 root root  49 Mar 30 14:40 share
drwxr-xr-x.  2 root root   6 Apr 11  2018 src

使GP的环境变量生效

source /usr/local/greenplum-db/greenplum_path.sh

切换窗口或用户会使环境变量失效,需要重新source。

4、配置主机互信

1、创建host file文件

[root@master greenplum-db]# echo "master
> standby
> seg-1
> seg-2
> seg-3
> seg-4
> eth2-master
> eth2-standby
> eth2-seg-1
> eth2-seg-2
> eth2-seg-3
> eth2-seg-4">hostfile_exkeys

2、通过gpssh-exkeys程序配置主机间互信

gpssh-exkeys -f hostfile_exkeys

5、对其他主机安装Greenplum软件

在master使用gpseginstall程序对所有主机安装Greenplum软件,软件安装过程中会自动创建gpadmin用户组与用户。

这一步一定要在GP安装目录下完成

gpseginstall -f hostfile_exkeys

6、创建数据目录并授权

创建master、standby的数据目录并赋权

touch hostfile_masters			//创建只包含master、standby的hostfile
echo "master>standby">hostfile_masters
gpssh -f hostfile_masters -e 'mkdir /data/master'
gpssh -f hostfile_masters -e 'chown -R gpadmin:gpadmin /data/master'

创建segments的数据目录并赋权

[root@master greenplum-db]# echo "seg-1
> seg-2
> seg-3
> seg-4">hostfile_segments

[root@master greenplum-db]# gpssh -f hostfile_segments -e 'mkdir /data/primary'

[root@master greenplum-db]# gpssh -f hostfile_segments -e 'mkdir /data/mirror'

[root@master greenplum-db]# gpssh -f hostfile_segments -e 'chown -R gpadmin:gpadmin /data/mirror'

[root@master greenplum-db]# gpssh -f hostfile_segments -e 'chown -R gpadmin:gpadmin /data/primary'
7、配置时间同步服务

启动ntpd服务

gpssh -f hostfile_exkeys

systemctl start ntpd

systemctl enable ntpd

所有主机设置定时计划,同步时间

crontab -e

输入:
0 1 * * * /usr/sbin/ntpdate -u ntp.api.bz

crantab -l				//查看定时计划

四、系统测试

1、网络测试

使用gpcheckperf测试

创建只包含第一个网卡对应主机名的hostfile_net1文件和只包含第二张网卡对应主机名的hostfile_net2文件:

[root@master greenplum-db]# cat hostfile_net1 
master
standby
seg-1
seg-2
seg-3
seg-4

[root@master greenplum-db]# cat hostfile_net2
eth2-master
eth2-standby
eth2-seg-1
eth2-seg-2
eth2-seg-3
eth2-seg-4

测试第一个网卡所在网段的网络情况

[root@master greenplum-db]# gpcheckperf -f hostfile_net1 -r N -d /tmp > net1.out
[root@master greenplum-db]# cat net1.out
/usr/local/greenplum-db/./bin/gpcheckperf -f hostfile_net1 -r N -d /tmp

-------------------
--  NETPERF TEST
-------------------
[Warning] retrying with port 23012

====================
==  RESULT
====================
Netperf bisection bandwidth test
master -> standby = 920.450000
seg-1 -> seg-2 = 837.850000
seg-3 -> seg-4 = 738.410000
standby -> master = 912.270000
seg-2 -> seg-1 = 900.320000
seg-4 -> seg-3 = 730.650000

Summary:
sum = 5039.95 MB/sec
min = 730.65 MB/sec
max = 920.45 MB/sec
avg = 839.99 MB/sec
median = 900.32 MB/sec

[Warning] connection between seg-3 and seg-4 is no good
[Warning] connection between seg-4 and seg-3 is no good

测试第二个网卡所在网段的网络情况

gpcheckperf -f hostfile_net2 -r N -d /tmp > net2.out

注意:生产环境下要求网络速率在 1GM 以上,速率越低性能越低。

2、I/O测试

使用gpcheckperf测试集群主机的I/O性能。

注意:生产环境要求每台机器的 I/O 性能均达到 1500M/S 以上的速率。

gpcheckperf -f hostfile_exkeys -r ds -D -d /data > diskio.out

五、创建Greenplum数据

1、创建数据库初始化所需的hostfile
[root@master greenplum-db]# su - gpadmin
Last login: Tue Mar 31 17:33:38 CST 2020 on pts/0
[gpadmin@master ~]$ mkdir /home/gpadmin/gpconfigs
[gpadmin@master ~]$ cp /usr/local/greenplum-db/hostfile_* /home/gpadmin/gpconfigs/
[gpadmin@master ~]$ cd ./gpconfigs/
[gpadmin@master gpconfigs]$ ll
total 20
-rw-r--r-- 1 gpadmin gpadmin 108 Apr  1 13:41 hostfile_exkeys
-rw-r--r-- 1 gpadmin gpadmin  15 Apr  1 13:41 hostfile_masters
-rw-r--r-- 1 gpadmin gpadmin  39 Apr  1 13:41 hostfile_net1
-rw-r--r-- 1 gpadmin gpadmin  69 Apr  1 13:41 hostfile_net2
-rw-r--r-- 1 gpadmin gpadmin  24 Apr  1 13:41 hostfile_segments
[gpadmin@master gpconfigs]$ mv hostfile_exkeys hostfile_allnode
[gpadmin@master gpconfigs]$ ll
total 20
-rw-r--r-- 1 gpadmin gpadmin 108 Apr  1 13:41 hostfile_allnode
-rw-r--r-- 1 gpadmin gpadmin  15 Apr  1 13:41 hostfile_masters
-rw-r--r-- 1 gpadmin gpadmin  39 Apr  1 13:41 hostfile_net1
-rw-r--r-- 1 gpadmin gpadmin  69 Apr  1 13:41 hostfile_net2
-rw-r--r-- 1 gpadmin gpadmin  24 Apr  1 13:41 hostfile_segments
2、创建Greenplum数据库配置文件
[gpadmin@master gpconfigs]$ source /usr/local/greenplum-db/greenplum_path.sh 
[gpadmin@master gpconfigs]$ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfigs/
[gpadmin@master gpconfigs]$ ll
total 24
-rwxr-xr-x 1 gpadmin gpadmin 2663 Apr  1 13:48 gpinitsystem_config
-rw-r--r-- 1 gpadmin gpadmin  108 Apr  1 13:41 hostfile_allnode
-rw-r--r-- 1 gpadmin gpadmin   15 Apr  1 13:41 hostfile_masters
-rw-r--r-- 1 gpadmin gpadmin   39 Apr  1 13:41 hostfile_net1
-rw-r--r-- 1 gpadmin gpadmin   69 Apr  1 13:41 hostfile_net2
-rw-r--r-- 1 gpadmin gpadmin   24 Apr  1 13:41 hostfile_segments

配置gpinitsystem_config文件内容

[gpadmin@master gpconfigs]$ cat gpinitsystem_config 
# FILE NAME: gpinitsystem_config

# Configuration file needed by the gpinitsystem

################################################
#### REQUIRED PARAMETERS
################################################

#### Name of this Greenplum system enclosed in quotes.
ARRAY_NAME="Greenplum Data Platform"

#### Naming convention for utility-generated data directories.
SEG_PREFIX=gpseg

#### Base number by which primary segment port numbers 
#### are calculated.
PORT_BASE=6001

#### File system location(s) where primary segment data directories 
#### will be created. The number of locations in the list dictate
#### the number of primary segments that will get created per
#### physical host (if multiple addresses for a host are listed in 
#### the hostfile, the number of segments will be spread evenly across
#### the specified interface addresses).
declare -a DATA_DIRECTORY=(/data/primary /data/primary)

#### OS-configured hostname or IP address of the master host.
MASTER_HOSTNAME=master

#### File system location where the master data directory 
#### will be created.
MASTER_DIRECTORY=/data/master

#### Port number for the master instance. 
MASTER_PORT=5432

#### Shell utility used to connect to remote hosts.
TRUSTED_SHELL=ssh

#### Maximum log file segments between automatic WAL checkpoints.
CHECK_POINT_SEGMENTS=8

#### Default server-side character set encoding.
ENCODING=UNICODE

################################################
#### OPTIONAL MIRROR PARAMETERS
################################################

#### Base number by which mirror segment port numbers 
#### are calculated.
MIRROR_PORT_BASE=21050

#### Base number by which primary file replication port 
#### numbers are calculated.
REPLICATION_PORT_BASE=22050

#### Base number by which mirror file replication port 
#### numbers are calculated. 
MIRROR_REPLICATION_PORT_BASE=23050

#### File system location(s) where mirror segment data directories 
#### will be created. The number of mirror locations must equal the
#### number of primary locations as specified in the 
#### DATA_DIRECTORY parameter.
declare -a MIRROR_DATA_DIRECTORY=(/data/mirror /data/mirror)


################################################
#### OTHER OPTIONAL PARAMETERS
################################################

#### Create a database of this name after initialization.
DATABASE_NAME=mdb

#### Specify the location of the host address file here instead of
#### with the the -h option of gpinitsystem.
#MACHINE_LIST_FILE=/home/gpadmin/gpconfigs/hostfile_gpinitsystem
3、gp初始化
[gpadmin@master gpconfigs]$ gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config -h /home/gpadmin/gpconfigs/hostfile_segments -s standby -S

初始化完成后设置gp环境变量

master、standby

[gpadmin@master ~]$ cat /home/gpadmin/.bashrc 
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
	. /etc/bashrc
fi

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/master/gpseg-1

segments:

echo "source /usr/local/greenplum-db/greenplum_path.sh">> /home/gpadmin/.bashrc

4、注册PL/python

操作系统默认已安装了 Python,只需要在 gpadmin 用户下执行以下命令,就可以在指 定的数据库中使用 SQL 命令注册 PL/Python 语言。

[gpadmin@master ~]$ createlang plpythonu -d mdb

如果要卸载已注册的 PL/Python 语言,则执行下面的命令:

[gpadmin@master ~]$ droplang plpythonu -d mdb
5、安装Oracle兼容函数

在 gpadmin 用户下执行以下命令:

psql -d mdb -f /usr/local/greenplum-db/share/postgresql/contrib/orafunc.sql

如果要卸载该函数,则执行下面的命令:

psql -d mdb -f /usr/local/greenplum-db/share/postgresql/contrib/uninstall_orafunc.sql