1. Install dependencies
yum -y install libicu perl-JSON-XS
2. Download the ClickHouse RPM packages
Packages for each OS version are listed at: https://packagecloud.io/Altinity/clickhouse
Taking ClickHouse 19.16.14.65 as an example, download the RPM packages with wget:
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-common-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-common-static-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-debuginfo-19.16.14.65-1.el7.x86_64.rpm/download.rpm
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-test-19.16.14.65-1.el7.x86_64.rpm/download.rpm (test data shipped by the ClickHouse project; it is fine if this package fails, it does not affect installation or use)
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.16.14.65-1.el7.x86_64.rpm/download.rpm
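Once downloaded, the packages can be installed with rpm. A minimal sketch that prints the install commands in a workable order; the ordering (common-static first, since the other packages depend on its binaries) is an assumption based on typical ClickHouse RPM dependencies, and rpm will also resolve the order itself if you install everything in one command:

```shell
# Print the rpm install commands in dependency order for the packages above.
install_cmds() {
  local ver="$1"
  for pkg in clickhouse-common-static clickhouse-server-common \
             clickhouse-server clickhouse-client; do
    echo "rpm -ivh ${pkg}-${ver}-1.el7.x86_64.rpm"
  done
}
install_cmds 19.16.14.65
```

Alternatively, simply run `rpm -ivh clickhouse-*.rpm` in the download directory and let rpm sort out the dependency order.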
3. Dual-instance high-availability overview
ClickHouse has the concepts of data shards and replicas. Sharding determines the degree of parallelism, i.e. how many pieces a distributed table is split into for storage; in principle, more shards mean faster queries. A replica is a real-time backup of a shard: when a node fails, the backup takes over, so service is not interrupted.
Since we currently have few servers, we start with a small-scale setup: each server runs two ClickHouse instances, one holding a data shard and the other holding the replica of the next server's shard.
For example:
Server 1 runs shard A and a replica of shard B
Server 2 runs shard B and a replica of shard C
Server 3 runs shard C and a replica of shard D
.........
Server n runs shard N and a replica of shard A
This way, if any server goes down, a replica of its shard can always be found on another server and recovered from.
Because ClickHouse replication depends heavily on ZooKeeper, a ZooKeeper service must be installed in advance.
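The ring layout above can be sketched for any number of servers; server i holds shard i as primary plus a replica of shard (i mod n)+1, so every shard lives on two different servers:

```shell
# Print the shard/replica ring for n servers.
ring() {
  local n="$1" i backup
  for i in $(seq 1 "$n"); do
    backup=$(( i % n + 1 ))
    echo "server$i: shard$i (primary), shard$backup (replica)"
  done
}
ring 3
```

For n=3 this yields exactly the three-server layout used in the configuration examples below.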
4. Configuration changes
The configuration files live in /etc/clickhouse-server by default: config.xml and users.xml. Since each server runs two instances, copy config.xml to config2.xml. Some settings must differ between the two files; the main items are explained below.
4.1 Add the ZooKeeper connection details (mainly host and port) under the yandex tag.
<zookeeper>
    <node index="1">
        <host>bd1</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>bd2</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>bd3</host>
        <port>2181</port>
    </node>
</zookeeper>
4.2 Add the cluster topology under the yandex tag (using the 3-server layout from section 3 as an example).
<remote_servers>
    <sht_ck_cluster_1>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>bd1</host>
                <port>9000</port>
                <user>default</user>
                <password>default</password>
            </replica>
            <replica>
                <host>bd2</host>
                <port>9001</port>
                <user>default</user>
                <password>default</password>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>bd2</host>
                <port>9000</port>
                <user>default</user>
                <password>default</password>
            </replica>
            <replica>
                <host>bd3</host>
                <port>9001</port>
                <user>default</user>
                <password>default</password>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>bd3</host>
                <port>9000</port>
                <user>default</user>
                <password>default</password>
            </replica>
            <replica>
                <host>bd1</host>
                <port>9001</port>
                <user>default</user>
                <password>default</password>
            </replica>
        </shard>
    </sht_ck_cluster_1>
</remote_servers>
Tag descriptions for 4.2:
sht_ck_cluster_1: the cluster identifier; choose your own name
weight: write weight; the higher the weight, the more data is routed to that shard. (This should track server capacity; since our servers have identical hardware, all are set to 1.)
internal_replication: internal replication flag. The official documentation recommends enabling it for replicated setups: a write goes to one local replica and is then synchronized to the matching backup replica in the background via ZooKeeper. If set to false, a shard and its replica can end up with inconsistent data.
replica: the connection details of each node
4.3 Adjust the compression settings in the original configuration file:
<compression incl="clickhouse_compression">
<!-- Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. -->
<case>
<!-- Conditions. All must be satisfied. Some conditions may be omitted. -->
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio> <!-- Min size of part relative to whole table size. -->
<!-- What compression method to use. -->
<method>zstd</method>
</case>
</compression>
Tag descriptions:
min_part_size: minimum size of a data part before it is eligible for compression
min_part_size_ratio: minimum size of a part relative to the whole table before compression applies; the default is 1%
method: the compression algorithm, either lz4 or zstd
4.4 Connection and concurrency settings:
<!-- <listen_backlog>64</listen_backlog> -->
<max_connections>4096</max_connections>
<keep_alive_timeout>3</keep_alive_timeout>
<!-- Maximum number of concurrent queries. -->
<max_concurrent_queries>100</max_concurrent_queries>
Tag descriptions:
max_connections: maximum number of connections
keep_alive_timeout: how many seconds an idle HTTP connection is kept open (this is not a query timeout)
max_concurrent_queries: maximum number of concurrent queries
4.5 Time zone
<timezone>Asia/Shanghai</timezone>
4.6 Data paths and logging
<logger>
<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>trace</level>
<log>/gldata1/data/clickhouse/logs/clickhouse-server.log</log>
<errorlog>/gldata1/data/clickhouse/logs/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
<!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
</logger>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<path>/gldata1/data/clickhouse/data</path>
<!-- Path to temporary data for processing hard queries. -->
<tmp_path>/gldata1/data/clickhouse/tmp</tmp_path>
<!-- Directory with user provided files that are accessible by 'file' table function. -->
<user_files_path>/gldata1/data/clickhouse/user_files</user_files_path>
<!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
<users_config>users.xml</users_config>
<!-- Directory in <clickhouse-path> containing schema files for various input formats.The directory will be created if it doesn't exist.-->
<format_schema_path>/gldata1/data/clickhouse/format_schemas</format_schema_path>
Notes (the two instances on one server must use different values for every setting below; any path that does not exist must be created with mkdir and granted to the system's clickhouse user, otherwise startup fails with vague "unknown XXXXXX" errors):
logger->log: server log location
logger->errorlog: error log location
http_port: port for the HTTP interface
tcp_port: port for the native TCP protocol (used by clickhouse-client)
path: data directory
tmp_path: directory for temporary data used by heavy queries
user_files_path: directory for user-provided files accessible via the file table function
users_config: user settings file
format_schema_path: directory for format schema files
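Before the first start, each instance's directories must exist and belong to the clickhouse user. A minimal sketch using the example layout above (the chown step is left as a comment because it requires root on a real host):

```shell
# Create the per-instance directories referenced by the config above.
# On a real host, follow up with: chown -R clickhouse:clickhouse "$base"
prepare_dirs() {
  local base="$1"
  mkdir -p "$base/logs" "$base/data" "$base/tmp" \
           "$base/user_files" "$base/format_schemas"
}
# e.g. prepare_dirs /gldata1/data/clickhouse    # instance 1
#      prepare_dirs /gldata1/data/clickhouse2   # instance 2
prepare_dirs "$(mktemp -d)/clickhouse-demo"
```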
4.7 Macros for replicated tables:
<macros incl="macros" optional="true" />
<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cluster01-01-1</replica>
</macros>
Notes (these values must differ for every ClickHouse instance):
layer: replication layer; unless the deployment is very large, 01 is fine
shard: shard identifier
replica: identifier of the replica held by this instance
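With the 3-server ring from 4.2, each of the six instances gets its own macros block. As an illustration, a possible assignment for the two instances on bd1 (the cluster01-SS-R naming follows the example above; any consistent scheme works as long as shard/replica match the remote_servers layout):

```xml
<!-- bd1, instance 1 (config.xml, tcp_port 9000): first replica of shard 01 -->
<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cluster01-01-1</replica>
</macros>

<!-- bd1, instance 2 (config2.xml, tcp_port 9001): second replica of shard 03 -->
<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cluster01-03-2</replica>
</macros>
```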
4.8 Other configuration notes:
The <include_from> tag imports an external configuration file (ClickHouse defaults to metrika.xml). Shared settings can be moved into that file so they are easier to maintain. If you do not use an external file, this tag must be commented out; otherwise the ClickHouse server fails to start.
4.9 User settings
<!-- Users and ACL. -->
<users>
<!-- If user name was not specified, 'default' user is used. -->
<default>
<password_sha256_hex>37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f</password_sha256_hex>
<!--<password></password>-->
<!-- List of networks with open access.
To open access from everywhere, specify:
<ip>::/0</ip>
To open access only from localhost, specify:
<ip>::1</ip>
<ip>127.0.0.1</ip>
Each element of list has one of the following forms:
<ip> IP-address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 10.0.0.1/255.255.255.0
2a02:6b8::3 or 2a02:6b8::3/64 or 2a02:6b8::3/ffff:ffff:ffff:ffff::.
<host> Hostname. Example: server01.yandex.ru.
To check access, DNS query is performed, and all received addresses compared to peer address.
<host_regexp> Regular expression for host names. Example, ^server\d\d-\d\d-\d\.yandex\.ru$
To check access, DNS PTR query is performed for peer address and then regexp is applied.
Then, for result of PTR query, another DNS query is performed and all received addresses compared to peer address.
Strongly recommended that regexp is ends with $
All results of DNS requests are cached till server restart.
-->
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<!-- Settings profile for user. -->
<profile>default</profile>
<!-- Quota for user. -->
<quota>default</quota>
<!-- For testing the table filters -->
<databases>
<test>
<!-- Simple expression filter -->
<filtered_table1>
<filter>a = 1</filter>
</filtered_table1>
<!-- Complex expression filter -->
<filtered_table2>
<filter>a + b &lt; 1 or c - d &gt; 5</filter>
</filtered_table2>
<!-- Filter with ALIAS column -->
<filtered_table3>
<filter>c = 1</filter>
</filtered_table3>
</test>
</databases>
</default>
<!-- Example of user with readonly access. -->
<!-- <readonly>
<password></password>
<networks incl="networks" replace="replace">
<ip>::1</ip>
<ip>127.0.0.1</ip>
</networks>
<profile>readonly</profile>
<quota>default</quota>
</readonly> -->
</users>
Tag descriptions:
password_sha256_hex: the SHA-256 hash of the password (get it quickly with echo -n 'your password' | sha256sum | tr -d '-')
password: plaintext alternative; configure only one of password and password_sha256_hex
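The hash can also be generated with a small helper; a sketch using printf so no stray newline is hashed (the users.xml example above pairs with the plaintext password default):

```shell
# Print the SHA-256 hex digest of a password, with no trailing newline hashed.
pw_hash() { printf '%s' "$1" | sha256sum | awk '{print $1}'; }
pw_hash 'default'
```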
5. Adjust the init scripts
Go to the init script directory /etc/init.d/ and copy the script: cp clickhouse-server clickhouse-server2
In clickhouse-server, change CLICKHOUSE_DATADIR_OLD and CLICKHOUSE_DATADIR to your own data paths.
In clickhouse-server2, change CLICKHOUSE_LOGDIR, CLICKHOUSE_DATADIR_OLD, CLICKHOUSE_DATADIR, CLICKHOUSE_CRONFILE, CLICKHOUSE_CONFIG and CLICKHOUSE_PIDFILE to the paths of the second instance. (In practice this means appending 2 to the corresponding file or path from clickhouse-server; every file and directory referenced must be created beforehand.)
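The edits to the copied script can be automated with sed. A sketch covering the config and pid file only; the substitution patterns are assumptions about the stock script's values, and the log, cron, and data directory variables need analogous edits:

```shell
# Point a copied init script at instance-2 files by appending "2".
# Only config and pidfile are shown; edit CLICKHOUSE_LOGDIR, CLICKHOUSE_CRONFILE
# and CLICKHOUSE_DATADIR the same way.
patch_script() {
  sed -i \
    -e 's|config\.xml|config2.xml|' \
    -e 's|clickhouse-server\.pid|clickhouse-server2.pid|' \
    "$1"
}
```

Always diff the result against the original script before starting the service.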
6. Start and test
service clickhouse-server start
service clickhouse-server2 start
On any node, run clickhouse-client -m -h 127.0.0.1 --password default --port 9000 to open a session, then execute select * from system.clusters; to list every shard and replica in the cluster.
7. Troubleshooting
Problem 1: the start command fails with an UNKNOWN error and no clickhouse process is running.
Cause: the configuration files, init scripts, or configured paths are not owned by the clickhouse user.
Fix: run chown -R clickhouse:clickhouse /xxx to grant the configuration files, init scripts, and configured paths to the clickhouse user and group.
Problem 2: the start command fails with "junk after document elementXXXX".
Cause: config.xml contains more than one top-level tag.
Fix: keep a single yandex top-level tag and nest everything else inside it.