Setting Up ClickHouse: Standalone and Cluster
This article describes in detail how to set up a standalone ClickHouse instance and then a ClickHouse cluster.
Standalone ClickHouse setup
This section walks through setting up a standalone ClickHouse instance. It is also a prerequisite for the cluster setup later in this article, so please follow it carefully.
Copy the installation packages to the target directory
Distribute the downloaded ClickHouse packages to the hosts where they will be installed. The packages can be downloaded locally from the ClickHouse official website.
[hadoop@p0-tklcdh-clickhouse01 software]$ pwd
/home/hadoop/software
[hadoop@p0-tklcdh-clickhouse01 software]$ ls -l
total 1112624
-rw-r--r-- 1 hadoop hadoop     39866 Sep 28 13:57 clickhouse-client-22.3.2.2.tgz
-rw-r--r-- 1 hadoop hadoop 179545907 Sep 28 13:57 clickhouse-common-static-22.3.2.2.tgz
-rw-r--r-- 1 hadoop hadoop 774024509 Sep 28 13:57 clickhouse-common-static-dbg-22.3.2.2.tgz
-rw-r--r-- 1 hadoop hadoop     61901 Sep 28 13:57 clickhouse-server-22.3.2.2.tgz
-rw-r--r-- 1 hadoop hadoop 185646832 Sep 28 11:40 jdk-8u181-linux-x64.tar.gz
[hadoop@p0-tklcdh-clickhouse01 software]$ scp clickhouse-* hadoop@10.129.81.114:`pwd`
[hadoop@p0-tklcdh-clickhouse01 software]$ scp clickhouse-* hadoop@10.129.81.115:`pwd`
Writing the installation script
Create a new script in /home/hadoop/software, e.g. sudo vim /home/hadoop/software/install.sh (the file name is your choice), and copy the content below into it.
You can also follow the official Installation Guide instead.
#!/bin/bash
# Remember to switch Linux user to root
sudo mkdir /home/hadoop/clickhouse
sudo cp /home/hadoop/software/clickhouse-* /home/hadoop/clickhouse
cd /home/hadoop/clickhouse
tar -xzvf "clickhouse-common-static-22.3.2.2.tgz"
tar -xzvf "clickhouse-common-static-dbg-22.3.2.2.tgz"
tar -xzvf "clickhouse-server-22.3.2.2.tgz"
tar -xzvf "clickhouse-client-22.3.2.2.tgz"
rm -f /home/hadoop/clickhouse/*.tgz
sudo "clickhouse-common-static-22.3.2.2/install/doinst.sh"
sudo "clickhouse-common-static-dbg-22.3.2.2/install/doinst.sh"
sudo "clickhouse-server-22.3.2.2/install/doinst.sh"
sudo "clickhouse-client-22.3.2.2/install/doinst.sh"
# Remember to generate an OpenSSL certificate
sudo openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 3650 -nodes -x509 \
    -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt
# Listen on 0.0.0.0 so only IPv4 is used. Note: a plain `sudo echo ... >` would
# fail because the redirection runs as the invoking user, hence `sudo tee`.
echo "<clickhouse><listen_host>0.0.0.0</listen_host></clickhouse>" | sudo tee /etc/clickhouse-server/config.d/listen.xml
After copying the content in, make the script executable (chmod 777) and run it. Answer the prompts (including the password for the default user) to complete the installation.
Verify the standalone ClickHouse installation
Start the clickhouse server with:
sudo clickhouse start
Once the server is running, log in with clickhouse-client using the password you set during installation:
clickhouse-client --password
Output like the following indicates that ClickHouse is running correctly on the single node:
[root@p0-tklcdh-clickhouse03 software]# clickhouse-client --password
ClickHouse client version 22.3.2.1.
Password for user (default): 
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.3.2 revision 54455.
p0-tklcdh-clickhouse01 :) show databases;
SHOW DATABASES
Query id: d9a6c3f9-1aa8-42fd-92e4-e29de6d32dd4
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ system             │
└────────────────────┘
4 rows in set. Elapsed: 0.002 sec.
p0-tklcdh-clickhouse03 :) exit
Bye.
Setting up a distributed ClickHouse cluster
Building on the previous section, this section sets up a ClickHouse cluster coordinated by ZooKeeper.
Prerequisites
- ClickHouse has been installed, as described above, on every node that will join the cluster
- At least one working ZooKeeper ensemble is available, and every ClickHouse node can reach its ports
- The hostname-to-IP mappings for all ClickHouse and ZooKeeper nodes are present in /etc/hosts on every machine, the machines can ping each other, and the required ports are free
- Once the configuration file /etc/clickhouse-server/config.xml has been fully configured, it will be copied to the same location on every ClickHouse node
Modify the configuration file for cluster operation
The official documentation describes how to configure a distributed cluster; see Clickhouse Clusters.
For ClickHouse to run as a cluster, replace the following items at the corresponding places in /etc/clickhouse-server/config.xml:
- The remote_servers settings. Note in particular that /etc/hosts on every node must contain the hostname-to-IP mappings; otherwise the nodes cannot reach each other.
<remote_servers>
		<example_cluster>
			<shard>
				<!-- Optional. Shard weight when writing data. Default: 1. -->
				<weight>1</weight>
				<!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
				<internal_replication>false</internal_replication>
				<replica>
					<host>p0-tklcdh-clickhouse01</host>
					<port>9000</port>
				</replica>
			</shard>
			<shard>
				<weight>1</weight>
				<internal_replication>false</internal_replication>
				<replica>
					<host>p0-tklcdh-clickhouse02</host>
					<port>9000</port>
				</replica>
			</shard>
			<shard>
				<weight>1</weight>
				<internal_replication>false</internal_replication>
				<replica>
					<host>p0-tklcdh-clickhouse03</host>
					<port>9000</port>
				</replica>
			</shard>
		</example_cluster>
	</remote_servers>
- The zookeeper settings. Again, /etc/hosts on every node must contain the hostname-to-IP mappings; otherwise the node cannot locate the ZooKeeper service and will never receive tasks issued from other nodes. See Data Replication and the official ClickHouse ZooKeeper configuration guide.
<zookeeper>
		<node>
			<host>p0-tklfbirp-cdh-zk01</host>
			<port>2181</port>
		</node>
		<node>
			<host>p0-tklfbirp-cdh-zk02</host>
			<port>2181</port>
		</node>
		<node>
			<host>p0-tklfbirp-cdh-zk03</host>
			<port>2181</port>
		</node>
		<node>
			<host>p0-tklfbirp-cdh-zk04</host>
			<port>2181</port>
		</node>
		<node>
			<host>p0-tklfbirp-cdh-zk05</host>
			<port>2181</port>
		</node>
	</zookeeper>
- The macros settings. For what they do, see the official documentation on macros and Replicated*MergeTree parameters.
<macros>
        <shard>01</shard>
        <replica>p0-tklcdh-clickhouse01</replica>
    </macros>
- Change the data storage locations
<!-- Path to data directory, with trailing slash. -->
    <path>/data02/clickhouse/</path>
    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/data02/clickhouse/tmp/</tmp_path>
	<!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/data02/clickhouse/user_files/</user_files_path>
Distribute the configuration file to the remaining ClickHouse nodes
Use scp (or copy manually) to place the updated /etc/clickhouse-server/config.xml in the same location on every ClickHouse node.
Restart the service on every ClickHouse node
Restart the reconfigured clickhouse server on each ClickHouse node with:
sudo clickhouse restart
Inspect the ClickHouse cluster information
In the ClickHouse client, query the system tables to inspect the cluster metadata.
p0-tklcdh-clickhouse01 :) select * from system.clusters
SELECT *
FROM system.clusters
Query id: 8fac3aac-c543-4fae-97b2-58b278089675
┌─cluster─────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────────────┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┐
│ example_cluster │         1 │            1 │           1 │ p0-tklcdh-clickhouse01 │ 10.129.81.113 │ 9000 │        1 │ default │                  │            0 │               0 │                       0 │
│ example_cluster │         2 │            1 │           1 │ p0-tklcdh-clickhouse02 │ 10.129.81.114 │ 9000 │        0 │ default │                  │            0 │               0 │                       0 │
│ example_cluster │         3 │            1 │           1 │ p0-tklcdh-clickhouse03 │ 10.129.81.115 │ 9000 │        0 │ default │                  │            0 │               0 │                       0 │
└─────────────────┴───────────┴──────────────┴─────────────┴────────────────────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┘
3 rows in set. Elapsed: 0.002 sec.
Connect with clickhouse-client on the primary node and verify the distributed setup
Connect to the primary ClickHouse node with the command below and run a distributed CREATE TABLE statement to check that the distributed setup works. See Creating Replicated Tables.
The CREATE TABLE statement:
CREATE TABLE IF NOT EXISTS table_test01
ON CLUSTER example_cluster
(
    UserID UInt32,
    ver UInt16
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/01/table_test01', 'p0-tklcdh-clickhouse01-01', ver)
PRIMARY KEY UserID;
The execution result:
[root@p0-tklcdh-clickhouse01 clickhouse-client-22.3.2.2]# clickhouse-client --password
ClickHouse client version 22.3.2.1.
Password for user (default): 
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.3.2 revision 54455.
p0-tklcdh-clickhouse01 :) CREATE TABLE IF NOT EXISTS table_test01
                          ON CLUSTER example_cluster
                          (
                              UserID UInt32,
                              ver UInt16
                          ) 
                          ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/01/table_test01', 'p0-tklcdh-clickhouse01-01', ver)
                          PRIMARY KEY UserID;
CREATE TABLE IF NOT EXISTS table_test01 ON CLUSTER example_cluster
(
    `UserID` UInt32,
    `ver` UInt16
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/01/table_test01', 'p0-tklcdh-clickhouse01-01', ver)
PRIMARY KEY UserID
Query id: 90055fe0-7443-4c65-8aa1-4cb09becc393
┌─host───────────────────┬─port─┬─status─┬─error─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ p0-tklcdh-clickhouse02 │ 9000 │    253 │ Code: 253. DB::Exception: Replica /clickhouse/tables/01/table_test01/replicas/p0-tklcdh-clickhouse01-01 already exists. (REPLICA_IS_ALREADY_EXIST) (version 22.3.2.1) │                   2 │                0 │
│ p0-tklcdh-clickhouse03 │ 9000 │      0 │                                                                                                                                                                       │                   1 │                0 │
│ p0-tklcdh-clickhouse01 │ 9000 │    253 │ Code: 253. DB::Exception: Replica /clickhouse/tables/01/table_test01/replicas/p0-tklcdh-clickhouse01-01 already exists. (REPLICA_IS_ALREADY_EXIST) (version 22.3.2.1) │                   0 │                0 │
└────────────────────────┴──────┴────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
→ Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.)  0%
3 rows in set. Elapsed: 0.120 sec. 
Received exception from server (version 22.3.2):
Code: 253. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [p0-tklcdh-clickhouse02:9000]: Code: 253. DB::Exception: Replica /clickhouse/tables/01/table_test01/replicas/p0-tklcdh-clickhouse01-01 already exists. (REPLICA_IS_ALREADY_EXIST) (version 22.3.2.1). (REPLICA_IS_ALREADY_EXIST)
Output like the above indicates that the distributed setup is working: the ClickHouse nodes form a single cluster whose tasks are coordinated through ZooKeeper. (The REPLICA_IS_ALREADY_EXIST exception appears only because the same statement had already been executed once before in this environment; it can be ignored.)
ClickHouse cluster benchmarking
The officially provided test cases let you systematically verify the cluster setup and measure query performance and other metrics. See Clickhouse Testing.
Notes
- Default ClickHouse log locations. If anything goes wrong, log in to each ClickHouse node promptly and inspect the logs.
[root@p0-tklcdh-clickhouse02 clickhouse-server]# ls /var/log/clickhouse-server/
clickhouse-server.err.log  clickhouse-server.log  stderr.log  stdout.log
References
- ClickHouse tgz package download page
- Installation Guide
- Clickhouse Clusters
- ZooKeeper configuration guide
- Data Replication
- macros
- Replicated*MergeTree parameters
- Creating Replicated Tables
- Clickhouse Testing
- Creating a ClickHouse cluster - Part I: Sharding
- Creating a ClickHouse cluster - Part II: Replication
Appendix: full configuration file
(Omitted) Due to data-compliance and information-security requirements, the full contents cannot be disclosed. Thank you for understanding.