I. Pre-installation requirements
1. Firewall disabled
2. JDK installed and configured (1.8)
3. ZooKeeper cluster up and running (3.6)
4. CPU must support the SSE 4.2 instruction set
# Check whether the CPU supports SSE 4.2
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
II. Environment
3 × CentOS 7.4:
192.168.20.1  node1
192.168.20.2  node2
192.168.20.3  node3
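The cluster configuration below refers to the machines by hostname, so node1, node2 and node3 must resolve on every machine. A minimal sketch, assuming no internal DNS, is to append the mappings to /etc/hosts on all three nodes:

```
192.168.20.1 node1
192.168.20.2 node2
192.168.20.3 node3
```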
III. Installation steps
1. Download the RPM packages
1) Download source: the ClickHouse mirror repository
2) Files to download:
clickhouse-server-common-20.1.8.41-1.el7.x86_64.rpm
clickhouse-server-20.1.8.41-1.el7.x86_64.rpm
clickhouse-common-static-20.1.8.41-1.el7.x86_64.rpm
clickhouse-client-20.1.8.41-1.el7.x86_64.rpm
clickhouse-test-20.1.8.41-1.el7.x86_64.rpm
2. Install ClickHouse on each node (run on all three machines)
1) Install the required dependencies
yum install -y libicu unixODBC
yum install -y perl-JSON-XS
2) Install the ClickHouse RPMs (order matters)
rpm -ivh clickhouse-server-common-20.1.8.41-1.el7.x86_64.rpm
rpm -ivh clickhouse-common-static-20.1.8.41-1.el7.x86_64.rpm
rpm -ivh clickhouse-server-20.1.8.41-1.el7.x86_64.rpm
rpm -ivh clickhouse-client-20.1.8.41-1.el7.x86_64.rpm
rpm -ivh clickhouse-test-20.1.8.41-1.el7.x86_64.rpm
3) Start and verify the ClickHouse service
service clickhouse-server start
clickhouse-client -m
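Once the client connects, a quick sanity check is to run a trivial query; a healthy server answers immediately with its version string:

```sql
SELECT version();
```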
3. Configure the ClickHouse high-availability cluster (run on all three machines)
ClickHouse core configuration files (located in /etc/clickhouse-server/):
config.xml   ports, local hostname, memory limits, etc.
metrika.xml  cluster topology, ZooKeeper settings, shard configuration, etc.
users.xml    permissions and quota settings
1) Edit config.xml
# Change the following settings; leave everything else unchanged
<interserver_http_host>node1</interserver_http_host> <!-- use the value returned by `hostname -f` -->
<timezone>Asia/Shanghai</timezone> <!-- time zone -->
<include_from>/etc/clickhouse-server/metrika.xml</include_from> <!-- external configuration file -->
<log>/var/log/clickhouse-server/clickhouse-server.log</log> <!-- standard log file -->
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog> <!-- error log file -->
<tcp_port>10000</tcp_port> <!-- TCP port -->
<interserver_http_port>9009</interserver_http_port> <!-- port for data exchange between replicas -->
<path>/var/lib/clickhouse/</path> <!-- data directory -->
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path> <!-- directory for temporary files created by queries -->
2) Create and edit config1.xml
cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config1.xml
vim /etc/clickhouse-server/config1.xml
Change the following settings:
<log>/var/log/clickhouse-server/clickhouse-server-1.log</log> <!-- standard log file -->
<errorlog>/var/log/clickhouse-server/clickhouse-server.err-1.log</errorlog> <!-- error log file -->
<tcp_port>11000</tcp_port> <!-- TCP port -->
<interserver_http_port>9010</interserver_http_port> <!-- port for data exchange between replicas -->
<path>/var/lib/clickhouse1/</path> <!-- data directory -->
<tmp_path>/var/lib/clickhouse1/tmp/</tmp_path> <!-- directory for temporary files created by queries -->
<include_from>/etc/clickhouse-server/metrika1.xml</include_from> <!-- external configuration file -->
3) Create and edit metrika.xml
<yandex>
<!-- ClickHouse cluster nodes -->
<clickhouse_remote_servers>
<perftest_3shards_1replicas>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>node1</host>
<port>10000</port>
</replica>
<replica>
<host>node2</host>
<port>11000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>node2</host>
<port>10000</port>
</replica>
<replica>
<host>node3</host>
<port>11000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>node3</host>
<port>10000</port>
</replica>
<replica>
<host>node1</host>
<port>11000</port>
</replica>
</shard>
</perftest_3shards_1replicas>
</clickhouse_remote_servers>
<!-- ZooKeeper configuration -->
<zookeeper-servers>
<node index="1">
<host>node2</host>
<port>2181</port>
</node>
<node index="2">
<host>node1</host>
<port>2181</port>
</node>
<node index="3">
<host>node3</host>
<port>2181</port>
</node>
</zookeeper-servers>
<macros>
<shard>see the explanation below</shard> <!-- shard ID of this instance -->
<replica>node2</replica> <!-- hostname of the current node -->
</macros>
<networks>
<ip>::/0</ip>
</networks>
<!-- compression settings -->
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method> <!-- lz4 compresses faster than zstd but uses more disk space -->
</case>
</clickhouse_compression>
</yandex>
4) Copy metrika.xml to metrika1.xml and adjust the macros section
The macros section has two entries; together they determine the ZooKeeper paths created for replicated tables.
1. shard: the shard ID. Any value that tells the shards apart will do. For example, take the shard
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>node1</host>
<port>10000</port>
</replica>
<replica>
<host>node2</host>
<port>11000</port>
</replica>
</shard>
If we label this shard 01, the shard macro is set to 01,
and it must be set to 01 in the macros of both instances that belong to this shard (node1:10000 and node2:11000).
2. replica: the replica ID. Shard 01 above has two replicas, and any value that tells them apart will do. Because the replicas of a shard normally sit on different hosts, the hostname is typically used as the replica value.
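Applying these two rules to the layout above, and labeling the shards 01/02/03 in the order they appear in metrika.xml (the labels themselves are arbitrary), the macros for the six instances work out as follows:

```
host   config file    tcp_port  <shard>  <replica>
node1  metrika.xml    10000     01       node1
node1  metrika1.xml   11000     03       node1
node2  metrika.xml    10000     02       node2
node2  metrika1.xml   11000     01       node2
node3  metrika.xml    10000     03       node3
node3  metrika1.xml   11000     02       node3
```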
5) Create and edit clickhouse-server-1 (the init script for the second instance)
sudo cp /etc/rc.d/init.d/clickhouse-server /etc/rc.d/init.d/clickhouse-server-1
sudo vim /etc/rc.d/init.d/clickhouse-server-1
Change the following:
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config1.xml
CLICKHOUSE_PIDFILE="$CLICKHOUSE_PIDDIR/$PROGRAM-1.pid"
6) Start the two ClickHouse instances matching the two configurations
service clickhouse-server restart
service clickhouse-server-1 restart
7) Create a replicated table
CREATE TABLE log_test ON CLUSTER perftest_3shards_1replicas
(
`ts` DateTime,
`uid` String,
`biz` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/log_test', '{replica}')
PARTITION BY toYYYYMMDD(ts)
ORDER BY ts
SETTINGS index_granularity = 8192
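When this statement runs on an instance, the {shard} and {replica} macros are expanded from that instance's metrika.xml, so every replica registers under its own ZooKeeper path. On node1's primary instance (shard 01, replica node1), for example, the engine clause effectively becomes:

```sql
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/log_test', 'node1')
```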
8) Insert data into one replica of a shard; the data then becomes visible on the other replica
INSERT INTO log_test VALUES ('2019-06-07 20:01:01', 'a', 'show');
INSERT INTO log_test VALUES ('2019-06-07 20:01:02', 'b', 'show');
INSERT INTO log_test VALUES ('2019-06-07 20:01:03', 'a', 'click');
INSERT INTO log_test VALUES ('2019-06-08 20:01:04', 'c', 'show');
INSERT INTO log_test VALUES ('2019-06-08 20:01:05', 'c', 'click');
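To confirm replication, connect to the other replica of the shard that received the writes (for example, `clickhouse-client --port 11000` on node2, if the rows above were written to node1's primary instance) and count the rows; the result should match the number of rows inserted:

```sql
SELECT count() FROM log_test;
```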
Physical architecture