clickhouse笔记01--快速部署clickhouse

1 介绍

ClickHouse是一个用于联机分析(OLAP)的列式数据库管理系统(DBMS),当前已经被很多大公司使用。
Clickhouse 是一款真正的列式数据库管理系统,具备数据压缩、数据的磁盘存储、多核心并行处理、多服务器分布式处理、支持SQL、向量引擎、实时的数据更新、索引、适合在线查询、支持近似计算、Adaptive Join Algorithm、支持数据复制和数据完整性、角色的访问控制 等特性。更多介绍信息可以参考 ​​​clickhouse 官方文档​

笔者也是初次使用 clickhouse,因此将其部署方法和注意事项贴在此处,以方便有需要的小伙伴学习。

2 部署测试

2.1 准备配置

在 ​​clickhouse 稳定版本安装包​​ 准备最新稳定安装包,笔者准备的为 22.2.2.1 版本.

1) 下载安装包
clickhouse-client-22.2.2.1.tgz
clickhouse-common-static-22.2.2.1.tgz
clickhouse-common-static-dbg-22.2.2.1.tgz
clickhouse-server-22.2.2.1.tgz

2)解压安装包
tar -xzvf "clickhouse-common-static-22.2.2.1.tgz"
sudo "clickhouse-common-static-22.2.2.1/install/doinst.sh"

tar -xzvf "clickhouse-common-static-dbg-22.2.2.1.tgz"
sudo "clickhouse-common-static-dbg-22.2.2.1/install/doinst.sh"

tar -xzvf "clickhouse-server-22.2.2.1.tgz"
sudo "clickhouse-server-22.2.2.1/install/doinst.sh"

tar -xzvf "clickhouse-client-22.2.2.1.tgz"
sudo "clickhouse-client-22.2.2.1/install/doinst.sh"

3) vim /etc/clickhouse-server/config.xml
注释掉 listen_host 即可, 若为 ipv6 就注释第一个,若为 ipv4 就注释第二个,不注释掉的话只能在本机访问了

<listen_host>::</listen_host>
<listen_host>0.0.0.0</listen_host>

2.2 启动服务

# /etc/init.d/clickhouse-server start
# /etc/init.d/clickhouse-server status
Now there is no clickhouse-server process.

执行命令后,发现服务并未启动,查看 /etc/init.d/clickhouse-server 后发现该服务使用 clickhouse 用户启动,因此需要新建该用户,创建先关文件夹,并授权给 clickhouse 用户

创建clickhouse,此处直接把密码也设置为clickhouse
# adduser --home /home/clickhouse clickhouse
创建核心目录
# mkdir -p /var/lib/clickhouse
# chown clickhouse:clickhouse -R /var/lib/clickhouse
# mkdir -p /var/log/clickhouse-server
# chown clickhouse:clickhouse -R /var/log/clickhouse-server

再次执行命令发现服务正常启动:
# /etc/init.d/clickhouse-server start
# /etc/init.d/clickhouse-server status
/var/run/clickhouse-server/clickhouse-server.pid file exists and contains pid = 67486.
The process with pid = 67486

1 动服务后,您可以使用命令行客户端连接到它: clickhouse-client

clickhouse笔记01--快速部署clickhouse_数据库


2 也可以通过 curl http://192.168.2.136:8123 来查看服务是否ok,curl 返回 Ok.

3 也可以直接通过 mysql客户端连接 clickhouse, mysql -udefault -h192.168.2.136 -P9004

连接后就像平时查看sql 数据库一样访问数据即可。

2.3 测试

  1. 下载数据集(4.5Gb)
wget
  1. 创建表
$ clickhouse-cliet
:) use default;
:)
CREATE TABLE uk_price_paid
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
is_new UInt8,
duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String),
category UInt8
) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);
  1. 预处理 & 导入数据
clickhouse-local --input-format CSV --structure '
uuid String,
price UInt32,
time DateTime,
postcode String,
a String,
b String,
c String,
addr1 String,
addr2 String,
street String,
locality String,
town String,
district String,
county String,
d String,
e String
' --query "
WITH splitByChar(' ', postcode) AS p
SELECT
price,
toDate(time) AS date,
p[1] AS postcode1,
p[2] AS postcode2,
transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
b = 'Y' AS is_new,
transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
addr1,
addr2,
street,
locality,
town,
district,
county,
d = 'B' AS category
FROM table" --date_time_input_format best_effort < pp-complete.csv | clickhouse-client --query "INSERT INTO uk_price_paid FORMAT TSV"
  1. 验证数据
4.1 查看数据条数
:) SELECT count() FROM uk_price_paid;

SELECT count()
FROM uk_price_paid
Query id: 828475ce-ad80-45de-8ed2-4c9f6fd8e871
┌──count()─┐
27265985
└──────────┘
1 rows in set. Elapsed: 0.004 sec.

4.2 查看占用存储大小
:) SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid';

SELECT formatReadableSize(total_bytes)
FROM system.tables
WHERE name = 'uk_price_paid'
Query id: 04d2a185-0989-4649-86a5-e38e6edac861
┌─formatReadableSize(total_bytes)─┐
292.80 MiB │
└─────────────────────────────────┘
1 rows in set. Elapsed: 0.019
  1. 针对该数据集更多测试参考:​​快速上手/示例数据集/K Property Price Paid​

3 注意事项

  1. 必须新建 clickhouse 用户,并创建相关目录,配置clickhouse权限,否则无法正常启动。
  2. 需要在 /etc/clickhouse-server/config.xml 中开启 <listen_host>0.0.0.0或者::</listen_host>, 否则只能本机访问。

4 说明

软件环境:
Ubuntu 21.04 Desktop
clickhouse 22.2.2.1
参考文档:
​​​clickhouse 官方文档​​​​clickhouse 稳定版本安装包​​​​ClickHouse安装部署【非常详细】​