1 Overview

  While performance-testing ClickHouse in a high-availability setup (distributed tables + replicated tables + ZooKeeper), we ran into the pitfalls below; this post collects them.

2 Distributed table JOIN error: Unknown identifier: LO_CUSTKEY, context:…

2.1 Problem description

  The SQL is as follows:

SELECT count(1)
FROM performance.line_all AS c 
LEFT JOIN performance.customer_all AS l ON l.C_CUSTKEY = c.LO_CUSTKEY

  Executing this SQL returns the following error:

Received exception from server (version 19.4.0):
Code: 47. DB::Exception: Received from 10.0.0.50:9000. DB::Exception: Received from ambari04:9000, 10.0.0.54. DB::Exception: Unknown identifier: LO_CUSTKEY, context: query: 'LO_CUSTKEY' required_names: 'LO_CUSTKEY' source_tables: table_aliases: complex_aliases: masked_columns: array_join_columns: source_columns: .

  The error message shows that the join column LO_CUSTKEY cannot be resolved.

2.2 Solution

  For a JOIN between distributed tables, the join column of the table that follows FROM must come first in the ON condition. The modified SQL is as follows:

SELECT count(1)
FROM performance.customer_all AS c 
LEFT JOIN performance.line_all AS l ON c.C_CUSTKEY = l.LO_CUSTKEY

3 Lost connection to ZooKeeper: Unknown status, Cannot allocate block number in ZooKeeper, ZooKeeper session has been expired…

3.1 Problem description

  If errors like the following occur while executing SQL:

↑ Progress: 157.94 million rows, 6.91 GB (92.63 thousand rows/s., 4.05 MB/s.) Received exception from server (version 19.4.0):
Code: 319. DB::Exception: Received from 10.0.0.50:9000. DB::Exception: Unknown status, client must retry. Reason: Connection loss.
↖ Progress: 94.47 million rows, 4.18 GB (95.07 thousand rows/s., 4.20 MB/s.) Received exception from server (version 19.4.0):
Code: 999. DB::Exception: Received from 10.0.0.50:9000. DB::Exception: Cannot allocate block number in ZooKeeper: Coordination::Exception: Connection loss.
lineorder_flat_all.Distributed.DirectoryMonitor: Code: 225, e.displayText() = DB::Exception: Received from ambari02:9000, 10.0.0.52. DB::Exception: ZooKeeper session has been expired.. Stack trace:

  The error messages show that the connection to ZooKeeper was lost, so block numbers could not be allocated, among other problems. ClickHouse depends very heavily on ZooKeeper: table metadata, the information for every data part, every insert, and replica data synchronization all require interaction with ZooKeeper. While the ZooKeeper service is syncing its transaction log, it may be unable to respond to external requests, which in turn causes session expiration and similar problems.
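
  Everything ClickHouse keeps in ZooKeeper can be inspected from ClickHouse itself through the system.zookeeper table, which helps when diagnosing these errors. A minimal sketch, assuming the default /clickhouse/tables path prefix (adjust it to whatever path your ReplicatedMergeTree tables were created with):

SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/tables'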

3.2 Solution

  (1) Increase the maximum ZooKeeper session timeout: set maxSessionTimeout=120000 in zoo.cfg and restart ZooKeeper.
Note: do not set the ZooKeeper timeout too high, otherwise recovery will be very slow when a server actually goes down.
  (2) Put ZooKeeper snapshot files on a disk of at least 1 TB, and mind the purge policy for old snapshots.
  (3) In ZooKeeper, keep the dataLogDir directory separate from dataDir; ideally the ZK transaction log gets its own storage device.
  (4) Add forceSync=no to zoo.cfg. It is enabled by default: to avoid sync lag, ZK flushes the current state to the on-disk transaction log as soon as it receives data and only replies after the flush completes. Turning it off makes client requests much more responsive, but carries a potential risk: the log flush (log.flush()) is still issued, yet the operating system buffers writes to improve disk throughput, so if the machine crashes some ZK state may never reach disk, leaving ZK inconsistent with what it had acknowledged.
  (5) When creating ClickHouse tables, add the use_minimalistic_part_header_in_zookeeper setting so that part metadata is stored in ZooKeeper in a compact form; note that once parts have been written in this format the change cannot be rolled back. A configuration sketch follows this list.
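
  A minimal sketch of items (1), (3), (4), and (5), assuming a hypothetical replicated table performance.line_local (the directory paths, ZooKeeper path, macros, and column list are placeholders, not taken from the original environment):

# zoo.cfg (excerpt): separate log device, longer session timeout, no forced fsync
dataDir=/data/zookeeper/data
dataLogDir=/zklog/zookeeper/log
maxSessionTimeout=120000
forceSync=no

CREATE TABLE performance.line_local
(
    LO_ORDERKEY UInt64,  -- placeholder columns; use the real table structure
    LO_CUSTKEY  UInt32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/line_local', '{replica}')
ORDER BY LO_ORDERKEY
SETTINGS use_minimalistic_part_header_in_zookeeper = 1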

4 Distributed table read-only: Table is in readonly mode

4.1 Problem description

  If inserting data fails with the following error:

2020.05.28 10:59:11.048910 [ 47 ] {} <Error> lineorder_flat_all.Distributed.DirectoryMonitor: Code: 242, e.displayText() = DB::Exception: Received from ambari04:9000, 10.0.0.54. DB::Exception: Table is in readonly mode. Stack trace:

  This happens because ZooKeeper is under too much pressure and the table has dropped into "read only mode", so the insert fails.

4.2 Solution

  (1) In ZooKeeper, keep the dataLogDir directory separate from dataDir; ideally use a dedicated storage device for the ZK transaction log.
  (2) Plan the ZooKeeper and ClickHouse clusters together; several ZooKeeper ensembles can serve a single ClickHouse cluster. To find which replicas are currently read-only, see the sketch after this list.
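
  A quick way to see which replicated tables are read-only is the built-in system.replicas table; a minimal sketch (the is_readonly column is standard, but verify the available columns on your ClickHouse version):

SELECT database, table, is_readonly
FROM system.replicas
WHERE is_readonly = 1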

5 ZooKeeper data loss in the ClickHouse cluster: Can't get data for node /clickhouse/tables/…

5.1 Problem description

  If the following error appears in the logs:

Cannot create table from metadata file /var/lib/clickhouse/metadata/xx/xxx.sql, error: Coordination::Exception: Can't get data for node /clickhouse/tables/xx/cluster_xxx-01/xxxx/metadata: node doesn't exist (No node), stack trace:

  This is because data was lost in ZooKeeper, which leaves the ClickHouse server unable to start.

5.2 Solution

  (1) Back up the SQL files under /var/lib/clickhouse/metadata/ and the data under /var/lib/clickhouse/data/, then delete them.
  (2) Start the database.
  (3) Create a MergeTree table with the same structure as the original table.
  (4) Copy the data folder of the old distributed table into the new table's data directory.
  (5) Restart the database.
  (6) Recreate the local tables with the original structure.
  (7) Recreate the distributed tables with the original structure.
  (8) insert into [distributed table] select * from [MergeTree table]; a SQL sketch of steps (3) and (8) follows.
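
  A minimal sketch of steps (3) and (8), using hypothetical names (performance.lineorder_flat_recover for the temporary MergeTree table, performance.lineorder_flat_all for the rebuilt distributed table; the column list is a placeholder for the original structure):

-- step (3): temporary MergeTree table with the original structure
CREATE TABLE performance.lineorder_flat_recover
(
    LO_ORDERKEY UInt64,  -- placeholder columns; use the original table definition
    LO_CUSTKEY  UInt32
)
ENGINE = MergeTree
ORDER BY LO_ORDERKEY

-- step (8): reload the recovered data through the rebuilt distributed table
INSERT INTO performance.lineorder_flat_all
SELECT * FROM performance.lineorder_flat_recover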