clickhouse表转mysql表 clickhouse 修改表引擎

转载

mob6454cc7aaa9d 2023-12-10 10:24:54

文章标签 clickhouse表转mysql表 clickhouse 引擎 MergeTree 引擎介绍 文章分类 MySQL 数据库

一、基本语法

1、建表语句

目前只有MergeTree、Merge和Distributed这三类表引擎支持 alter 操作。

create  table  test_ddl1(
	id UInt16 ,
	name String
)
engine = MergeTree() 
order by id ;

MergeTree一定要指定主键和排序字段，order  by 代表两个含义

查看建表语句

show  create  table  test_ddl1

2、修改表结构

alter table  test_ddl1  add column age UInt8  (after name);  --增加字段
alter table  test_ddl1  drop column age ;                    --删除字段
alter table  test_ddl1  modify column name Int8 ;            -- 不能修改排序字段和主键
alter table  test_ddl1  comment column  name '用户名' ;       --添加注释

3、修改字段类型

语法：ALTER TABLE tb_name MODIFY COLUMN [IF EXISTS] name [type] [default_expr]

例如： alter  table  test_alter  modify column gender UInt8  default 0

4、表的重命名和移动

rename table t1  to t2;   -- 重命名

rename table t1  to new_db.t1 ;   --移动表

5、默认值

create table tb_t1(
    id Int8 , 
	name String,
	role String default 'vip'
) engine=Log ;

6、导入数据

方式一：

insert into tb_user(id  , name , age) values(1,’zss’,21) ;
insert into tb_user values(1,’zss’,21) ;

方式二：

create table log3 as log2 ;  --建表
insert into log3  select * from log2 ;

方式三：

cat data.txt | clickhouse-client  -q 'insert into test_db.log1 format csv' 

clickhouse-client  --format_csv_delimiter="," -q 'insert into tb_client format csv' < data.txt

\t 对应 tsv
 , 对应 csv

7、更新删除数据

更新：

alter  table tb_update  update name='李四'   where  id=2

删除：

alter  table tb_update  delete  where  id=2

二、clickhouse 分区表

目前只有MergeTree系列的表引擎支持数据分区。

1、建立分区表

create table tb_partition(
    id String , 
	ctime DateTime
)engine=MergeTree() 
partition by toYYYYMM(ctime) 
order by (id) ;

2、删除分区

alter table tb_partition drop partition '20201105' ;

3、复制分区

create table tb_partition_back  as tb_partition;

alter table tb_partition_back replace partition  '2020-05-20' from  tb_partition  ;

4、卸载/装载分区

卸载分区：

alter table tb_partition  detach partition  ‘2020-05-20’;

装载分区：

alter table tb_partition  attach partition  ‘2020-05-20’;

三、clickhouse视图

ClickHouse拥有普通和物化两种视图，其中普通视图只是一层简单的查询代理，不存数据，而物化视图拥有独立的存储。

1、普通视图

create  view  user_view  as  select  id, upper(name)  from  tb_insert2

2、物化视图

create materialized view mv_log  engine=Log  populate  as  select * from log ;

如果使用了populate修饰符，那么在创建视图的过程中，会连带将源表中已存在的数据并导入；如果在源表中插入了数据，会同步到物化视图，删除了源表数据，物化视图的数据不会变化。

四、clickhouse表引擎

表引擎（即表的类型）决定了：

1.数据的存储方式和位置，写到哪里以及从哪里读取数据
2.支持哪些查询以及如何支持。
3.并发数据访问。
4.索引的使用（如果存在）。
5.是否可以执行多线程请求。
6.数据复制参数,是否可以存储数据副本。
7.分布式引擎

1、Log系列引擎

Log家族具有最小功能的轻量级引擎。当您需要快速写入许多小表（最多约100万行）并在以后整体读取它们时，该类型的引擎是最有效的。

TinyLog引擎(数据不分块)

最简单的表引擎，用于将数据存储在磁盘上。每列都存储在单独的压缩文件中，写入时，数据将附加到文件末尾。该引擎没有并发控制、只支持并发读、如果同时从表中读取和写入数据，则读取操作将抛出异常；如果同时写入多个查询中的表，则数据将被破坏。

create table tb_tinylog (
	id Int8 , 
    name String , 
	age Int8
) engine=TinyLog ;

insert into tb_tinylog values(1,'马云',56);

Log(数据分块记录偏移量)

该引擎支持并发读写、支持多线程处理、mark.mrk 数据块标记文件、不支持索引

create table tb_log(
   id Int8 , 
   name String ,
   age Int8
) engine=Log ;

StripeLog(数据分块列在一起)

该引擎index.mrk 对数据建立了索引、支持并发读写，在你需要写入许多小数据量（小于一百万行）的表的场景下使用这个引擎

create table stripe_log_table(
    timestamp DateTime,
    message_type String,
    message String
)engine= StripeLog ;

INSERT INTO stripe_log_table VALUES (now(),'Title','多易教育') ;

2、MergeTree家族引擎

MergeTree系列的表引擎是ClickHouse数据存储功能的核心。它们提供了用于弹性和高性能数据检索的大多数功能：列存储，自定义分区，稀疏的主索引，辅助数据跳过索引等。

MergeTree

create table tb_merge_tree(
	id Int8 ,
	name String ,
	ctime Date 
)
engine=MergeTree()
order by id 
partition by name ;

或者

create table tb_merge_tree(
	id Int8 ,
	name String ,
	ctime Date 
)
engine=MergeTree()
primary key id
order by id 
partition by name ;

备注： order by 第一个字段是主键，主键可以重复

insert into tb_merge_tree values (1,'hng','2020-08-07'),(4,'hng','2020-08-07');

合并分区数据：

optimize  table  tb_merge_tree  final

ReplacingMergeTree

这个引擎是在 MergeTree 的基础上，添加了“处理重复数据”的功能，该引擎和MergeTree的不同之处在于它会删除具有相同(区内)排序一样的重复项。适用于在后台清除重复的数据以节省空间，但是它不保证没有重复的数据出现。

create table test_replacingMergeTree(
	id Int8 ,
	name String ,
	gender String , 
	city String,
	ctime DateTime
)engine = ReplacingMergeTree(ctime)
order by id
partition by city ;

insert into test_replacingMergeTree values (1,'zss','3','北京', '2022-06-03 14:14:00'),(1,'zss','1','北京','2022-06-03 13:13:00');

合并分区数据：

optimize  table  test_replacingMergeTreefinal

分区都是北京，排序都是1，如果有版本, 则会保留时间最大的那条记录，若是没有，则会保留最后插入的记录

CollapsingMergeTree

CollapsingMergeTree就是一种通过以增代删的思路，支持行级数据修改和删除的表引擎。它通过定义一个sign标记位字段，记录数据行的状态：1 为有效，-1为需要被删除。

create table tb_cps_merge_tree
(
    user_id UInt64,
    name String,
    age UInt8,
    sign Int8
)ENGINE = CollapsingMergeTree(sign)
partition by age
ORDER BY user_id;

INSERT INTO tb_cps_merge_tree VALUES (1001, 'ADA', 19, 1), (1001, 'MADA', 19, -1), (1002, 'HDA', 19, 1);

合并分区数据：

optimize  table  tb_cps_merge_treefinal

备注：若是-1在前，则删除不了1001 ，解决问题，则需要以下引擎

VersionedCollapsingMergeTree

为了解决CollapsingMergeTree乱序写入情况下无法正常折叠问题，VersionedCollapsingMergeTree表引擎在建表语句中新增了一列Version，用于在乱序情况下记录状态行与取消行的对应关系。主键相同，且Version相同、Sign相反的行，在Compaction时会被删除。

create table tb_vscmt
(
    uid UInt64,
    name String,
    age UInt8,
    sign Int8,
    version UInt8
)
ENGINE = VersionedCollapsingMergeTree(sign, version)
ORDER BY uid;

INSERT INTO tb_vscmt  VALUES (1001, 'ADA', 19, -1，1), (1001, 'MADA', 19, 1, 1), (1002, 'HDA', 19, 1, 1);

合并分区数据：

optimize  table  tb_vscmt final

SummingMergeTree

只需要查询数据的汇总结果，不关心明细数据，并且数据的汇总条件是预先明确的（GROUP BY 条件明确，且不会随意改变）。

create table summing_table( 
	id String, 
	city String, 
	sal UInt32, 
	comm Float64, 
	ctime DateTime 
)ENGINE = SummingMergeTree() 
PARTITION BY toDate(ctime) 
ORDER BY (id, city) 
PRIMARY KEY id ;

insert into summing_table  values 
(1,'shanghai',10,20,'2020-12-12 01:11:12'),(1,'shanghai',20,30,'2020-12-12 01:11:12'),
(3,'shanghai',10,20,'2020-11-12 01:11:12'),(3,'Beijing',10,20,'2020-11-12 01:11:12') ;

合并分区数据：

optimize  table  summing_table final

– 相同分区中 ,相同的Id和city的数据会被聚合在一起，数字字段的数据都会被sum在一起
–如果指定合并字段 , 那么只有指定的字段才会sum操作

create table summing_table1( 
	id String, 
	city String, 
	sal UInt32, 
	comm Float64, 
	ctime DateTime 
)ENGINE = SummingMergeTree(sal) 
PARTITION BY toDate(ctime) 
ORDER BY (id, city) 
PRIMARY KEY id ;


insert into summing_table1  values 
(1,'shanghai',10,20,'2020-12-12 01:11:12'),(1,'shanghai',20,30,'2020-12-12 01:11:12'),
(3,'shanghai',10,20,'2020-11-12 01:11:12'),(3,'Beijing',10,20,'2020-11-12 01:11:12') ;

合并分区数据：

optimize  table  summing_table1 final

AggregatingMergeTree

案例一：

明细表

create table  detail_table(
    id UInt8,
    ctime Date,
    money UInt64
) ENGINE = MergeTree() 
PARTITION BY toYYYYMM(ctime) 
ORDER BY (id, ctime);

INSERT INTO detail_table VALUES(1, '2020-08-06', 100);
INSERT INTO detail_table VALUES(1, '2020-08-06', 200);
INSERT INTO detail_table VALUES(2, '2020-08-07', 100);
INSERT INTO detail_table VALUES(2, '2020-08-07', 200);

建立聚合表

create table agg_table(
    id UInt8,
    ctime Date,
    money AggregateFunction(sum, UInt64) -- 在这个字段上进行聚合操作
) ENGINE = AggregatingMergeTree() 
PARTITION BY toYYYYMM(ctime) 
ORDER BY (id, ctime);

插入明细数据

insert into agg_table   select id, ctime, sumState(money)  from detail_table
group by id, ctime ;

获取聚合结果

select  id, ctime ,sumMerge(uid)  as  state  FROM agg_table  group by  id, ctime;

以上方式明细数据插入后，不能自动同步.

案例二：物化视图

明细表

create table visits
(
    UserID UInt64,
    CounterID UInt8,
    StartDate Date,
    Sign Int8
)ENGINE = MergeTree()
order by UserID;

insert into visits values(0, 0, '2019-11-11', 1);
insert into visits values(1, 1, '2019-11-12', 1);```

向物化视图同步数据

create MATERIALIZED view visits_agg_view
ENGINE = AggregatingMergeTree() 
partition by toYYYYMM(StartDate)
order  by (CounterID, StartDate)
as select
    CounterID,
    StartDate,
    sumState(Sign)    AS Visits,
    uniqState(UserID) AS Users
from visits
group by CounterID, StartDate;

查看数据

select  StartDate, sumMerge(Visits) as Visits,uniqMerge(Users) as Users from visits_agg_view group by  StartDate order by  StartDate;

3、外部存储引擎

HDFS引擎

该引擎里面的数据不能变动，只能一次性加载

create table test_hdfs1(
	id Int8 ,
	name String ,
	age Int8
)engine=HDFS('hdfs://linux01:8020/ck/test1/text.txt' ,CSV) ;

MySQL引擎

可以查询数据，也可以向表中插入数据，不支持删除和更新操作

create table test_mysql (
    id Int32 , 
    name String
 )engine=MySQL('ck1:3306','ck','t1','root','root') ;

File引擎

create table test_file1(
	id String ,
	name String ,
	age UInt8
)engine=File("CSV") ;

在默认的目录下回生成一个文件夹 , 文件夹中可以写入文件 ,但是文件的名字必须是data.CSV

insert into  test_file1 values('asdsdf','hangge',33) ;

File表函数

select  * from file('/ck/user.txt', 'CSV'，'id String , name String , age UInt8') ;

默认加载的是特定文件夹，数据一定在特定文件夹下才加载

vi   /etc/clickhouse-server/config.xml

<user_files_path>/</user_files_path>

重启服务

service  clickhouse-server   restart

4、内存引擎

Memory引擎

Memory表引擎直接将数据保存在内存中，数据既不会被压缩也不会被格式转换,当ClickHouse服务重启的时候，Memory表内的数据会全部丢失.

create table memory_1 ( 
   id UInt64 
)ENGINE = Memory()

Set引擎

Set表引擎是拥有物理存储的，数据首先会被写至内存，然后被同步到磁盘文件中。所以当服务重启时，它的数据不会丢失。Set表引擎具有去重的能力。

create table test_set(
   id Int8 ,
   name String
)engine=Set();

insert into test_set values(1,'zss'),(5,'ddd') ;

但是这种表不允许我们直接查询，而是当做条件表

create table x
(
    id  Int8,
    name String
)
ENGINE =Log

insert into x values(1,'zss'),(4,'ww') ;

查询语句

select * from x where (id,name) in test_set ;

Buffer引擎

Buffer表引擎完全使用内存装载数据，不支持文件的持久化存储，所以当服务重启之后，表内的数据会被清空。Buffer表引擎不是为了面向查询场景而设计的，它的作用是充当缓冲区的角色。

clickhouse表转mysql表 clickhouse 修改表引擎_clickhouse

·如果三组条件中所有的最小阈值都已满足，则触发刷新动作；
·如果三组条件中至少有一个最大阈值条件满足，则触发刷新动作；

创建目标表

create table tb_user(
   id Int8,
   name  String
)engine=Log ;

创建缓存表

create table tb_user_buffer AS tb_user
ENGINE = Buffer(doit23, tb_user, 16, 10, 100, 10000, 1000000, 10000000, 100000000) ;

向缓存表插入数据

insert  into  tb_user_buffer  values (1, 'yang'), (2,'hana'), (3,'hive') ;

查询

select * from  tb_user

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。

上一篇：java中String转Set JAVA中string转hash

下一篇：pytorch tensor转坐标 pytorch 转 onnx

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯

clickhouse表转mysql表 clickhouse 修改表引擎

clickhouse表转mysql表 clickhouse 修改表引擎

51CTO博客