greendao query all: slow Greenplum queries

Reposted

killads 2024-05-28 23:42:41


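I. Preface

While comparing query performance between Oracle and Greenplum, Greenplum's results turned out to be unsatisfactory. This article records how the problem was analyzed and located in order to improve Greenplum's query performance.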

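II. Environment

For the initial round of performance testing, a small cluster was set up:

Disk	SAS
Network switch	Gigabit
Cluster size	4 segments
Row count	300 million
Data file size	68 GB
Table type	Heap, row-oriented
Column types	All columns are varchar
Column count	41 columns
Indexes	None
Query	select count(*) from xxx where gjdqdm = 'CHN' and crrqsj >= '20100101000000' and crrqsj <= '20180101000000' and crkadm = '055'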

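Note: the Greenplum table was required to use the same column data types and the same indexes as the source table. As a result, every column is varchar and there are no indexes, so there is no room for optimization on that front.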

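III. Optimization process

3.1 Result comparison

SQL	Oracle time	Greenplum time
select count(*) from xxx where gjdqdm = 'CHN' and crrqsj >= '20100101000000' and crrqsj <= '20180101000000' and crkadm = '055'	24 s	14.1 s

14.1 s is not an acceptable speed, so the cause has to be tracked down in order to identify the performance bottleneck and propose an optimization.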

3.2 Analysis

3.2.1 Examining the execution plan

[Figure: execution plan for the query, annotated with markers ① to ⑤]
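A plan like the one discussed in this section can be requested by prefixing the test query with EXPLAIN ANALYZE. This is only a minimal sketch: it assumes the original plan was produced with EXPLAIN ANALYZE (the actual memory usage read off at ⑤ suggests so), and `xxx` is the placeholder table name used throughout this article.

```sql
-- EXPLAIN ANALYZE executes the statement and reports the plan together with
-- actual run-time statistics.
-- Assumption: this is how the annotated plan in this section was obtained.
EXPLAIN ANALYZE
SELECT count(*)
FROM xxx
WHERE gjdqdm = 'CHN'
  AND crrqsj >= '20100101000000'
  AND crrqsj <= '20180101000000'
  AND crkadm = '055';
```

Because EXPLAIN ANALYZE really runs the query, it takes about as long as the query itself.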

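Marker ① shows that essentially all of the cost lies in the operation at ③, the Seq Scan.

The numbers at ① mean the following (quoted from the official documentation):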

The numbers that are quoted by EXPLAIN are (left to right):

    Estimated start-up cost (time expended before the output scan can start, e.g., time to do the sorting in a sort node)

    Estimated total cost (if all rows are retrieved, though they might not be; e.g., a query with a LIMIT clause will stop short of paying the total cost of the Limit plan node's input node)

    Estimated number of rows output by this plan node (again, only if executed to completion)

    Estimated average width (in bytes) of rows output by this plan node
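To see only these four estimates without running the query, plain EXPLAIN is enough. The sketch below uses a shortened version of the test query for brevity; the trailing comment shows the general shape of the annotation printed on each plan node, with placeholders rather than real values.

```sql
-- Plain EXPLAIN only plans the statement; it does not execute it.
EXPLAIN
SELECT count(*)
FROM xxx
WHERE gjdqdm = 'CHN';

-- Each plan node is annotated in this form:
--   (cost=<start-up cost>..<total cost> rows=<estimated rows> width=<average row width in bytes>)
```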

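Marker ③ denotes a sequential scan: the table is read from disk from start to finish.

Marker ② shows that all segments took part in the query.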
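Participation of every segment does not by itself show that each one does a similar amount of work. As a supplementary check that is not part of the original analysis, the rows stored on each segment can be counted through Greenplum's `gp_segment_id` system column; `xxx` is again the article's placeholder table name.

```sql
-- Count the rows stored on each segment.
-- Strongly uneven counts would indicate data skew.
SELECT gp_segment_id, count(*) AS row_count
FROM xxx
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```

On an evenly distributed table in this 4-segment cluster, each segment would be expected to hold roughly a quarter of the 300 million rows.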

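Marker ④ shows that every varchar column in the filter was cast to text, and that no index was used (there is none to use).

Marker ⑤ shows that the memory actually used is far below the memory allocated, so memory can be ruled out as the problem.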