PostgreSQL中的crosstab(行转列函数)

行转列是一个非常有用的功能,如果不适用行转列函数,则通用做法是使用 case when 。不过,接下来,介绍一下今天的新主角–crosstab

一、安装扩展

环境:centos7 、pg10(yum 安装)、扩展组件 tablefunc

crosstab 需要按照扩展tablefunc

[root@uzong ~]# su postgres
bash-4.2$ psql technology postgres
could not change directory to "/root"
psql (9.2.24, server 10.8)
WARNING: psql version 9.2, server version 10.0.
         Some psql features might not work.
Type "help" for help.

technology=# create extension tablefunc;
CREATE EXTENSION
technology=#

二、 使用入门

2.1 crosstab
crosstab(text sql)
crosstab(text sql, int N)

这crosstab 函数被用来生成pivot 展示,即通过横向而不是下拉展示(行转列)。例如下面这个例子。原本类型如下:

row1    val11
row1    val12
row1    val13
...
row2    val21
row2    val22
row2    val23
...

但是我们想要的是下面这种格式:

row1    val11   val12   val13   ...
row2    val21   val22   val23   ...
...

crosstab函数采用text参数,该参数是生成以第一种方式格式化的原始数据的SQL查询,并生成以第二种方式格式化的表。

该sql参数是一个生成源数据集的SQL语句。这个语句必须返回 row_name、category 、value 列。(N这列已经过时,不讨论)

例如:提供的查询可能会产生如下集合:

row_name    cat    value
----------+-------+-------
  row1      cat1    val1
  row1      cat2    val2
  row1      cat3    val3
  row1      cat4    val4
  row2      cat1    val5
  row2      cat2    val6
  row2      cat3    val7
  row2      cat4    val8

crosstab声明该函数返回setof record,因此必须在FROM在调用SELECT语句的子句中定义输出列的实际名称和类型,例如:

SELECT * FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);

这个例子产生一个类似于的集合:

row_name   category_1   category_2
----------+------------+------------
  row1        val1         val2
  row2        val5         val6

The FROM clause must define the output as one row_name column (of the same data type as the first result column of the SQL query) followed by N value columns (all of the same data type as the third result column of the SQL query)

该crosstab函数为具有相同row_name值的每个连续输入行组生成一个输出行。它value使用value这些行中的字段从左到右填充输出列。如果组中的行少于输出value列,则额外的输出列将填充空值; 如果有更多行,则跳过额外的输入行。

实际上,SQL查询应始终指定ORDER BY 1,2以确保输入行的顺序正确,即具有相同值的值row_name汇集在一起并在行中正确排序。请注意,crosstab它本身并不关注查询结果的第二列; 它只是在那里被命令,以控制第三列值在页面中出现的顺序。

完整案例

CREATE TABLE ct(id SERIAL, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att1','val1');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att2','val2');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att3','val3');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att4','val4');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att1','val5');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att2','val6');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att3','val7');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att4','val8');

SELECT *
FROM crosstab(
  'select rowid, attribute, value
   from ct
   where attribute = ''att2'' or attribute = ''att3''
   order by 1,2')
AS ct(row_name text, category_1 text, category_2 text, category_3 text);

 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
(2 rows)

2.2 CrosstabN(Text)
crosstabN(text sql)

其中N的使用是限制数量
实际案例,N是多少,category就输出多少列:

SELECT *
FROM crosstab3(
  'select rowid, attribute, value
   from ct
   where attribute = ''att2'' or attribute = ''att3''
   order by 1,2');
 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
(2 rows)

2.3 crosstab(text, text)
crosstab(text source_sql, text category_sql)

单参数形式的主要限制crosstab是它同样处理组中的所有值,将每个值插入第一个可用列。如果您希望值列对应于特定类别的数据,并且某些组可能没有某些类别的数据,那么这样做效果不佳。crosstab这种情况下的双参数形式通过提供与输出列对应的类别的显式列表来处理

category_sql是一个生成类别集的SQL语句。该语句必须只返回一列。它必须至少生成一行,否则将生成错误。此外,它不能生成重复值,否则将生成错误。category_sql可能是这样的:

案例一:

create table sales(year int, month int, qty int);
insert into sales values(2007, 1, 1000);
insert into sales values(2007, 2, 1500);
insert into sales values(2007, 7, 500);
insert into sales values(2007, 11, 1500);
insert into sales values(2007, 12, 2000);
insert into sales values(2008, 1, 1000);

select * from crosstab(
  'select year, month, qty from sales order by 1',
  'select m from generate_series(1,12) m'
) as (
  year int,
  "Jan" int,
  "Feb" int,
  "Mar" int,
  "Apr" int,
  "May" int,
  "Jun" int,
  "Jul" int,
  "Aug" int,
  "Sep" int,
  "Oct" int,
  "Nov" int,
  "Dec" int
);
 year | Jan  | Feb  | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov  | Dec
------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+------+------
 2007 | 1000 | 1500 |     |     |     |     | 500 |     |     |     | 1500 | 2000
 2008 | 1000 |      |     |     |     |     |     |     |     |     |      |
(2 rows)

案例二:

CREATE TABLE cth(rowid text, rowdt timestamp, attribute text, val text);
INSERT INTO cth VALUES('test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES('test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES('test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES('test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES('test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES('test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES('test2','02 March 2003','volts','3.1234');

SELECT * FROM crosstab
(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
       rowid text,
       rowdt timestamp,
       temperature int4,
       test_result text,
       test_startdate timestamp,
       volts float8
);
 rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
 test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
 test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)

您可以创建预定义函数,以避免在每个查询中写出结果列名称和类型。像上面一样,SELECT DISTINCT attribute FROM cth ORDER BY 1


2.4 Connectby
connectby(text relname, text keyid_fld, text parent_keyid_fld
          [, text orderby_fld ], text start_with, int max_depth
          [, text branch_delim ])

该connectby函数生成存储在表中的分层数据的显示。该表必须具有唯一标识行的键字段,以及引用每行的父(如果有)的父键字段。connectby可以显示从任何行下降的子树。
参数描述

参数

描述

relname

源关系的名称

keyid_fld

关键字段的名称

parent_keyid_fld

父键字段的名称

orderby_fld

要对同级排序的字段的名称(可选)

start_with

要开始的行的键值

max_depth

下降到的最大深度,或无限深度的零

branch_delim

用于在分支输出中分隔键的字符串(可选)

键和父键字段可以是任何数据类型,但它们必须是相同的类型。请注意start_with,无论键字段的类型如何,都必须将该值作为文本字符串输入

connectby声明该函数返回setof record,因此必须FROM在调用SELECT语句的子句中定义输出列的实际名称和类型,例如:

SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
    AS t(keyid text, parent_keyid text, level int, branch text, pos int);

使用注意事项

  • branch_delim 不要出现任何的键值,否者connectby 会报一个无线递归错误,默认提供 ~(It is important that the branch_delim string not appear in any key values, else connectby may incorrectly report an infinite-recursion error. Note that if branch_delim is not provided, a default value of ~ is used for recursion detection purposes.)
  • 在大表中,除非父节点有索引(In large tables, performance will be poor unless there is an index on the parent-key field.)
  • 如果同一父节点的兄弟节点的顺序很重要,请包含orderby_fld参数以指定要按兄弟节点排序的字段。该字段可以是任何可排序的数据类型。输出列列表必须包含最后一个整数序列号列,当且仅当orderby_fld指定时。
CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);

INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);

-- 带分支,没有orderby_fld(不保证结果的顺序)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text);
 keyid | parent_keyid | level |       branch
-------+--------------+-------+---------------------
 row2  |              |     0 | row2
 row4  | row2         |     1 | row2~row4
 row6  | row4         |     2 | row2~row4~row6
 row8  | row6         |     3 | row2~row4~row6~row8
 row5  | row2         |     1 | row2~row5
 row9  | row5         |     2 | row2~row5~row9
(6 rows)

-- 没有分支,没有orderby_fld(不保证结果的顺序)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int);
 keyid | parent_keyid | level
-------+--------------+-------
 row2  |              |     0
 row4  | row2         |     1
 row6  | row4         |     2
 row8  | row6         |     3
 row5  | row2         |     1
 row9  | row5         |     2
(6 rows)

-- 带分支,有排序。(notice that row5 comes before row4) 
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text, pos int);
 keyid | parent_keyid | level |       branch        | pos
-------+--------------+-------+---------------------+-----
 row2  |              |     0 | row2                |   1
 row5  | row2         |     1 | row2~row5           |   2
 row9  | row5         |     2 | row2~row5~row9      |   3
 row4  | row2         |     1 | row2~row4           |   4
 row6  | row4         |     2 | row2~row4~row6      |   5
 row8  | row6         |     3 | row2~row4~row6~row8 |   6
(6 rows)

-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int, pos int);
 keyid | parent_keyid | level | pos
-------+--------------+-------+-----
 row2  |              |     0 |   1
 row5  | row2         |     1 |   2
 row9  | row5         |     2 |   3
 row4  | row2         |     1 |   4
 row6  | row4         |     2 |   5
 row8  | row6         |     3 |   6
(6 rows)

参考

https://www.postgresql.org/docs/10/tablefunc.html