目录
- 1. select查询
- 1.1 union、intersect、except
- 1.2 grouping set、rollup、cube
- 1.3 with内联视图
- 1.4 子查询
- 2. SQL函数
- 2.1 over (partition by col order by col)
1. select查询
语法如下:
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition]
[ WINDOW window_definition_list]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ OFFSET count [ ROW | ROWS ] ]
[ LIMIT { count | ALL } ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
- having子句在group by之后计算,作用于分组聚合后的行
- join默认是inner join,还是left outer join、right outer join、full outer join、cross join(对于from a, b where a.key = b.key,前半部分是cronss join)
基础的select例子如下:
trino>
trino> select upper(b.name) customer_name, round(sum(totalprice), 1) total_price
-> from tpch.sf1.orders a join tpch.sf1.customer b on a.custkey = b.custkey
-> where extract(year from a.orderdate) = 1997 or b.nationkey = 1
-> group by b.name
-> having round(sum(totalprice), 1) > 10000
-> order by total_price desc limit 3;
customer_name | total_price
--------------------+-------------
CUSTOMER#000111280 | 5776853.7
CUSTOMER#000126199 | 5682036.3
CUSTOMER#000015751 | 5594552.2
(3 rows)
Query 20220109_222708_00100_jiymz, FINISHED, 2 nodes
Splits: 14 total, 14 done (100.00%)
4.40 [1.95M rows, 13.2MB] [443K rows/s, 2.99MB/s]
trino>
1.1 union、intersect、except
- union all:合并所有的行,不去除重复行
- union:合并所有的行,去除重复行
- intersect:返回两个查询中都有的行,但去除重复行
- except:返回只有第一个查询中有的行,但去除重复行
trino>
trino> select * from (values (1, 2), (1, 2), (2, 3)) except select * from (values (2, 3), (3, 4));
_col0 | _col1
-------+-------
1 | 2
(1 row)
Query 20220109_223715_00107_jiymz, FINISHED, 2 nodes
Splits: 9 total, 9 done (100.00%)
0.98 [0 rows, 0B] [0 rows/s, 0B/s]
trino>
1.2 grouping set、rollup、cube
分别对每个分组组合进行聚合计算,结果是所有计算结果的联合,没有值的列,则为NULL
例如对于聚合列(a, b, c):
- rollup(a, b, c)生成基于层次结构的分组组合(a, b, c), (a, b), (a), ()
- cube(a, b, c)生成所有可能的分组组合(a, b, c), (a, b), (a, c), (b, c), (a), (b), ©, ()
- grouping sets可以自定义分组组合,如grouping sets((a, b, c), ())
trino>
trino> select orderstatus, orderpriority, shippriority, sum(totalprice) totalprice, grouping(orderstatus, orderpriority, shippriority) as group_id
-> from tpch.sf1.orders
-> group by grouping sets((orderstatus, orderpriority, shippriority), ())
-> order by group_id;
orderstatus | orderpriority | shippriority | totalprice | group_id
-------------+-----------------+--------------+-----------------------+----------
O | 5-LOW | 0 | 2.2075989903380028E10 | 0
O | 1-URGENT | 0 | 2.202843787702009E10 | 0
F | 5-LOW | 0 | 2.1997577898639805E10 | 0
F | 4-NOT SPECIFIED | 0 | 2.194042389442989E10 | 0
O | 2-HIGH | 0 | 2.2042604367650177E10 | 0
F | 3-MEDIUM | 0 | 2.1777505918090096E10 | 0
P | 4-NOT SPECIFIED | 0 | 1.4212495106299987E9 | 0
P | 3-MEDIUM | 0 | 1.4197204557000003E9 | 0
P | 5-LOW | 0 | 1.427590893769999E9 | 0
O | 3-MEDIUM | 0 | 2.1956381714669975E10 | 0
O | 4-NOT SPECIFIED | 0 | 2.1914360578039955E10 | 0
P | 1-URGENT | 0 | 1.402424471659998E9 | 0
F | 2-HIGH | 0 | 2.1999039814129982E10 | 0
F | 1-URGENT | 0 | 2.1987867088400032E10 | 0
P | 2-HIGH | 0 | 1.4381320612500029E9 | 0
NULL | NULL | NULL | 2.2682930644745822E11 | 7
(16 rows)
Query 20220109_230647_00123_jiymz, FINISHED, 2 nodes
Splits: 11 total, 11 done (100.00%)
1:16 [1.5M rows, 0B] [19.9K rows/s, 0B/s]
trino>
- grouping(col1, col2, col3)函数用于标识一行属于哪个分组,计算出来的group id可能不是连续的
1.3 with内联视图
用于在一个查询中定义一个内联视图,供多次查询访问
trino>
trino> with
-> total as (
-> select mktsegment,
-> sum(acctbal) as total_per_mktsegment
-> from tpch.sf1.customer
-> group by 1
-> ),
-> average as (
-> select avg(total_per_mktsegment) as mktsegment_average
-> from total
-> )
-> select mktsegment,
-> total_per_mktsegment,
-> mktsegment_average
-> from total,
-> average
-> where total_per_mktsegment > mktsegment_average;
mktsegment | total_per_mktsegment | mktsegment_average
------------+----------------------+--------------------
HOUSEHOLD | 1.358733411699999E8 | 1.34865369948E8
BUILDING | 1.3588862194000012E8 | 1.34865369948E8
(2 rows)
Query 20220109_231949_00129_jiymz, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
3.74 [300K rows, 0B] [80.1K rows/s, 0B/s]
trino>
average内联视图引用了total内联视图,所以total内联视图被引用了两次。
Presto/Trino会基于代价的计算,判断是否将total内联视图的结果进行保存缓存
1.4 子查询
1. 标量子查询
子查询返回单个值(一行一列), 子查询只计算一次
trino>
trino> select name from tpch.sf1.nation
-> where regionkey = (
-> select regionkey from tpch.sf1.region where name = 'AMERICA'
-> );
name
---------------
ARGENTINA
BRAZIL
CANADA
PERU
UNITED STATES
(5 rows)
Query 20220109_233108_00136_jiymz, FINISHED, 2 nodes
Splits: 11 total, 11 done (100.00%)
1.15 [32 rows, 116B] [27 rows/s, 101B/s]
trino>
2. exists子查询
- in子查询:如果一个列的值存在exists子查询的结果中,则为true
- not in子查询:如果一个列的值不存在exists子查询的结果中,则为true
trino>
trino> select name from tpch.sf1.nation
-> where regionkey not in (
-> select regionkey from tpch.sf1.region
-> );
name
------
(0 rows)
Query 20220110_001101_00000_jiymz, FINISHED, 2 nodes
Splits: 9 total, 9 done (100.00%)
0.31 [40 rows, 244B] [128 rows/s, 785B/s]
trino>
3. 集合比较子查询
较的运算符除了=,还有<、>、<=、>=、=、<>
集合比较子查询的查询方式有两种:
- any:表示只需满足子查询中的一个值,
= any
等价于in
子查询
trino>
trino> select name from tpch.sf1.nation where regionkey = any (select regionkey from tpch.sf1.region);
name
----------------
ALGERIA
ARGENTINA
......省略部分......
UNITED KINGDOM
UNITED STATES
(25 rows)
Query 20220110_001814_00005_jiymz, FINISHED, 2 nodes
Splits: 13 total, 13 done (100.00%)
0.37 [40 rows, 328B] [107 rows/s, 879B/s]
trino>
trino> select name from tpch.sf1.nation where regionkey in (select regionkey from tpch.sf1.region);
name
----------------
ALGERIA
ARGENTINA
......省略部分......
UNITED KINGDOM
UNITED STATES
(25 rows)
Query 20220110_001821_00006_jiymz, FINISHED, 2 nodes
Splits: 13 total, 13 done (100.00%)
0.36 [40 rows, 328B] [111 rows/s, 916B/s]
trino>
对于any, 如果子查询中的一个值为NULL,则对应的集合比较子查询的结果是NULL(不是true和false)。合并结果时一个false和NULL进行合并, 则计算结果为NULL
- all:表示必须满足子查询的所有值
对于all,如果子查询为空,则集合比较子查询的结果为true。如果子查询中的一个值为NULL,则对应的集合比较子查询的结果是NULL(不是true和false)。合并结果时一个true和NULL进行合并, 则计算结果为NULL
2. SQL函数
2.1 over (partition by col order by col)
生成测试数据
trino> create schema memory.test_db;
CREATE SCHEMA
trino>
trino>
trino> create table memory.test_db.test_tb(
-> user_id bigint,
-> buy_year int,
-> buy_amount int
-> );
CREATE TABLE
trino>
trino>
trino> insert into memory.test_db.test_tb(user_id, buy_year, buy_amount)
-> values(1, 2012, 30),
-> (1, 2010, 20),
-> (1, 2009, 10),
-> (2, 2022, 300),
-> (2, 2020, 200),
-> (2, 2019, 100);
INSERT: 6 rows
Query 20220210_001404_00009_jx84g, FINISHED, 2 nodes
Splits: 5 total, 5 done (100.00%)
0.24 [0 rows, 0B] [0 rows/s, 0B/s]
trino>
1. 开窗求迭代累加和
trino>
trino> select user_id, buy_year, buy_amount,
-> sum(buy_amount) over (partition by user_id order by buy_year) as user_iteration_sum_buy_amount
-> from memory.test_db.test_tb;
user_id | buy_year | buy_amount | user_iteration_sum_buy_amount
---------+----------+------------+-------------------------------
1 | 2009 | 10 | 10
1 | 2010 | 20 | 30
1 | 2012 | 30 | 60
2 | 2019 | 100 | 100
2 | 2020 | 200 | 300
2 | 2022 | 300 | 600
(6 rows)
Query 20220210_001707_00010_jx84g, FINISHED, 1 node
Splits: 4 total, 4 done (100.00%)
0.24 [6 rows, 114B] [24 rows/s, 469B/s]
trino>
2. 开窗获取前N行的值
lag(col, offset)中的offset值为0或正整数,如果offset为0,表示当前行;offset为2,表示前面第二行
trino>
trino> select user_id, buy_year, buy_amount,
-> lag(buy_amount, 1) over (partition by user_id order by buy_year) as previous_buy_amount
-> from memory.test_db.test_tb;
user_id | buy_year | buy_amount | previous_buy_amount
---------+----------+------------+---------------------
1 | 2009 | 10 | NULL
1 | 2010 | 20 | 10
1 | 2012 | 30 | 20
2 | 2019 | 100 | NULL
2 | 2020 | 200 | 100
2 | 2022 | 300 | 200
(6 rows)
Query 20220210_001915_00011_jx84g, FINISHED, 1 node
Splits: 4 total, 4 done (100.00%)
0.27 [6 rows, 114B] [21 rows/s, 418B/s]
trino>