Overview

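This post covers a commonly used part of window functions. I had always relied on the defaults, but today I ran into a rather odd problem that hinged on this small detail. I realized I couldn't explain it clearly, so I dug into it.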

Official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Format:

window_function | aggregate_function OVER (PARTITION BY xxx ORDER BY xx DDDDDDD)

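This post is about the DDDDDDD part. It is the window frame: for each row, it selects the subset of rows in the partition that the window or aggregate function reads, so it can change the value the function returns.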

Keywords

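unbounded: no boundary
preceding: before the current row
following: after the current row
unbounded preceding: all rows before, i.e. starting from the first row of the partition
n preceding: n rows before (for RANGE, an offset of n in the ORDER BY value)
unbounded following: all rows after, i.e. up to the last row of the partition
n following: n rows after (for RANGE, an offset of n in the ORDER BY value)
current row: the current row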
 
Syntax
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
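The three productions above only allow certain start/end keyword pairs: the end bound's keyword never sits before the start bound's keyword, so, for example, a frame cannot start at UNBOUNDED FOLLOWING or end at UNBOUNDED PRECEDING. A small Python sketch that tabulates these combinations is shown here; `ALLOWED_ENDS` and `frame_allowed` are hypothetical names, and the check looks only at keywords, not at whether numeric offsets such as `3 PRECEDING AND 5 PRECEDING` produce a non-empty frame.

```python
# Hypothetical table mirroring the three frame productions of the Hive grammar.
# Keys are start-bound kinds; values are the end-bound kinds each production allows.
# Numeric offsets are deliberately ignored: only keyword order is checked.
ALLOWED_ENDS = {
    # (UNBOUNDED | num) PRECEDING AND (num PRECEDING | CURRENT ROW | (UNBOUNDED | num) FOLLOWING)
    "UNBOUNDED PRECEDING": {"PRECEDING", "CURRENT ROW", "FOLLOWING", "UNBOUNDED FOLLOWING"},
    "PRECEDING": {"PRECEDING", "CURRENT ROW", "FOLLOWING", "UNBOUNDED FOLLOWING"},
    # CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | num) FOLLOWING)
    "CURRENT ROW": {"CURRENT ROW", "FOLLOWING", "UNBOUNDED FOLLOWING"},
    # num FOLLOWING AND (UNBOUNDED | num) FOLLOWING
    "FOLLOWING": {"FOLLOWING", "UNBOUNDED FOLLOWING"},
}


def frame_allowed(start_kind: str, end_kind: str) -> bool:
    """Return True if 'BETWEEN <start_kind> AND <end_kind>' matches one of the productions."""
    # An unknown start kind (e.g. UNBOUNDED FOLLOWING) maps to an empty set, i.e. not allowed.
    return end_kind in ALLOWED_ENDS.get(start_kind, set())


print(frame_allowed("UNBOUNDED PRECEDING", "CURRENT ROW"))  # True
print(frame_allowed("CURRENT ROW", "PRECEDING"))            # False: end keyword before start
print(frame_allowed("UNBOUNDED FOLLOWING", "FOLLOWING"))    # False: not a legal start bound
```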

1. rows between … and …

rows: the frame boundaries are determined by row position; these are physical rows.

For example, rows between 1 preceding and 1 following covers the row before the current row, the current row itself, and the row after it.
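To make the physical-row semantics concrete, here is a small Python sketch (not Hive code) that evaluates SUM over a ROWS frame for one partition that is already sorted. The function name `rows_frame_sum`, the signed-offset encoding (negative = n PRECEDING, 0 = CURRENT ROW, positive = n FOLLOWING, `None` = UNBOUNDED) and the sample scores are assumptions made for illustration.

```python
from typing import List, Optional


def rows_frame_sum(values: List[int], start: Optional[int], end: Optional[int]) -> List[Optional[int]]:
    """SUM over a ROWS frame for every row of one already-sorted partition (hypothetical helper).

    start/end are signed row offsets from the current row; None means UNBOUNDED.
    """
    n = len(values)
    out: List[Optional[int]] = []
    for i in range(n):
        # Clamp the frame to the partition boundaries.
        lo = 0 if start is None else max(0, i + start)
        hi = n - 1 if end is None else min(n - 1, i + end)
        frame = values[lo:hi + 1] if lo <= hi else []
        # SQL SUM over an empty frame is NULL, modelled here as None.
        out.append(sum(frame) if frame else None)
    return out


# Hypothetical partition, already sorted by score.
scores = [60, 70, 70, 90]

print(rows_frame_sum(scores, -1, 1))    # ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -> [130, 200, 230, 160]
print(rows_frame_sum(scores, None, 0))  # ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -> [60, 130, 200, 290]
```

The two 70s are treated as separate rows: ROWS only looks at positions, never at the values themselves.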

2. range between … and …

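range: the frame is based on the value of the current row's ORDER BY expression. Rows are sorted by ORDER BY, and the frame contains every row whose ORDER BY value lies between the current value minus the lower offset and the current value plus the upper offset. It is a logical range of values, not a count of rows.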

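For example, sum(score) over (PARTITION by id order by score RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) partitions by id and sorts by score ascending. Then, for the current row's score s, it sums the score of every row in the partition whose score lies in [s - 1, s + 1].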
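The RANGE example can be simulated the same way. In this Python sketch (not Hive code; `range_frame_sum`, the signed value offsets and the sample scores are assumptions), the frame is chosen by comparing ORDER BY values, so rows with equal scores always enter or leave the frame together.

```python
from typing import List, Optional


def range_frame_sum(keys: List[int], values: List[int],
                    start: Optional[int], end: Optional[int]) -> List[Optional[int]]:
    """SUM over a RANGE frame for every row of one partition (hypothetical helper).

    keys are the ORDER BY values (ascending, no NULLs). start/end are signed offsets
    added to the current row's key: -1 = 1 PRECEDING, 0 = CURRENT ROW,
    +1 = 1 FOLLOWING, None = UNBOUNDED.
    """
    out: List[Optional[int]] = []
    for k in keys:
        lo = float("-inf") if start is None else k + start
        hi = float("inf") if end is None else k + end
        # Every row whose key falls in [lo, hi] is in the frame, including all ties (peers).
        frame = [v for kk, v in zip(keys, values) if lo <= kk <= hi]
        out.append(sum(frame) if frame else None)
    return out


# Hypothetical partition id = 1; score is both the ORDER BY key and the summed value.
scores = [60, 61, 61, 63]

# RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING: scores within [s - 1, s + 1]
print(range_frame_sum(scores, scores, -1, 1))  # [182, 182, 182, 63]
```

For comparison, ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING over the same four rows gives 121, 182, 185, 124. The first row's RANGE frame includes both 61s, and the last row's RANGE frame excludes 61 because it is 2 away from 63, even though it is only one row away.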

That's a mouthful; the examples below make it clearer.

demo 1

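SELECT id, score,
sum(score) over (PARTITION by id) as a1,  -- no ORDER BY: default frame is the whole partition
sum(score) over (PARTITION by id order by score) as a2,  -- ORDER BY only: default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
sum(score) over (PARTITION by id order by score ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as a3,  -- running total by physical row
sum(score) over (PARTITION by id order by score ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as a4,  -- whole partition
sum(score) over (PARTITION by id order by 1) as a5  -- constant sort key: all rows are peers, so this should equal a1
from datadev.t_student;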
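The difference between a2 and a3 only shows up when the ORDER BY column has ties. The Python sketch below simulates the five columns for one hypothetical partition; the data, the `window_sum` helper and its signed-offset convention are assumptions, not the contents of datadev.t_student. It uses the defaults stated on the Hive documentation page linked above: with ORDER BY but no frame clause, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; with neither, it is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

```python
from typing import List, Optional


def window_sum(keys: List[int], values: List[int], mode: str,
               start: Optional[int] = None, end: Optional[int] = None) -> List[Optional[int]]:
    """SUM(values) over a ROWS or RANGE frame for one partition sorted by keys (hypothetical helper).

    Offsets are signed (negative = PRECEDING, 0 = CURRENT ROW, positive = FOLLOWING),
    None = UNBOUNDED. For ROWS they count rows; for RANGE they are added to the key.
    """
    n = len(values)
    out: List[Optional[int]] = []
    for i in range(n):
        if mode == "ROWS":
            lo = 0 if start is None else max(0, i + start)
            hi = n - 1 if end is None else min(n - 1, i + end)
            frame = values[lo:hi + 1] if lo <= hi else []
        elif mode == "RANGE":
            lo_key = float("-inf") if start is None else keys[i] + start
            hi_key = float("inf") if end is None else keys[i] + end
            frame = [v for k, v in zip(keys, values) if lo_key <= k <= hi_key]
        else:
            raise ValueError(f"unknown frame mode: {mode}")
        out.append(sum(frame) if frame else None)
    return out


# Hypothetical partition id = 1 with a tie on score = 90, sorted by score.
scores = [80, 90, 90, 100]

a1 = window_sum(scores, scores, "ROWS")                       # no ORDER BY: whole partition
a2 = window_sum(scores, scores, "RANGE", None, 0)             # ORDER BY only: RANGE ... CURRENT ROW
a3 = window_sum(scores, scores, "ROWS", None, 0)              # ROWS ... CURRENT ROW
a4 = window_sum(scores, scores, "ROWS", None, None)           # ROWS UNBOUNDED ... UNBOUNDED
a5 = window_sum([1] * len(scores), scores, "RANGE", None, 0)  # ORDER BY 1: every row is a peer

for row in zip(scores, a1, a2, a3, a4, a5):
    print(row)
```

For the two rows with score 90, a2 is 260 on both because RANGE includes every peer with the same ORDER BY value, while a3 counts physical rows and gives 170 and then 260. The order of tied rows is not guaranteed, so in a real query which of the two 90s gets 170 may vary. a1, a4 and a5 all equal the partition total of 360.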

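-- Default frames written out explicitly; each pair should return identical values.
SELECT id, score,
sum(score) over (PARTITION by id) as a1,  -- no ORDER BY, no frame clause
sum(score) over (PARTITION by id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as a1_explicit,  -- same as a1
sum(score) over (PARTITION by id order by score) as a2,  -- ORDER BY, no frame clause
sum(score) over (PARTITION by id order by score RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as a2_explicit  -- same as a2
from datadev.t_student;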

Testing range between … and …

SELECT id, score,
sum(score) over (PARTITION by id order by score RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as b1,  -- whole partition
sum(score) over (PARTITION by id order by score RANGE BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING) as b2  -- all rows with score >= current score - 1
from datadev.t_student;
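b1 always equals the partition total. b2 keeps every row whose score is at least the current score minus 1, so it differs from b1 only when the partition has a score more than 1 below the current one. The Python sketch below checks this with hypothetical scores (the `window_sum` helper, its signed-offset convention and the data are assumptions), and adds the ROWS version of b2 for contrast.

```python
from typing import List, Optional


def window_sum(keys: List[int], values: List[int], mode: str,
               start: Optional[int] = None, end: Optional[int] = None) -> List[Optional[int]]:
    """SUM(values) over a ROWS or RANGE frame for one partition sorted by keys (hypothetical helper).

    Offsets are signed (negative = PRECEDING, 0 = CURRENT ROW, positive = FOLLOWING),
    None = UNBOUNDED. For ROWS they count rows; for RANGE they are added to the key.
    """
    n = len(values)
    out: List[Optional[int]] = []
    for i in range(n):
        if mode == "ROWS":
            lo = 0 if start is None else max(0, i + start)
            hi = n - 1 if end is None else min(n - 1, i + end)
            frame = values[lo:hi + 1] if lo <= hi else []
        elif mode == "RANGE":
            lo_key = float("-inf") if start is None else keys[i] + start
            hi_key = float("inf") if end is None else keys[i] + end
            frame = [v for k, v in zip(keys, values) if lo_key <= k <= hi_key]
        else:
            raise ValueError(f"unknown frame mode: {mode}")
        out.append(sum(frame) if frame else None)
    return out


# Hypothetical partition id = 1, sorted by score.
scores = [80, 81, 85, 85]

b1 = window_sum(scores, scores, "RANGE", None, None)     # RANGE UNBOUNDED PRECEDING ... UNBOUNDED FOLLOWING
b2 = window_sum(scores, scores, "RANGE", -1, None)       # RANGE 1 PRECEDING ... UNBOUNDED FOLLOWING
rows_b2 = window_sum(scores, scores, "ROWS", -1, None)   # same bounds with ROWS, for contrast

print(b1)       # [331, 331, 331, 331]
print(b2)       # [331, 331, 170, 170]
print(rows_b2)  # [331, 331, 251, 170]
```

On the first 85, RANGE drops 81 because it is more than 1 below 85, whereas ROWS still includes the one physical row before it.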
