SQL 子查询和临时表格
原创
©著作权归作者所有:来自51CTO博客作者mb63774a171e569的原创作品,请联系作者获取转载授权,否则将追究法律责任
首个子查询解决方案
- 首先,我们需要按照日期和渠道分组。然后按事件数(第三列)排序,这样可以快速得出第一个问题的答案。
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC;
- 可以看出,要获得这一结果,提供了整个原始表格。查询的附加部分包括
*
,并且我们需要为表格设置别名。此外,是在 SELECT 语句中(而不是 FROM)中提供表格。
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub;
- 最后,我们在以下语句中能够获得显示每个渠道一天的平均事件数的表格。
SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;
格式清晰的查询
与之前的示例相比,在这个格式清晰的示例中,我们很容易就看出要从哪个表格中获取数据。此外,如果在子查询后面有 GROUP BY、ORDER BY、WHERE、HAVING 或任何其他语句,则按照外部查询的同一级别缩进,正如最后一个示例所显示的,它是上个练习的最后一个解决方案。
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub;
下面的查询很相似,但是向外部查询应用了其他逻辑,因此按照外部查询的级别缩进。而内部查询逻辑的缩进级别与内部表格匹配。
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub
GROUP BY channel
ORDER BY 2 DESC;
最后两个查询容易读懂多了!
- 以下是从 orders 表格中获取第一个订单的年/月信息的查询。
SELECT DATE_TRUNC('month', MIN(occurred_at))
FROM orders;
- 然后,为了获取每个订单的平均值,我们可以在一个查询中执行所有的任务。但是为了便于阅读,我在下面提供了两个查询,单独执行每一步。
SELECT AVG(standard_qty) avg_std, AVG(gloss_qty) avg_gls, AVG(poster_qty) avg_pst
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) FROM orders);
SELECT SUM(total_amt_usd)
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) FROM orders);
- 提供每个区域拥有最高销售额 (total_amt_usd) 的销售代表的姓名。
首先,我要算出与每个销售代表相关的总销售额 (total_amt_usd),并且要得出他们所在的区域。以下查询提供了这一信息。
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC;
- 接着,得出每个区域的最高销售额,然后使用该信息从最终结果中获取这些行。
SELECT region_name, MAX(total_amt) total_amt
FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1, 2) inner1
GROUP BY 1;
- 本质上,这是两个表格的连接,其中区域和销售额相匹配。
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC) t1
JOIN (SELECT region_name, MAX(total_amt) total_amt
FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1, 2) inner1
GROUP BY 1) t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
- 对于具有最高销售额 (total_amt_usd) 的区域,总共下了多少个订单?
我写的第一个查询是获取每个区域的 total_amt_usd。
SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name;
- 然后,我们仅从该表格中获取销售额最高的区域。可以通过两种方法来获取,一种是使用子查询后的最大值,另一种是按降序排序,然后获取最高值。
SELECT MAX(total_amt)
FROM (SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name) sub;
- 最终,我们要获取具有该区域销售额的总订单量:
SELECT r.name, SUM(o.total) total_orders
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name
HAVING SUM(o.total_amt_usd) = (
SELECT MAX(total_amt)
FROM (SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name) sub);
- 结果就是 Northeast,总订单为 1230378 个。
- 对于购买标准纸张数量 (standard_qty) 最多的客户(在作为客户的整个时期内),有多少客户的购买总数依然更多?
首先,我们要得出购买标准纸张数量 (standard_qty) 最多的客户。以下查询获取了该客户,以及总消费:
SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;
现在,我将使用上述信息获取总消费更高的所有客户:
SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT total
FROM (SELECT a.name act_name, SUM(o.standard_qty) tot_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1) sub);
上述查询列出了具有更多订单的客户列表。我们还可以使用另一个简单的子查询获取数量。
SELECT COUNT(*)
FROM (SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT total
FROM (SELECT a.name act_name, SUM(o.standard_qty) tot_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1) inner_tab)
) counter_tab;
- 对于(在作为客户的整个时期内)总消费 (total_amt_usd) 最多的客户,他们在每个渠道上有多少 web_events?
我们首先需要获取在整个客户时期内消费最多的客户。
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 1;
现在,我们要获取该企业(可以使用 id 进行匹配)在每个渠道上的事件数。
SELECT a.name, w.channel, COUNT(*)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id AND a.id = (SELECT id
FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 1) inner_table)
GROUP BY 1, 2
ORDER BY 3 DESC;
我添加了 ORDER BY,其实并没特别的理由,并添加了客户名称,确保仅从一个客户那获取数据。
- 对于总消费前十名的客户,他们的平均终身消费 (total_amt_usd) 是多少?
首先,我们需要找出总消费 (total_amt_usd) 在前十名的客户。
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 10;
- 现在计算这十个客户的平均消费。
SELECT AVG(tot_spent)
FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 10) temp;
- 比所有客户的平均消费高的企业平均终身消费 (total_amt_usd) 是多少?
首先,算出所有客户的总消费 (total_amt_usd) 平均值:
SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id;
- 然后,只获取高于这一平均值的客户。
SELECT o.account_id, AVG(o.total_amt_usd)
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id);
- 最后,算出这些值的平均值。
SELECT AVG(avg_amt)
FROM (SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id)) temp_table;
第一个 WITH (CTE)
下面是“你的第一个子查询”部分的问题和解决方案。
问题:你需要算出每个渠道每天的平均事件数。
解决方案:
SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;
我们使用 WITH 语句重新编写查询。
注意:你可以获取内部查询:
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
我们在此部分放入 WITH 语句。注意,在下面我们将表格的别名设为 events
:
WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)
现在,我们可以像对待数据库中的任何其他表格一样使用这个新创建的 events
表格:
WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)
SELECT channel, AVG(events) AS average_events
FROM events
GROUP BY channel
ORDER BY 2 DESC;
对于上述示例,我们只需一个额外的表格,但是想象下我们要创建第二个表格来从中获取数据。我们可以按照以下方式来创建额外的表格并从中获取数据:
WITH table1 AS (
SELECT *
FROM web_events),
table2 AS (
SELECT *
FROM accounts)
SELECT *
FROM table1
JOIN table2
ON table1.account_id = table2.id;
然后,你可以按照相同的方式使用 WITH 语句添加越来越多的表格。底部的练习将确保你掌握了这些新查询的所有必要组成部分。
WITH 解决方案
以下是使用 WITH 条件重新编写的之前问题的答案。这些查询通常更容易读懂。
- 提供每个区域拥有最高销售额 (total_amt_usd) 的销售代表的姓名。
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC),
t2 AS (
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
- 对于具有最高销售额 (total_amt_usd) 的区域,总共下了多少个订单?
WITH t1 AS (
SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name),
t2 AS (
SELECT MAX(total_amt)
FROM t1)
SELECT r.name, SUM(o.total) total_orders
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY r.name
HAVING SUM(o.total_amt_usd) = (SELECT * FROM t2);
- 对于购买标准纸张数量 (standard_qty) 最多的客户(在作为客户的整个时期内),有多少客户的购买总数依然更多?
WITH t1 AS (
SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1),
t2 AS (
SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT total FROM t1))
SELECT COUNT(*)
FROM t2;
- 对于(在作为客户的整个时期内)总消费 (total_amt_usd) 最多的客户,他们在每个渠道上有多少 web_events?
WITH t1 AS (
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 1)
SELECT a.name, w.channel, COUNT(*)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id AND a.id = (SELECT id FROM t1)
GROUP BY 1, 2
ORDER BY 3 DESC;
- 对于总消费前十名的客户,他们的平均终身消费 (total_amt_usd) 是多少?
WITH t1 AS (
SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY a.id, a.name
ORDER BY 3 DESC
LIMIT 10)
SELECT AVG(tot_spent)
FROM t1;
- 比所有客户的平均消费高的企业平均终身消费 (total_amt_usd) 是多少?
WITH t1 AS (
SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id),
t2 AS (
SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT * FROM t1))
SELECT AVG(avg_amt)
FROM t2;