(Introduction)

In the article, Multiple Options to Transposing Rows into Columns, I covered various options available in SQL Server to rotating a given row into columns. One of the options included the use of a PIVOT relational operator. The mandatory requirement of the operator is that you must supply the aggregate function with only a numeric data type. Such a mandatory requirement is usually not an issue as most aggregations and subsequent pivoting is performed against fields of numeric data type. However, sometimes the nature of business reporting requests may be such that you are required to cater for pivoting against non-numeric data types. In this article we take a look at how you can deal with such requirements by introducing a workaround to pivoting on non-numeric fields.

在文章“ 将行转换为列的多个选项”中 ,我介绍了SQL Server中将给定的行旋转为列的各种选项。 选项之一包括使用PIVOT关系运算符。 操作员的强制性要求是,必须仅向聚合函数提供数字数据类型 。 这种强制性要求通常不会成为问题,因为大多数聚合和随后的数据透视都是针对数字数据类型的字段执行的。 但是,有时业务报告请求的性质可能会要求您满足针对非数字数据类型的需求。 在本文中,我们介绍了如何通过解决非数字字段的变通方法来满足这些要求。

Similarly to the aforementioned article, a denormalized relational representation of our case study looks as shown in Table 1:

与上述文章类似,我们的案例研究的非规范化关系表示如表1所示:

RecKey

Policy

PolType

Effective Date

DocID

DocName

Submitted

Outstanding

Submitted Date

Captured By ID

1

Pol002

Hospital Cover

2007/10/01

1

Doc A

NULL

1

NULL

NULL

2

Pol002

Hospital Cover

2007/10/01

4

Doc B

NULL

1

NULL

NULL

3

Pol002

Hospital Cover

2007/10/01

5

Doc C

1

NULL

2016/01/13

1

4

Pol002

Hospital Cover

2007/10/01

7

Doc D

1

NULL

2016/01/14

2

5

Pol002

Hospital Cover

2007/10/01

10

Doc E

1

NULL

2016/01/15

1

记录键

政策

类型

生效日期

DocID

文件名

已提交

优秀

提交日期

被ID捕获

1个

Pol002

医院保险

2007/10/01

1个

文件A

空值

1个

空值

空值

2

Pol002

医院保险

2007/10/01

4

文件B

空值

1个

空值

空值

3

Pol002

医院保险

2007/10/01

5

文件C

1个

空值

2016/01/13

1个

4

Pol002

医院保险

2007/10/01

7

文件D

1个

空值

2016/01/14

2

5

Pol002

医院保险

2007/10/01

10

文件E

1个

空值

2016/01/15

1个

(Workaround #1: Pivot on Dates)

A simple business case that can be used to address this workaround is that business may be trying to determine a breakdown of documents captured by date. Ultimately, the aim of such a business case would to have a view of data that looks as shown in Table 2:

一个可以用来解决此变通办法的简单业务案例是,企业可能正在尝试确定按日期捕获的文档明细。 最终,这种业务案例的目的是对数据进行查看,如表2所示:

Policy

PolType

Effective Date

Doc A

Doc B

Doc C

Doc D

Doc E

Pol002

Hospital Cover

01-Oct-07

0

0

13 Jan 2016

14 Jan 2016

15 Jan 2016

政策

类型

生效日期

文件A

文件B

文件C

文件D

文件E

Pol002

医院保险

2007年10月1日

0

0

2016年1月13日

2016年1月14日

2016年1月15日

The first step to pivoting on date fields is to convert your date field to an integer value. Figure 1 shows how we have achieved this through a derived integer field, [Submitted Date INT], which is based off the [Submitted Date] field.

透视日期字段的第一步是将日期字段转换为整数值。 图1显示了我们如何通过基于[Submitted Date]字段的派生整数字段[Submitted Date INT]实现此目的。

hivesql 查询非纯数值的数据_hivesql 查询非纯数值的数据

The complete pivot script is depicted in Figure 2:

完整的数据透视脚本如图2所示 :

hivesql 查询非纯数值的数据_java_02

The execution of the script in Figure 3 result into the data shown in Table 3:

图3中脚本的执行导致表3中显示的数据:

Policy

PolType

Effective Date

Doc A

Doc B

Doc C

Doc D

Doc E

Pol002

Hospital Cover

01-Oct-07

0

0

20160113

20160114

20160115

政策

类型

生效日期

文件A

文件B

文件C

文件D

文件E

Pol002

医院保险

2007年10月1日

0

0

20160113

20160114

20160115

However, often business people prefer to view data related dates in a familiar format (i.e. dd/mm/yyyy, dd-mm-yyyy etc.). So we can further extend our script to format the dates according to business friendly format.

但是,通常商务人士倾向于以熟悉的格式(例如dd / mm / yyyy , dd-mm-yyyy等)查看与数据相关的日期。 因此,我们可以进一步扩展脚本以根据业务友好格式格式化日期。

There are two ways to achieve this:

有两种方法可以实现此目的:

1. Formatting SQL Server Date Using Date Dimension

1.使用日期维格式化SQL Server日期

If you write your scripts within a data warehouse or have access to a data warehouse or have a database environment that has a date dimension, then you can make use of the dimension to format the date. Date dimension is a database object that is setup according to a Ralph Kimball’s Data Warehouse design methodology. The dimension usually contain attributes such as date name, quarter, year, and calendar months. Date dimension allows for the breakdown of report data by period over time.

如果您在数据仓库中编写脚本或可以访问数据仓库或具有日期维度的数据库环境,则可以使用该维度来格式化日期。 日期维度是根据Ralph Kimball的数据仓库设计方法设置的数据库对象。 该维通常包含诸如日期名称,季度,年份和日历月之类的属性。 日期维度允许按时间段细分报告数据。

Figure 3 shows some of the content contained in a date dimension.

图3显示了日期维度中包含的某些内容。

hivesql 查询非纯数值的数据_数据库_03

One of the common ways to lookup data in the date dimension is to use a numeric date key. In our case, we already have derived date column ([Submitted Date INT]) which is in an integer format. Figure 4, shows our revised pivoted script joined to the date dimension.

在日期维度中查找数据的常用方法之一是使用数字日期键。 在我们的例子中,我们已经获得了整数格式的派生日期列( [Submitted Date INT] )。 图4显示了修改后的透视脚本,该脚本已加入日期维度。

As it can be seen there are several instances of the dim date to the fields that we are using as pivots. For instance, the first condition is to join dim date to doc a and so on. As argued in the aforementioned article, this way of doing this only suffer from one thing: it is not dynamic – meaning, if business later introduce Doc F as a document that should be submitted in order to process a claim, than a developer would have to refactor the script to include the new doc column and another join condition to dim date to display a custom date format for the newly added column.

可以看出,有几个实例将日期设为我们用作透视的日期。 例如,第一个条件是将暗淡的日期加入到doc a中,依此类推。 如前文所述,这样做的方法只有一件事:它不是动态的-意味着,如果企业后来将Doc F引入为处理索赔而应提交的文档,那么开发人员就不会这样做。重构脚本以包括新的doc列和另一个加入条件以暗化日期以显示新添加的列的自定义日期格式。

hivesql 查询非纯数值的数据_数据库_04

The results of the script depicted in Figure 4 are shown in Table 4:

表4中显示了图4中描述的脚本的结果:

Policy

PolType

Effective Date

Doc A

Doc B

Doc C

Doc D

Doc E

Pol002

Hospital Cover

01 Oct 2007

0

0

13 Jan 2016

14 Jan 2016

15 Jan 2016

政策

类型

生效日期

文件A

文件B

文件C

文件D

文件E

Pol002

医院保险

2007年10月1日

0

0

2016年1月13日

2016年1月14日

2016年1月15日

2. Convert Integer to SQL Server Date

2.将整数转换为SQL Server日期

An alternative to converting a SQL Server date is not to use a join to date dimension instead you convert the numeric values back to business friendly date using T-SQL Convert function. This is done through a nested convert function in which you first convert the numeric value to variable character (varchar), then convert varchar value to date (which is usually in the format yyyy-mm-dd) and finally convert the format of the date to dd mmm yyyy. The complete syntax looks as shown in Figure 5:

转换SQL Server日期的一种替代方法是不使用联接日期维度,而是使用T-SQL转换函数将数值转换回业务友好日期。 这是通过嵌套的转换函数完成的,在该函数中,您首先将数字值转换为变量字符(varchar),然后将varchar值转换为日期(通常为yyyy-mm-dd格式),最后转换为日期格式到dd mmm yyyy 。 完整的语法如图5所示:

hivesql 查询非纯数值的数据_大数据_05

The complete script is shown in Figure 6.

完整的脚本如图6所示。

hivesql 查询非纯数值的数据_大数据_06

(Workaround #2: Pivot on Varchar Data Types)

In my experience working within an insurance industry, I have noticed that there are often business users whose job is solely to receive and capture claim documents submitted by clients. Thus, another fitting business case would be to have a breakdown of claim documents captured by users as shown in Table 5.

根据我在保险业工作的经验,我注意到经常有一些业务用户仅负责接收和捕获客户提交的索赔文件。 因此,另一个合适的商业案例是对用户捕获的索赔文件进行细分,如表5所示。

PolicyPolTypeEffective DateDoc ADoc BDoc CDoc DDoc E

Pol002

Hospital Cover

01 Oct 2007

0

0

Jane Doe

John Doe

Jane Doe

策略PolType生效日期Doc ADoc BDoc CDoc DDoc E

Pol002

医院保险

2007年10月1日

0

0

简·多伊

约翰·杜

简·多伊

Similar to pivoting on date data types, the key to pivoting on varchar is identifying a field that can be used as an input parameter in the aggregate function. In our case, every user is assigned a numeric user ID as shown in Figure 7 (this user id can then be used to look-up user full name).

与透视日期数据类型相似,透视varchar的关键是识别可用作聚合函数中输入参数的字段。 在我们的例子中,为每个用户分配了一个数字用户ID, 如图7所示(然后可以使用该用户ID查找用户全名)。

hivesql 查询非纯数值的数据_java_07

The revised script that now pivots on captured by ID is shown in Figure 8. It can also be seen that after the pivoting of data, we then join back on user id to the User dimension to retrieve associated user full name.

现在,以ID捕获为基础的修订脚本如图8所示。 还可以看到,在数据透视之后,我们然后将用户ID重新加入到“用户”维度以检索关联的用户全名。

hivesql 查询非纯数值的数据_hivesql 查询非纯数值的数据_08

(Conclusion)

In this article we’ve had a look at available workarounds to pivoting using non-numeric fields. The key to pivoting using non-numeric fields is that you need to find a way to trick the aggregate function as it strictly takes in numeric values. Thus you ought to find a way to either find a numeric surrogate field (i.e. corresponding user id of user field) or derive such a field (i.e. converting date to integer) to be used as input parameters in the aggregate function.

在本文中,我们研究了使用非数字字段进行数据透视的可用解决方法。 使用非数字字段进行数据透视的关键是,您需要找到一种方法来欺骗聚合函数,因为它严格采用数字值。 因此,您应该找到一种方法来找到数字替代字段(即,用户字段的相应用户ID)或派生此类字段(即,将日期转换为整数)以用作聚合函数中的输入参数。

( Reference )

  • SQL Server Data Types SQL Server数据类型
  • Data Warehousing数据仓库
  • SQL Server CONVERT functionSQL Server CONVERT函数

翻译自: https://www.sqlshack.com/sql-server-pivoting-on-non-numeric-data-types/