目录

  • 1. 需求背景
  • 2. 最终实现语句
  • 3. 思路
  • 3.1 观察表结构
  • 3.2 能否直接Inner Join两张表呢?
  • 3.3 ROW_NUMBER() 函数登场
  • 3.4 取模(Modulo)函数登场
  • 3.4.1 核心实现
  • 3.5 FOR XML PATH 函数登场
  • 3.6 STUFF函数登场
  • 3.7 RIGHT函数登场
  • 4. 用到的SQL技术
  • 4.1 分区分组排名函数
  • 4.1.1 原始数据
  • 4.1.2 Group by的数据
  • 4.1.3 ROW_NUMBER() OVER(PARTITION BY col ORDER BY col)

1. 需求背景

背景

  • 跨境电商在做Ship-confirm的时候,需要带上Tracking Number(快递单号,下文简称Trk#);
  • 通常会发生一个订单需要带上多个Trk#s的情况;

现在业务针对同一个PORequest内的R#Trk#,在批量处理页面,做如下要求:

  1. 如果没有TRK#生成,或者TRK#都是Pending状态,则该R#对应的TRK栏位不展示;
  2. 如果有TRK#生成,则TRK#会逐个被分配给每一个R#,直至所有的TRK#都被R#分配;

执行完2之后,会产生三种情况:

  1. R#sTRKs刚好一一对应,则无需做任何进一步的处理;

R#

TRK#

R1

TRK1

R2

TRK2

R3

TRK3

R4

TRK4

  1. 如果还有R#s没有被分配到,则随机取之前已经被分配过的任意一个TRK#分配给这些R#

R#

TRK#

R1

TRK1

R2

TRK2

R3

TRK1

R4

TRK1

  1. 如果所有R#s都被分配完,还有剩余的TRKs,则继续开始新一轮的分配,直至所有的TRKs都被分配完毕;多个TRKs用逗号隔开。

R#

TRK#

R1

TRK1,TRK5

R2

TRK2,TRK6

R3

TRK3,TRK7

R4

TRK4

2. 最终实现语句

SELECT DISTINCT(PORID) INTO #PORIDs FROM T_POReplaceDetails WHERE REFPO IN @Rs

 SELECT PORID, MIN(Cou_) AS Cou_  INTO #Data1
     FROM
        (
            SELECT PORID,  COUNT(1) AS Cou_  FROM dbo.POBreakdown
            WHERE PORID IN ( SELECT PORID FROM #PORIDs )
                  AND GenerateDate IS NOT NULL
            GROUP BY PORID
         UNION
            SELECT PORID, COUNT(1) FROM dbo.T_POReplaceDetails
            WHERE PORID IN  ( SELECT PORID FROM #PORIDs )
            GROUP BY PORID
        ) AS d
        GROUP BY PORID

 SELECT a.PORID, b.REFPO, a.TRACKINGNUM
    INTO #Data23
    FROM
    (
        SELECT a.PORID,TRACKINGNUM,
               ROW_NUMBER() OVER (PARTITION BY a.PORID ORDER BY POBID) % b.Cou_ AS Rows FROM dbo.POBreakdown AS a
        INNER JOIN #Data1 AS b ON a.PORID = b.PORID
        WHERE GenerateDate IS NOT NULL AND a.PORID IN  ( SELECT PORID FROM #PORIDs )
    ) AS a
   
    INNER JOIN
    (
         SELECT a.PORID, REFPO,
                 ROW_NUMBER() OVER (PARTITION BY a.PORID ORDER BY REFPO) % b.Cou_ AS Rows FROM dbo.T_POReplaceDetails AS a
          INNER JOIN #Data1 AS b ON a.PORID = b.PORID
          WHERE a.PORID IN  ( SELECT PORID FROM #PORIDs )
     ) AS b
     ON a.PORID = b.PORID AND a.Rows = b.Rows;

SELECT * INTO #Trks FROM
( 
    SELECT PORID, REFPO,
       STUFF( (SELECT ',' + TRACKINGNUM FROM #Data23 t
           WHERE t.PORID = t1.PORID AND t.REFPO = t1.REFPO AND t.TRACKINGNUM <> 'pending' 
           FOR XML PATH('') ), 1, 1,'') AS TRK
    FROM #Data23 AS t1
    WHERE t1.TRACKINGNUM <> 'pending' 
    GROUP BY PORID, REFPO
) x

SELECT trk.PORID,
       trk.REFPO,
       trk.TRK AS Tracking,
       pob.SHIPVIA,
       pob.SHIPDATE AS Shipdate
FROM #Trks trk  INNER JOIN POBreakdown pob ON trk.PORID = pob.PORID
GROUP BY trk.PORID,trk.REFPO, trk.TRK, pob.SHIPVIA, pob.SHIPDATE;

3. 思路

3.1 观察表结构

观察两张表结构:

SELECT PORID,REFPO FROM dbo.T_POReplaceDetails WHERE PORID IN ( 660693, 660488 )

dbo.T_POReplaceDetails:

PORID

REFPO

660488

SF5613494-R-281204

660693

CS207743672-R-6810241

660693

CS207743672-R-6810240

660693

CS207743672-R-6810239

660693

CS207743672-R-6810238

660693

CS207743672-R-6810242

660693

CS207743672-R-6810243

660693

CS207743672-R-6810246

660693

CS207743672-R-6810244

dbo.POBreakdown:

PORID

TRACKINGNUM

660488

660488

pending

660693

794637508332

660693

794637509980

660693

794637511019

660693

pending

660693

pending

首先能想到的是dbo.T_POReplaceDetails这张表上的REFPO字段对应需要取到dbo.POBreakdown上的TRK#字段,得到的最终类似于这样一个效果:

PORID

REFPO

连接

PORID

TRACKINGNUM

660488

SF5613494-R-281204

<=>

660488

660693

CS207743672-R-6810241

<=>

660488

pending

660693

CS207743672-R-6810240

<=>

660693

794637508332

660693

CS207743672-R-6810239

<=>

660693

794637509980

660693

CS207743672-R-6810238

<=>

660693

794637511019

660693

CS207743672-R-6810242

<=>

660693

pending

660693

CS207743672-R-6810243

<=>

660693

pending

660693

CS207743672-R-6810246

<=>

660693

CS207743672-R-6810244

<=>

3.2 能否直接Inner Join两张表呢?

直接用通过PORID对两张表做Inner Join不行,相同的PORID和不同的REFPO+Trk之间会做交叉合并:

SELECT * FROM
(
    SELECT PORID,REFPO FROM dbo.T_POReplaceDetails
    WHERE PORID IN ( 660693, 660488 )

) AS x
    INNER JOIN
    (
        SELECT a.PORID,TRACKINGNUM FROM dbo.POBreakdown AS a
        WHERE a.PORID IN ( 660693, 660488 )
    ) AS y
        ON x.PORID = y.PORID
		ORDER BY x.REFPO

而期望的效果是 CS207743672-R-6810238有且只展示一条,后面的TRK如果全部分配一轮过后,有剩余再用逗号隔开叠加上去。

left join外键关联索引失效_left join外键关联索引失效

3.3 ROW_NUMBER() 函数登场

这时候就想到需要给2两张表进行分组排名,然后两边按照相同的排名进行连接:

SELECT * FROM
(
    SELECT PORID,REFPO
	,ROW_NUMBER() OVER(PARTITION BY PORID ORDER BY REFPO) AS Row#
	FROM dbo.T_POReplaceDetails
    WHERE PORID IN ( 660693, 660488 )

) AS x
    INNER JOIN
    (
        SELECT PORID,TRACKINGNUM as TRK#
		, ROW_NUMBER() OVER (PARTITION BY PORID ORDER BY POBID ) AS Row#
		FROM dbo.POBreakdown 
        WHERE  PORID IN ( 660693, 660488 ) 
    ) AS y
        ON x.PORID = y.PORID AND y.Row#= x.Row#

PORID

REFPO

Row#

PORID

TRK#

Row#

660488

SF5613494-R-281204

1

660488

1

660693

CS207743672-R-6810238

1

660693

794637508332

1

660693

CS207743672-R-6810239

2

660693

794637509980

2

660693

CS207743672-R-6810240

3

660693

794637511019

3

660693

CS207743672-R-6810241

4

660693

pending

4

660693

CS207743672-R-6810242

5

660693

pending

5

到目前为止,看数据结果是OK的,似乎达到了所要的目的。

不过为了严谨性,我们还需要对TRK数量多于REFPO数量的情况下做测试。为此我们先删除掉一些REFPO,再新增一些TRK#,最终得到如下的结果:

left join外键关联索引失效_数据_02

left join外键关联索引失效_XML_03

然而我们在执行上述的脚本观察到,虽然每个REFPO都获取到了TRK,但是剩余的6和7的TRK并未重新分配。

PORID

REFPO

Row#

PORID

TRK#

Row#

660488

SF5613494-R-281204

1

660488

1

660693

CS207743672-R-6810238

1

660693

794637508332

1

660693

CS207743672-R-6810239

2

660693

794637509980

2

660693

CS207743672-R-6810240

3

660693

794637511019

3

660693

CS207743672-R-6810241

4

660693

73742134

4

660693

CS207743672-R-6810242

5

660693

pending

5

3.4 取模(Modulo)函数登场

为了方便观察,我们暂时先把PORID = 660488数据移除掉:

SELECT ROW_NUMBER() OVER(PARTITION BY PORID ORDER BY REFPO) AS RowNum,PORID,REFPO
 ,ROW_NUMBER() OVER(PARTITION BY PORID ORDER BY REFPO) % 5 AS RowMod
FROM dbo.T_POReplaceDetails
WHERE PORID IN ( 660693)

RowNum

PORID

REFPO

RowMod

1

660693

CS207743672-R-6810238

1

2

660693

CS207743672-R-6810239

2

3

660693

CS207743672-R-6810240

3

4

660693

CS207743672-R-6810241

4

5

660693

CS207743672-R-6810242

0

SELECT ROW_NUMBER() OVER (PARTITION BY PORID ORDER BY POBID ) AS RowNum, PORID,TRACKINGNUM as TRK
, ROW_NUMBER() OVER (PARTITION BY PORID ORDER BY POBID ) % 5 AS RowMod
FROM dbo.POBreakdown 
WHERE  PORID IN ( 660693)

RowNum

PORID

TRK

RowMod

1

660693

794637508332

1

2

660693

794637509980

2

3

660693

794637511019

3

4

660693

73742134

4

5

660693

pending

0

6

660693

8872621314

1

7

660693

88990564

2

Inner Join两张表的数据到临时表#Trks中:

SELECT x.PORID,x.REFPO,x.RowMod,y.TRK  INTO #TRKs  FROM
(
    SELECT ROW_NUMBER() OVER(PARTITION BY PORID ORDER BY REFPO) AS RowNum,PORID,REFPO
	 ,ROW_NUMBER() OVER(PARTITION BY PORID ORDER BY REFPO) % 5 AS RowMod
    FROM dbo.T_POReplaceDetails
    WHERE PORID IN ( 660693 )

) AS x
    INNER JOIN
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY PORID ORDER BY POBID ) AS RowNum, PORID,TRACKINGNUM as TRK
		, ROW_NUMBER() OVER (PARTITION BY PORID ORDER BY POBID ) % 5 AS RowMod
        FROM dbo.POBreakdown 
        WHERE  PORID IN ( 660693) 
    ) AS y
        ON x.PORID = y.PORID AND y.RowMod = x.RowMod
		ORDER BY x.REFPO

SELECT * FROM #TRKs

PORID

REFPO

RowMod

两边余数

TRK

660693

CS207743672-R-6810238

1

794637508332

660693

CS207743672-R-6810238

1

8872621314

660693

CS207743672-R-6810239

2

794637509980

660693

CS207743672-R-6810239

2

88990564

660693

CS207743672-R-6810240

3

794637511019

660693

CS207743672-R-6810241

4

73742134

660693

CS207743672-R-6810242

0

pending

3.4.1 核心实现

这里是本业务实现的算法关键所在:通过对两张表数据的取模,然后对两张表的相同的余数和组ID(PORID)进行关联,从而实现REFPO和TRK的均匀分配。

被取模的数据取决于两边分组count数最小的一方。

比如以上述数据为例,同一个组别内(PORID=660693 )REFPO总计有5个(38 ~ 42);而TRK#总计有7个; 这时候需要对两边取模的话,被除数必须是两边当中最小的那个数,也就是5。

这样子取模才能确保:

  • 6%5=1: 位于第6位的TRK会被分配到第1个REFPO
  • 7%5=2: 位于第7位的TRK会被分配给第2个REFPO

而反过来,如果是REFPO多的话,也能确保多余的REFPO获取到(虽然是重复的)TRK

如果对这段话不明白,可以翻上去就数据取模列进行观察。

到目前为止,可以看到在进行第一轮的TRK分配之后,每个REFPO都拿到了TRK;经常第一轮分配,还剩下多余的TRK没有分配,因此又会进行第二轮的分配,以此类推,直至所有的TRK都以此被分配完。

但是分配到一个以上TRK的REFPO,又以多了的形式展示了出来。问题似乎又回到了3.2环节上了。但是这里与3.2的Inner Join结果不同的是,这里的每个REFPO的TRKs(叠加起来)都不会相同,比如CS207743672-R-6810238的TRK是794637508332、8872621314;而3.2的是每个REFPOTRK都相同。

那么有没有一种办法,让REFPO只保留一个,而对应的多个TRKs合并,然后用逗号隔开呢?

这时候就需要SQL 的 XML FOR PATH函数登场了。

3.5 FOR XML PATH 函数登场

使用 XML FOR PATH函数配合 Group By 函数:

SELECT PORID,REFPO,
(select ',' + t.TRK from #TRKs t where t.PORID = t1.PORID AND 
    t.REFPO = t1.REFPO FOR xml path(''))  AS TRK
FROM #TRKs AS t1
GROUP BY PORID,REFPO

PORID

REFPO

TRK

660693

CS207743672-R-6810238

,794637508332,8872621314

660693

CS207743672-R-6810239

,794637509980,88990564

660693

CS207743672-R-6810240

,794637511019

660693

CS207743672-R-6810241

,73742134

660693

CS207743672-R-6810242

,pending

到目前为止大体接近业务需要,至于TRK列前面出现的多余的逗号,可以使用STUFF函数或LEFT/RIGHT函数去除。

3.6 STUFF函数登场

SELECT PORID,REFPO,
STUFF((select ',' + t.TRK from #TRKs t where t.PORID = t1.PORID AND 
    t.REFPO = t1.REFPO FOR xml path('')),1,1,'厚礼蟹 ')  AS TRK
FROM #TRKs AS t1
GROUP BY PORID,REFPO

PORID

REFPO

TRK

660693

CS207743672-R-6810238

厚礼蟹 794637508332,8872621314

660693

CS207743672-R-6810239

厚礼蟹 794637509980,88990564

660693

CS207743672-R-6810240

厚礼蟹 794637511019

660693

CS207743672-R-6810241

厚礼蟹 73742134

660693

CS207743672-R-6810242

厚礼蟹 pending

3.7 RIGHT函数登场

SELECT PORID,REFPO,RIGHT(x.TRK,LEN(x.TRK)-1) AS TRK FROM 
(
	SELECT PORID,REFPO,
	(select ',' + t.TRK from #TRKs t where t.PORID = t1.PORID AND  t.REFPO = t1.REFPO FOR xml path(''))  AS TRK
	FROM #TRKs AS t1
	GROUP BY PORID,REFPO
) x

PORID

REFPO

TRK

660693

CS207743672-R-6810238

794637508332,8872621314

660693

CS207743672-R-6810239

794637509980,88990564

660693

CS207743672-R-6810240

794637511019

660693

CS207743672-R-6810241

73742134

660693

CS207743672-R-6810242

pending

**RIGTH:返回字符串从右边开始指定个数字符: **

RIGHT(character_expression,integer_expression);

LEFT:返回字符串中从左边开始指定个数字符。

LEFT(character_expression,integer_expression);

4. 用到的SQL技术

  • ROW_NUMBER() OVER(PARTITION BY COLUMN ORDER BY COLUMN) (分组(分区)排名函数)
  • Union 函数
  • 取模方法
  • SQL STUFF方法
  • SQL FOR XML PATH方法

4.1 分区分组排名函数

partition by:关键字是分析性函数的一部分,它和聚合函数(如group by)不同的地方在于它能返回一个分组中的多条记录,而聚合函数一般只有一条反映统计值的记录;

partition by用于给结果集分组,如果没有指定那么它把整个结果集作为一个分组。

partition bygroup by不同之处在于前者返回的是分组里的每一条数据,并且可以对分组数据进行排序操作。后者只能返回聚合之后的组的数据统计值的记录。

以下面两列数据为例来解释说明:

4.1.1 原始数据

SELECT a.PORID, TRACKINGNUM FROM dbo.POBreakdown AS a
WHERE a.PORID IN ( 660693, 660488 )

PORID

TRACKINGNUM

660488

660488

pending

660693

794637508332

660693

794637509980

660693

794637511019

660693

pending

660693

pending

4.1.2 Group by的数据

SELECT a.PORID, TRACKINGNUM FROM dbo.POBreakdown AS a
WHERE a.PORID IN ( 660693, 660488 )
GROUP BY  a.PORID,TRACKINGNUM;

PORID

TRACKINGNUM

660488

660488

pending

660693

794637508332

660693

794637509980

660693

794637511019

660693

pending

4.1.3 ROW_NUMBER() OVER(PARTITION BY col ORDER BY col)

SELECT a.PORID, TRACKINGNUM,
	ROW_NUMBER() OVER (PARTITION BY a.PORID ORDER BY a.POBID) AS RowNumber
FROM dbo.POBreakdown AS a
WHERE a.PORID IN ( 660693, 660488 )

PORID

TRACKINGNUM

RowNumber

660488

1

660488

pending

2

660693

794637508332

1

660693

794637509980

2

660693

794637511019

3

660693

pending

4

660693

pending

5

相比前面,可以看到PORID=660693TrackingNum = pending这条路数据并没有被合并,而是都展示出来,只是RowNumber不一样(分别是4,5)。

这就是:partition by 与group by不同之处在于前者返回的是分组里的每一条数据,并且可以对分组数据进行排序操作。后者只能返回聚合之后的组的数据统计值的记录 的意思。