Optimal way to concatenate/aggregate strings

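I'm looking for a way to aggregate strings from different rows into a single row. I want to do this in many different places, so having a function to facilitate it would be nice. I've tried solutions using COALESCE and FOR XML, but they just don't cut it for me.

String aggregation would do something like this: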

id | Name                    Result: id | Names
-- - ----                            -- - -----
1  | Matt                            1  | Matt, Rocks
1  | Rocks                           2  | Stylus
2  | Stylus

I've taken a look at CLR-defined aggregate functions as a replacement for COALESCE and FOR XML, but apparently SQL Azure does not support CLR-defined stuff, which is a pain for me because I know being able to use it would solve a whole lot of problems for me.

Is there any possible workaround, or a similarly optimal method (which might not be as optimal as CLR, but I'll take what I can get), that I can use to aggregate my stuff?

Solution

The definition of optimal can vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.

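;WITH Partitioned AS
(
    -- Step 1: number the rows within each ID and keep the per-ID row count
    SELECT
        ID,
        Name,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
        COUNT(*) OVER (PARTITION BY ID) AS NameCount
    FROM dbo.SourceTable
),
Concatenated AS
(
    -- Step 2 (anchor): start each ID with its first name.
    -- nvarchar(max) is used because CAST to nvarchar without a length
    -- defaults to nvarchar(30) and would silently truncate longer results.
    SELECT
        ID,
        CAST(Name AS nvarchar(max)) AS FullName,
        Name,
        NameNumber,
        NameCount
    FROM Partitioned
    WHERE NameNumber = 1

    UNION ALL

    -- Step 2 (recursive): append the next name in sequence
    SELECT
        P.ID,
        CAST(C.FullName + ', ' + P.Name AS nvarchar(max)),
        P.Name,
        P.NameNumber,
        P.NameCount
    FROM Partitioned AS P
        INNER JOIN Concatenated AS C
            ON P.ID = C.ID
            AND P.NameNumber = C.NameNumber + 1
)
-- Step 3: keep only the fully concatenated row for each ID
SELECT
    ID,
    FullName
FROM Concatenated
WHERE NameNumber = NameCount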

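Explanation

The approach boils down to three steps:

  1. Number the rows using OVER and PARTITION, grouping and ordering them as needed for the concatenation. The result is the Partitioned CTE. We keep the count of rows in each partition to filter the results later.

  2. Using a recursive CTE (Concatenated), iterate through the row numbers (NameNumber column), appending Name values to the FullName column.

  3. Filter out all results except those with the highest NameNumber.

Please keep in mind that, to make this query predictable, you have to define both the grouping (for example, in your scenario rows with the same ID are concatenated) and the sorting (I assumed that you simply sort the strings alphabetically before concatenation).

I've quickly tested the solution on SQL Server 2012 with the following data: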

INSERT dbo.SourceTable (ID, Name)
VALUES
(1, 'Matt'),
(1, 'Rocks'),
(2, 'Stylus'),
(3, 'Foo'),
(3, 'Bar'),
(3, 'Baz')

The query result:

ID          FullName
----------- ------------------------------
2           Stylus
3           Bar, Baz, Foo
1           Matt, Rocks
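
To make step 1 easier to follow, here is a minimal sketch that runs only the Partitioned part of the query against the same test data. The temp table #SourceTable is a hypothetical stand-in for dbo.SourceTable, and the nvarchar(50) column type is an assumption, since the original table definition is not shown.

```sql
-- Hypothetical stand-in for dbo.SourceTable (column types are assumed)
CREATE TABLE #SourceTable (ID int, Name nvarchar(50));

INSERT #SourceTable (ID, Name)
VALUES
    (1, N'Matt'),
    (1, N'Rocks'),
    (2, N'Stylus'),
    (3, N'Foo'),
    (3, N'Bar'),
    (3, N'Baz');

-- Step 1 on its own: NameNumber is the position of each name within its ID,
-- NameCount is how many names that ID has in total.
SELECT
    ID,
    Name,
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
    COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM #SourceTable
ORDER BY ID, NameNumber;

DROP TABLE #SourceTable;
```

The rows where NameNumber equals NameCount are the ones at which the recursive step has appended every name for that ID, which is why the final filter keeps exactly those rows.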

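Are methods using FOR XML PATH, like the one below, really that slow? Itzik Ben-Gan writes in his T-SQL Querying book that this method has good performance (and Mr. Ben-Gan is a trustworthy source, in my view).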

create table #t (id int, name varchar(20))


insert into #t
values (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')


select  id
,Names = stuff((select ', ' + name as [text()]
from #t xt
where xt.id = t.id
for xml path('')), 1, 2, '')
from #t t
group by id
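
One caveat with the plain FOR XML PATH('') form is that characters such as & and < come back as XML entities (&amp;, &lt;). A common variant, sketched below, adds the TYPE directive and reads the result back with .value() so the text is decoded again. The table #t2 and its sample values are made up for illustration only.

```sql
-- Hypothetical sample data containing characters that XML would escape
CREATE TABLE #t2 (id int, name varchar(20));

INSERT INTO #t2
VALUES (1, 'R&D'), (1, 'Q<A'), (2, 'Stylus');

SELECT  t.id
       ,Names = STUFF(
            (SELECT ', ' + xt.name
             FROM #t2 AS xt
             WHERE xt.id = t.id
             ORDER BY xt.name              -- explicit order for the concatenation
             FOR XML PATH(''), TYPE        -- TYPE returns an xml value instead of a string
            ).value('.', 'varchar(max)')   -- .value() converts it back to text, decoding entities
            , 1, 2, '')                    -- STUFF strips the leading ', '
FROM #t2 AS t
GROUP BY t.id;

DROP TABLE #t2;
```

The ORDER BY inside the subquery is optional, but without it the order of the concatenated names is not guaranteed.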

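Although @serge's answer is correct, I compared the time taken by his approach against the xmlpath approach and found xmlpath to be faster. Here is the comparison code so you can check it yourself. This is @serge's way: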

DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;


set nocount on;


declare @YourTable table (ID int, Name nvarchar(50))


WHILE @counter < 1000
BEGIN
insert into @YourTable VALUES (ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
SET @counter = @counter + 1;
END


SET @startTime = GETDATE()


;WITH Partitioned AS
(
SELECT
ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM @YourTable
),
Concatenated AS
(
SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1


UNION ALL


SELECT
P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
FROM Partitioned AS P
INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
)
SELECT
ID,
FullName
FROM Concatenated
WHERE NameNumber = NameCount


SET @endTime = GETDATE();


SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 54 milliseconds

And this is the xmlpath way:

DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;


set nocount on;


declare @YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))


WHILE @counter < 1000
BEGIN
insert into @YourTable VALUES (@counter, ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
SET @counter = @counter + 1;
END


SET @startTime = GETDATE();


set nocount off
SELECT
t1.HeaderValue
,STUFF(
(SELECT
', ' + t2.ChildValue
FROM @YourTable t2
WHERE t1.HeaderValue=t2.HeaderValue
ORDER BY t2.ChildValue
FOR XML PATH(''), TYPE
).value('.','varchar(max)')
,1,2, ''
) AS ChildValues
FROM @YourTable t1
GROUP BY t1.HeaderValue


SET @endTime = GETDATE();


SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 4 milliseconds

STRING_AGG() in SQL Server 2017, Azure SQL, and PostgreSQL:
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql

GROUP_CONCAT() in MySQL:
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

(Thanks to @Brianjorden and @milanio for the Azure update)

Example code:

select Id
, STRING_AGG(Name, ', ') Names
from Demo
group by Id

SQL Fiddle: http://sqlfiddle.com/#!18/89251/1

Update: MS SQL Server 2017+, Azure SQL Database

You can use: STRING_AGG

Usage is pretty simple for the OP's request:

SELECT id, STRING_AGG(name, ', ') AS names
FROM some_table
GROUP BY id
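
If the order of the names within each group matters, STRING_AGG accepts an optional WITHIN GROUP (ORDER BY ...) clause. The sketch below assumes a hypothetical temp table #some_table with the question's sample data. The CAST to nvarchar(max) is an extra precaution of mine rather than part of the original answer: with non-max input types the result is limited in size, and the query raises an error if the limit is exceeded.

```sql
-- Hypothetical stand-in for some_table (column types are assumed)
CREATE TABLE #some_table (id int, name nvarchar(50));

INSERT INTO #some_table (id, name)
VALUES (1, N'Rocks'), (1, N'Matt'), (2, N'Stylus');

SELECT id,
       -- Sort names alphabetically inside each group before joining them;
       -- casting to nvarchar(max) avoids the size limit on the result
       STRING_AGG(CAST(name AS nvarchar(max)), N', ')
           WITHIN GROUP (ORDER BY name) AS names
FROM #some_table
GROUP BY id;

DROP TABLE #some_table;
```

Without WITHIN GROUP, the order in which values are concatenated is not guaranteed.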

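My original non-answer was rightly deleted (left intact below), but in case anyone happens to land here in the future, there is good news. STRING_AGG() has been implemented in Azure SQL Database as well. That should provide the exact functionality originally requested in this post, with native, built-in support. @hrobky mentioned this previously as a SQL Server 2016 feature.

Old post: I don't have enough reputation here to reply to @hrobky directly, but STRING_AGG looks great. However, it is currently only available in SQL Server 2016 vNext. Hopefully it will come to Azure SQL Database soon as well.

You can use += to concatenate strings, for example: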

declare @test nvarchar(max)
set @test = ''
select @test += name from names

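If you then select @test, it will give you all the names concatenated.

I found Serge's answer very promising, but I also ran into performance issues with it as written. However, when I restructured it to use temp tables instead of the double CTE, performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server: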

DECLARE @STRUCTURED_VALUES TABLE (
ID                 INT
,VALUE              VARCHAR(MAX) NULL
,VALUENUMBER        BIGINT
,VALUECOUNT         INT
);


INSERT INTO @STRUCTURED_VALUES
SELECT   ID
,VALUE
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
,COUNT(*) OVER (PARTITION BY ID)    AS VALUECOUNT
FROM    RAW_VALUES_TABLE;


WITH CTE AS (
SELECT   SV.ID
,SV.VALUE
,SV.VALUENUMBER
,SV.VALUECOUNT
FROM    @STRUCTURED_VALUES SV
WHERE   VALUENUMBER = 1


UNION ALL


SELECT   SV.ID
,CTE.VALUE + ' ' + SV.VALUE AS VALUE
,SV.VALUENUMBER
,SV.VALUECOUNT
FROM    @STRUCTURED_VALUES SV
JOIN    CTE
ON  SV.ID = CTE.ID
AND SV.VALUENUMBER = CTE.VALUENUMBER + 1


)
SELECT   ID
,VALUE
FROM    CTE
WHERE   VALUENUMBER = VALUECOUNT
ORDER BY ID
;

Try this; I use it in my projects:

DECLARE @MetricsList NVARCHAR(MAX);


SELECT @MetricsList = COALESCE(@MetricsList + '|', '') + QMetricName
FROM #Questions;