如何计算 SQL varchar 中某个子字符串的出现次数?

我有一个列,其值的格式是 a,b,c,d。在 T-SQL 中,有没有一种方法可以计算该值中逗号的数目?

279388 次浏览

我想到的第一种方法是间接地用空字符串替换逗号并比较长度

Declare @string varchar(1000)
Set @string = 'a,b,c,d'
select len(@string) - len(replace(@string, ',', ''))

您可以将字符串的长度与删除逗号后的长度进行比较:

len(value) - len(replace(value,',',''))

也许您不应该以这种方式存储数据。在字段中存储逗号分隔的列表是一种不好的做法。IT 对于查询来说是非常低效的。这应该是一个相关的表。

DECLARE @records varchar(400)
SELECT @records = 'a,b,c,d'
select  LEN(@records) as 'Before removing Commas' , LEN(@records) - LEN(REPLACE(@records, ',', '')) 'After Removing Commans'

Cmsjr 答案的快速扩展,适用于具有多个字符的字符串。

CREATE FUNCTION dbo.CountOccurrencesOfString
(
@searchString nvarchar(max),
@searchTerm nvarchar(max)
)
RETURNS INT
AS
BEGIN
return (LEN(@searchString)-LEN(REPLACE(@searchString,@searchTerm,'')))/LEN(@searchTerm)
END

用法:

SELECT * FROM MyTable
where dbo.CountOccurrencesOfString(MyColumn, 'MyString') = 1

可以使用以下存储过程来获取值。

IF  EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[sp_parsedata]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[sp_parsedata]
GO
create procedure sp_parsedata
(@cid integer,@st varchar(1000))
as
declare @coid integer
declare @c integer
declare @c1 integer
select @c1=len(@st) - len(replace(@st, ',', ''))
set @c=0
delete from table1 where complainid=@cid;
while (@c<=@c1)
begin
if (@c<@c1)
begin
select @coid=cast(replace(left(@st,CHARINDEX(',',@st,1)),',','') as integer)
select @st=SUBSTRING(@st,CHARINDEX(',',@st,1)+1,LEN(@st))
end
else
begin
select @coid=cast(@st as integer)
end
insert into table1(complainid,courtid) values(@cid,@coid)
set @c=@c+1
end
Declare @string varchar(1000)


DECLARE @SearchString varchar(100)


Set @string = 'as as df df as as as'


SET @SearchString = 'as'


select ((len(@string) - len(replace(@string, @SearchString, ''))) -(len(@string) -
len(replace(@string, @SearchString, ''))) % 2)  / len(@SearchString)

Replace/Len 测试很可爱,但可能效率很低(特别是在内存方面)。 一个带有循环的简单函数就可以完成这项工作。

CREATE FUNCTION [dbo].[fn_Occurences]
(
@pattern varchar(255),
@expression varchar(max)
)
RETURNS int
AS
BEGIN


DECLARE @Result int = 0;


DECLARE @index BigInt = 0
DECLARE @patLen int = len(@pattern)


SET @index = CHARINDEX(@pattern, @expression, @index)
While @index > 0
BEGIN
SET @Result = @Result + 1;
SET @index = CHARINDEX(@pattern, @expression, @index + @patLen)
END


RETURN @Result


END

我觉得 Darrel Lee 的答案很不错。将 CHARINDEX()替换为 PATINDEX(),您也可以沿着字符串进行一些弱 regex搜索..。

比如说,你把这个用在 @pattern上:

set @pattern='%[-.|!,'+char(9)+']%'

你为什么要做这种疯狂的事?

假设您正在将分隔的文本字符串加载到一个暂存表中,其中保存数据的字段类似于 varchar (8000)或 nvarchar (max) ..。

有时候对数据进行 ELT (提取-加载-转换)比对 ETL (提取-转换-加载)更容易/更快,一种方法是将分隔的记录按原样加载到一个临时表中,特别是如果您可能想要一种更简单的方法来查看异常记录,而不是将它们作为 SSIS 包的一部分处理的话... ... 但这是一场针对不同线程的圣战。

在某些情况下,@csmjr 给出的答案有问题。

他的回答是这样做:

Declare @string varchar(1000)
Set @string = 'a,b,c,d'
select len(@string) - len(replace(@string, ',', ''))

这在大多数情况下都有效,但是,请尝试运行以下命令:

DECLARE @string VARCHAR(1000)
SET @string = 'a,b,c,d ,'
SELECT LEN(@string) - LEN(REPLACE(@string, ',', ''))

由于某种原因,REPLACE 去掉了最后一个逗号,但也去掉了它前面的空格(不知道为什么)。当您期望4时,这将导致返回值为5。下面是另一种方法,即使在这种特殊情况下也能奏效:

DECLARE @string VARCHAR(1000)
SET @string = 'a,b,c,d ,'
SELECT LEN(REPLACE(@string, ',', '**')) - LEN(@string)

请注意,您不需要使用星号。任何两个字符的替换都可以。其思想是,对于正在计数的字符的每个实例,将字符串加长一个字符,然后减去原始字符的长度。这基本上是原始答案的相反方法没有奇怪的修剪副作用。

基于@Andrew 的解决方案,使用一个非过程化的表值函数和 CROSS APPY,您将获得更好的性能:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*  Usage:
SELECT t.[YourColumn], c.StringCount
FROM YourDatabase.dbo.YourTable t
CROSS APPLY dbo.CountOccurrencesOfString('your search string',     t.[YourColumn]) c
*/
CREATE FUNCTION [dbo].[CountOccurrencesOfString]
(
@searchTerm nvarchar(max),
@searchString nvarchar(max)


)
RETURNS TABLE
AS
RETURN
SELECT (DATALENGTH(@searchString)-DATALENGTH(REPLACE(@searchString,@searchTerm,'')))/NULLIF(DATALENGTH(@searchTerm), 0) AS StringCount

公认的答案是正确的, 将其扩展为在子字符串中使用2个或更多字符:

Declare @string varchar(1000)
Set @string = 'aa,bb,cc,dd'
Set @substring = 'aa'
select (len(@string) - len(replace(@string, @substring, '')))/len(@substring)

对于单个字符和多个字符搜索,应该采用以下技巧:

CREATE FUNCTION dbo.CountOccurrences
(
@SearchString VARCHAR(1000),
@SearchFor    VARCHAR(1000)
)
RETURNS TABLE
AS
RETURN (
SELECT COUNT(*) AS Occurrences
FROM   (
SELECT ROW_NUMBER() OVER (ORDER BY O.object_id) AS n
FROM   sys.objects AS O
) AS N
JOIN (
VALUES (@SearchString)
) AS S (SearchString)
ON
SUBSTRING(S.SearchString, N.n, LEN(@SearchFor)) = @SearchFor
);
GO


---------------------------------------------------------------------------------------
-- Test the function for single and multiple character searches
---------------------------------------------------------------------------------------
DECLARE @SearchForComma      VARCHAR(10) = ',',
@SearchForCharacters VARCHAR(10) = 'de';


DECLARE @TestTable TABLE
(
TestData VARCHAR(30) NOT NULL
);


INSERT INTO @TestTable
(
TestData
)
VALUES
('a,b,c,de,de ,d e'),
('abc,de,hijk,,'),
(',,a,b,cde,,');


SELECT TT.TestData,
CO.Occurrences AS CommaOccurrences,
CO2.Occurrences AS CharacterOccurrences
FROM   @TestTable AS TT
OUTER APPLY dbo.CountOccurrences(TT.TestData, @SearchForComma) AS CO
OUTER APPLY dbo.CountOccurrences(TT.TestData, @SearchForCharacters) AS CO2;

这个函数可以使用一个数字表(dbo.Nums)进行简化:

   RETURN (
SELECT COUNT(*) AS Occurrences
FROM   dbo.Nums AS N
JOIN (
VALUES (@SearchString)
) AS S (SearchString)
ON
SUBSTRING(S.SearchString, N.n, LEN(@SearchFor)) = @SearchFor
);

如果我们知道 LEN 和空间有限制,为什么我们不能先替换空间? 那么我们知道没有空间来混淆 LEN。

len(replace(@string, ' ', '-')) - len(replace(replace(@string, ' ', '-'), ',', ''))

使用这个代码,它工作得很好。 我已经创建了一个 sql 函数,它接受两个参数,第一个参数是我们想要搜索到的长字符串,它可以接受字符串长度最多为1500个字符(当然,您可以扩展它,甚至可以将它更改为 text datatype)。 第二个参数是我们想要计算出现次数的子字符串(它的长度最多可达200个字符,当然您可以根据需要更改它)。而输出是一个整数,代表频率的数量... ... 请欣赏吧。


CREATE FUNCTION [dbo].[GetSubstringCount]
(
@InputString nvarchar(1500),
@SubString NVARCHAR(200)
)
RETURNS int
AS
BEGIN
declare @K int , @StrLen int , @Count int , @SubStrLen int
set @SubStrLen = (select len(@SubString))
set @Count = 0
Set @k = 1
set @StrLen =(select len(@InputString))
While @K <= @StrLen
Begin
if ((select substring(@InputString, @K, @SubStrLen)) = @SubString)
begin
if ((select CHARINDEX(@SubString ,@InputString)) > 0)
begin
set @Count = @Count +1
end
end
Set @K=@k+1
end
return @Count
end

最后,我编写了这个函数,它应该涵盖所有可能的情况,在输入中添加一个字符前缀和后缀。这个字符被评估为不同于搜索参数中包含的任何字符,因此它不会影响结果。

CREATE FUNCTION [dbo].[CountOccurrency]
(
@Input nvarchar(max),
@Search nvarchar(max)
)
RETURNS int AS
BEGIN
declare @SearhLength as int = len('-' + @Search + '-') -2;
declare @conteinerIndex as int = 255;
declare @conteiner as char(1) = char(@conteinerIndex);
WHILE ((CHARINDEX(@conteiner, @Search)>0) and (@conteinerIndex>0))
BEGIN
set @conteinerIndex = @conteinerIndex-1;
set @conteiner = char(@conteinerIndex);
END;
set @Input = @conteiner + @Input + @conteiner
RETURN (len(@Input) - len(replace(@Input, @Search, ''))) / @SearhLength
END

用途

select dbo.CountOccurrency('a,b,c,d ,', ',')
Declare @MainStr nvarchar(200)
Declare @SubStr nvarchar(10)
Set @MainStr = 'nikhildfdfdfuzxsznikhilweszxnikhil'
Set @SubStr = 'nikhil'
Select (Len(@MainStr) - Len(REPLACE(@MainStr,@SubStr,'')))/Len(@SubStr)

在 SQL2017或更高版本中,可以使用以下命令:

declare @hits int = 0
set @hits = (select value from STRING_SPLIT('F609,4DFA,8499',','));
select count(@hits)

这个 T-SQL 代码查找并打印句子中出现的所有@p 模式。之后你可以对这个句子进行任何处理。

declare @old_hit int = 0
declare @hit int = 0
declare @i int = 0
declare @s varchar(max)='alibcalirezaalivisualization'
declare @p varchar(max)='ali'
while @i<len(@s)
begin
set @hit=charindex(@p,@s,@i)
if @hit>@old_hit
begin
set @old_hit =@hit
set @i=@hit+1
print @hit
end
else
break
end

结果是: 1 6 13 20

我最后用的是 CTE 表,

CREATE TABLE #test (
[id] int,
[field] nvarchar(500)
)


INSERT INTO #test ([id], [field])
VALUES (1, 'this is a test string http://url, and https://google.com'),
(2, 'another string, hello world http://example.com'),
(3, 'a string with no url')


SELECT *
FROM #test


;WITH URL_count_cte ([id], [url_index], [field])
AS
(
SELECT [id], CHARINDEX('http', [field], 0)+1 AS [url_index], [field]
FROM #test AS [t]
WHERE CHARINDEX('http', [field], 0) != 0
UNION ALL
SELECT [id], CHARINDEX('http', [field], [url_index])+1 AS [url_index], [field]
FROM URL_count_cte
WHERE CHARINDEX('http', [field], [url_index]) > 0
)


-- total urls
SELECT COUNT(1)
FROM URL_count_cte


-- urls per row
SELECT [id], COUNT(1) AS [url_count]
FROM URL_count_cte
GROUP BY [id]

使用这个函数,您可以得到文本中单词的重复次数。

/****** Object:  UserDefinedFunction [dbo].[fn_getCountKeywords]    Script Date: 22/11/2021 17:52:00 ******/
DROP FUNCTION IF EXISTS [dbo].[fn_getCountKeywords]
GO
/****** Object:  UserDefinedFunction [dbo].[fn_getCountKeywords]    Script Date: 2211/2021 17:52:00 ******/
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:      m_Khezrian
-- Create date: 2021/11/22-17:52
-- Description: Return Count Keywords In Input Text
-- =============================================


Create OR Alter Function [dbo].[fn_getCountKeywords]
(@Text      nvarchar(max)
,@Keywords  nvarchar(max)
)
RETURNS @Result TABLE
(
[ID]         int Not Null IDENTITY PRIMARY KEY
,[Keyword]    nvarchar(max) Not Null
,[Cnt]        int Not Null Default(0)


)
/*With ENCRYPTION*/ As
Begin
Declare @Key    nvarchar(max);
Declare @Cnt    int;
Declare @I      int;


Set @I = 0 ;
--Set @Text = QUOTENAME(@Text);


Insert Into @Result
([Keyword])
Select Trim([value])
From String_Split(@Keywords,N',')
Group By [value]
Order By Len([value]) Desc;


Declare CntKey_Cursor Insensitive Cursor For
Select [Keyword]
From @Result
Order By [ID];


Open CntKey_Cursor;
Fetch Next From CntKey_Cursor Into @Key;
While (@@Fetch_STATUS = 0) Begin
Set @Cnt = 0;


While (PatIndex(N'%'+@Key+'%',@Text) > 0) Begin
Set @Cnt += 1;
Set @I += 1 ;
Set @Text = Stuff(@Text,PatIndex(N'%'+@Key+'%',@Text),len(@Key),N'{'+Convert(nvarchar,@I)+'}');
--Set @Text = Replace(@Text,@Key,N'{'+Convert(nvarchar,@I)+'}');
End--While


Update @Result
Set [Cnt] = @Cnt
Where ([Keyword] = @Key);


Fetch Next From CntKey_Cursor Into @Key;
End--While
Close CntKey_Cursor;
Deallocate CntKey_Cursor;
Return
End
GO


--Test
Select *
From dbo.fn_getCountKeywords(
N'<U+0001F4E3> MARKET IMPACT Euro area Euro CPIarea annual inflation up to 3.0% MaCPIRKET forex'
,N'CPI ,core,MaRKET , Euro area'
)


Go

根据最佳答案和其他答案改进版本:

用分隔符包装字符串可以确保 LEN 正常工作。如果替换字符串比匹配字符串长一个字符,则不需要除法。

CREATE FUNCTION dbo.MatchCount(@value nvarchar(max), @match  nvarchar(max))
RETURNS int
BEGIN
RETURN LEN('[' + REPLACE(@value,@match,REPLICATE('*', LEN('[' + @match + ']') - 1)) + ']') - LEN('['+@value+']')
END

参考资料 https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15

例如:

SELECT  s.*
,s.[Number1] - (SELECT COUNT(Value)
FROM string_split(s.[StringColumn],',')
WHERE RTRIM(VALUE) <> '')
FROM TableName AS s

适用于: SQL Server 2016(13.x)及更高版本