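How would you count occurrences of a string (actually a char) within a string?

I am doing something where I realised I wanted to count how many /s I could find in a string, and then it struck me that there were several ways to do it, but I couldn't decide which was the best (or easiest).

At the moment I'm going with something like:

string source = "/once/upon/a/time/";
int count = source.Length - source.Replace("/", "").Length;

But I don't like it at all. Any takers?

I don't really want to dig out RegEx for this, do I?

I know my string is going to contain the term I'm searching for, so you can assume that...

Of course, for strings where length > 1:

string haystack = "/once/upon/a/time";
string needle = "/";
int needleCount = (haystack.Length - haystack.Replace(needle, "").Length) / needle.Length;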
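string source = "/once/upon/a/time/";
int count = 0;
foreach (char c in source)
    if (c == '/') count++;

Has to be faster than the source.Replace() by itself.

LINQ works on all collections, and since strings are just a collection of characters, how about this nice little one-liner:

var count = source.Count(c => c == '/');

Make sure you have using System.Linq; at the top of your code file, as .Count is an extension method from that namespace.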
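For reference, here is a minimal self-contained sketch of the LINQ approach, written as C# 9 top-level statements. The helper name CountChar is an assumption made for illustration; it does not come from the answer.

```csharp
using System;
using System.Linq;

// Hypothetical helper: counts how many times `target` occurs in `source`.
// string implements IEnumerable<char>, so Enumerable.Count applies directly.
static int CountChar(string source, char target) => source.Count(c => c == target);

Console.WriteLine(CountChar("/once/upon/a/time/", '/')); // prints 5
```

Without the using System.Linq; directive the call to Count does not compile, because string itself has no such method.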

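If you're using .NET 3.5 you can do this in a one-liner with LINQ:

int count = source.Count(f => f == '/');

If you don't want to use LINQ you can do it with:

int count = source.Split('/').Length - 1;

You might be surprised to learn that your original technique seems to be about 30% faster than either of these! I've just done a quick benchmark with "/once/upon/a/time/" and the results are as follows:

Your original = 12s
source.Count = 19s
source.Split = 17s
foreach (from bobwienholt's answer) = 10s

(The times are for 50,000,000 iterations, so you're unlikely to notice much difference in the real world.)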
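The answer does not show its timing code, so the following is only a sketch of how such a comparison might be set up with Stopwatch. The helper Time, the candidate list and the reduced iteration count are all assumptions; absolute numbers will differ by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

const string source = "/once/upon/a/time/";
const int iterations = 1_000_000; // assumption: far fewer than the 50,000,000 quoted above

// Hypothetical helper: runs `count` repeatedly and returns (last result, elapsed ms).
static (int Result, long Milliseconds) Time(Func<string, int> count, string input, int runs)
{
    int result = 0;
    var timer = Stopwatch.StartNew();
    for (int i = 0; i < runs; i++) result = count(input);
    timer.Stop();
    return (result, timer.ElapsedMilliseconds);
}

// The four techniques compared in the answer above.
var candidates = new (string Name, Func<string, int> Count)[]
{
    ("Replace", s => s.Length - s.Replace("/", "").Length),
    ("Linq.Count", s => s.Count(c => c == '/')),
    ("Split", s => s.Split('/').Length - 1),
    ("foreach", s => { int n = 0; foreach (char c in s) if (c == '/') n++; return n; }),
};

foreach (var (name, count) in candidates)
{
    var (result, ms) = Time(count, source, iterations);
    Console.WriteLine($"{name}: count={result}, {ms} ms");
}
```

Build in Release mode before drawing conclusions; a later answer in this thread had to redo its numbers for that reason.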

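These both only work for single-character search terms...

countOccurences("the", "the answer is the answer");

int countOccurences(string needle, string haystack)
{
    return (haystack.Length - haystack.Replace(needle, "").Length) / needle.Length;
}

may turn out to be better for longer needles...

But there has to be a more elegant way. :)

If you want to be able to search for whole strings, and not just characters:

src.Select((c, i) => src.Substring(i)).Count(sub => sub.StartsWith(target))

Read as "for each character in the string, take the rest of the string starting from that character as a substring; count it if it starts with the target string."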
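One consequence of testing every start index is that overlapping matches are counted, whereas the Replace-and-divide approach only counts non-overlapping ones. The sketch below illustrates the difference. Both helper names are hypothetical, and the empty-target guard and the StringComparison.Ordinal argument are additions that the one-liner above does not have.

```csharp
using System;
using System.Linq;

// Hypothetical helper: tests every start index, so overlapping matches count.
static int CountOverlapping(string src, string target) =>
    target.Length == 0
        ? 0
        : Enumerable.Range(0, src.Length)
                    .Count(i => src.Substring(i).StartsWith(target, StringComparison.Ordinal));

// Hypothetical helper: Replace removes matches left to right, so they cannot overlap.
static int CountNonOverlapping(string src, string target) =>
    target.Length == 0
        ? 0
        : (src.Length - src.Replace(target, "").Length) / target.Length;

Console.WriteLine(CountOverlapping("aaaa", "aa"));    // prints 3
Console.WriteLine(CountNonOverlapping("aaaa", "aa")); // prints 2
```

Note that the Substring call allocates a new string per index, so this form is best kept for short inputs.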

Edit:

source.Split('/').Length - 1

int count = new Regex(Regex.Escape(needle)).Matches(haystack).Count;

string s = "65 fght 6565 4665 hjk";
int count = 0;
foreach (Match m in Regex.Matches(s, "65"))
    count++;

private int CountWords(string text, string word)
{
    int count = (text.Length - text.Replace(word, "").Length) / word.Length;
    return count;
}

Because the original solution was the fastest for chars, I suppose it will also be for strings. So here is my contribution.

For context: I was looking for words like "failed" and "succeeded" in a log file.

Gr, Ben

string source = "/once/upon/a/time/";
int count = 0;
int n = 0;
while ((n = source.IndexOf('/', n)) != -1)
{
    n++;
    count++;
}

On my computer it's about 2 seconds faster than the for-every-character solution for 50 million iterations.

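2013 revision:

Change the string to a char[] and iterate through that. This cuts a further second or two off the total time for 50m iterations!

char[] testchars = source.ToCharArray();
foreach (char c in testchars)
{
    if (c == '/')
        count++;
}

This is quicker still:

char[] testchars = source.ToCharArray();
int length = testchars.Length;
for (int n = 0; n < length; n++)
{
    if (testchars[n] == '/')
        count++;
}

For good measure, iterating from the end of the array to 0 seems to be the fastest, by about 5%.

int length = testchars.Length;
for (int n = length - 1; n >= 0; n--)
{
    if (testchars[n] == '/')
        count++;
}

I was wondering why this could be and was Googling around (I recall something about reverse iteration being quicker), and came upon this SO question, which annoyingly uses the string-to-char[] technique already. I think the reversal trick is new in this context, though.

What is the fastest way to iterate through individual characters in a string in C#?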

string str = "aaabbbbjjja";
int count = 0;
int size = str.Length;
string[] strarray = new string[size];
for (int i = 0; i < str.Length; i++)
{
    strarray[i] = str.Substring(i, 1);
}
Array.Sort(strarray);
str = "";
for (int i = 0; i < strarray.Length - 1; i++)
{
    if (strarray[i] == strarray[i + 1])
    {
        count++;
    }
    else
    {
        count++;
        str = str + strarray[i] + count;
        count = 0;
    }
}
count++;
str = str + strarray[strarray.Length - 1] + count;

This counts how many times each character occurs. For this example the output will be "a4b4j3".

A generic function for occurrences of strings:

public int getNumberOfOccurencies(String inputString, String checkString)
{
    if (checkString.Length > inputString.Length || checkString.Equals("")) { return 0; }
    int lengthDifference = inputString.Length - checkString.Length;
    int occurencies = 0;
    // <= so that a match ending at the very end of inputString is also counted
    for (int i = 0; i <= lengthDifference; i++)
    {
        if (inputString.Substring(i, checkString.Length).Equals(checkString))
        {
            occurencies++;
            i += checkString.Length - 1;
        }
    }
    return occurencies;
}

string Name = "Very good nice one is very good but is very good nice one this is called the term";
bool valid = true;
int count = 0;
int k = 0;
int m = 0;
while (valid)
{
    k = Name.Substring(m, Name.Length - m).IndexOf("good");
    if (k != -1)
    {
        count++;
        m = m + k + 4;
    }
    else
        valid = false;
}
Console.WriteLine(count + " Times accures");

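I did some research and found that Richard Watson's solution is the fastest in most cases. Here is a table with the results of every solution in the post (except those that use Regex, because it throws exceptions while parsing strings like "test{test").

Name           | Short/char | Long/char | Short/short | Long/short | Long/long
Inspite        |        134 |      1853 |          95 |       1146 |       671
LukeH_1        |        346 |      4490 |         N/A |        N/A |       N/A
LukeH_2        |        152 |      1569 |         197 |       2425 |      2171
Bobwienholt    |        230 |      3269 |         N/A |        N/A |       N/A
Richard Watson |         33 |       298 |         146 |        737 |       543
StefanosKargas |        N/A |       N/A |         681 |      11884 |     12486

You can see that when counting occurrences of short substrings (1-5 characters) in a short string (10-50 characters), the original algorithm is preferred.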

Also, for multi-character substrings you should use the following code (based on Richard Watson's solution):

int count = 0, n = 0;

if (substring != "")
{
    while ((n = source.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1)
    {
        n += substring.Length;
        ++count;
    }
}

string source = "/once/upon/a/time/";
int count = 0, n = 0;
while ((n = source.IndexOf('/', n) + 1) != 0) count++;

A variation on Richard Watson's answer: slightly faster, with efficiency improving the more times the char occurs in the string, and less code!

Though I must say, without extensively testing every scenario, I did see a very significant speed improvement by using:

int count = 0;
for (int n = 0; n < source.Length; n++)
    if (source[n] == '/') count++;

public static int GetNumSubstringOccurrences(string text, string search)
{
    int num = 0;
    int pos = 0;

    if (!string.IsNullOrEmpty(text) && !string.IsNullOrEmpty(search))
    {
        while ((pos = text.IndexOf(search, pos)) > -1)
        {
            num++;
            pos += search.Length;
        }
    }
    return num;
}
var conditionalStatement = conditionSetting.Value;

// Order of the replacements matters: handle the two-character operators
// ("==", "!=", ">=", "<=") before the single characters, in case of "===".
conditionalStatement = conditionalStatement
    .Replace("==", "~").Replace("!=", "~").Replace(">=", "~").Replace("<=", "~")
    .Replace('=', '~').Replace('!', '~').Replace('>', '~').Replace('<', '~');

var listOfValidConditions = new List<string>() { "!=", "==", ">", "<", ">=", "<=" };

if (conditionalStatement.Count(x => x == '~') != 1)
{
    result.InvalidFieldList.Add(new KeyFieldData(batch.DECurrentField, "The IsDoubleKeyCondition does not contain a supported conditional statement. Contact System Administrator."));
    result.Status = ValidatorStatus.Fail;
    return result;
}

I needed to do something similar to test conditional statements from a string.

I replaced what I was looking for with a single character and counted the instances of that single character.

Obviously, before doing this you need to check whether the single character you're using already exists in the string, to avoid incorrect counts.
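The caveat about the placeholder character can be made concrete. This is a minimal sketch, not the poster's validator: the helper CountOperators, its default placeholder, and the convention of returning -1 when the placeholder is already present are all assumptions for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: counts comparison operators by collapsing each one to a
// placeholder char. Returns -1 if the input already contains the placeholder,
// because the count would then be ambiguous.
static int CountOperators(string condition, char placeholder = '~')
{
    if (condition.IndexOf(placeholder) >= 0) return -1;

    // Two-character operators first, so ">=" is not seen as ">" followed by "=".
    string[] operators = { "==", "!=", ">=", "<=", ">", "<" };
    foreach (string op in operators)
        condition = condition.Replace(op, placeholder.ToString());

    return condition.Count(c => c == placeholder);
}

Console.WriteLine(CountOperators("a >= b")); // prints 1
Console.WriteLine(CountOperators("a ~ b"));  // prints -1
```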

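I think the easiest way to do this is to use regular expressions. This way you can get the same split count as you could using myVar.Split('x'), but in a multiple-character setting.

string myVar = "do this to count the number of words in my wording so that I can word it up!";
// Number of pieces after splitting, i.e. one more than the number of matches.
int count = Regex.Split(myVar, "word").Length;

Regex.Matches(input, Regex.Escape("stringToMatch")).Count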

For anyone wanting a ready-to-use String extension method,

here is what I use, which was based on the best of the posted answers:

public static class StringExtension
{
    /// <summary> Returns the number of occurrences of a string within a string; the optional comparison allows case and culture control. </summary>
    public static int Occurrences(this System.String input, string value, StringComparison stringComparisonType = StringComparison.Ordinal)
    {
        if (String.IsNullOrEmpty(value)) return 0;

        int count = 0;
        int position = 0;

        while ((position = input.IndexOf(value, position, stringComparisonType)) != -1)
        {
            position += value.Length;
            count += 1;
        }

        return count;
    }

    /// <summary> Returns the number of occurrences of a single character within a string. </summary>
    public static int Occurrences(this System.String input, char value)
    {
        int count = 0;
        foreach (char c in input) if (c == value) count += 1;
        return count;
    }
}

string s = "HOWLYH THIS ACTUALLY WORKSH WOWH";
int count = 0;
for (int i = 0; i < s.Length; i++)
    if (s[i] == 'H') count++;

It just checks every character in the string; if the character is the one you are searching for, it adds one to count.

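If you check out this webpage, 15 different ways of doing this are benchmarked, including using parallel loops.

The fastest way appears to be either a single-threaded for loop (if you have .NET version < 4.0) or a Parallel.For loop (if using .NET > 4.0 with thousands of checks).

Assuming "ss" is your search string (indexed below as an array of strings) and "ch" is your character array (if you have more than one char you're looking for), here's the basic gist of the code that had the fastest single-threaded run time:

for (int x = 0; x < ss.Length; x++)
{
    for (int y = 0; y < ch.Length; y++)
    {
        for (int a = 0; a < ss[x].Length; a++)
        {
            if (ss[x][a] == ch[y])
            {
                // It's found. Do what you need to here.
            }
        }
    }
}

The benchmark source code is provided too, so you can run your own tests.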
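The parallel variant is mentioned but not shown. The following is only a sketch of how a Parallel.For version might look when many strings are checked; it is not the code from the linked benchmark. The helper name CountCharParallel is hypothetical, and whether it beats a plain loop depends on how much work there is per string.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: counts `target` across many strings in parallel.
static int CountCharParallel(string[] inputs, char target)
{
    int total = 0;
    Parallel.For(0, inputs.Length,
        () => 0,                                  // per-thread running subtotal
        (i, state, subtotal) =>
        {
            foreach (char c in inputs[i])
                if (c == target) subtotal++;
            return subtotal;
        },
        subtotal => Interlocked.Add(ref total, subtotal)); // merge once per thread
    return total;
}

Console.WriteLine(CountCharParallel(new[] { "/a/b", "c/d", "" }, '/')); // prints 3
```

Keeping a thread-local subtotal and merging it with Interlocked.Add avoids contention on a shared counter inside the hot loop.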

String in string:

Find "etc" in " .. JD JD JD JD etc. and etc. JDJDJDJDJDJDJDJD and etc."

var strOrigin = " .. JD JD JD JD etc. and etc. JDJDJDJDJDJDJDJD and etc.";
var searchStr = "etc";
int count = (strOrigin.Length - strOrigin.Replace(searchStr, "").Length) / searchStr.Length;

Check the performance before discarding this one as unsound/clumsy...

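I thought I would throw my extension method into the ring (see the comments for more info). I have not done any formal benchmarking, but I think it has to be very fast for most scenarios.

EDIT: OK, so this SO question got me wondering how the performance of our current implementation would stack up against some of the solutions presented here. I decided to do a little benchmarking and found that our solution was very much in line with the performance of the solution provided by Richard Watson, up until you are doing aggressive searching with big strings (100 KB+), large substrings (32 KB+) and many embedded repetitions (10K+). At that point our solution was around 2x to 4x slower. Given this, and the fact that we really like the solution presented by Richard Watson, we have refactored our solution accordingly. I just wanted to make this available for anyone who might benefit from it.

Our original solution: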

/// <summary>
/// Counts the number of occurrences of the specified substring within
/// the current string.
/// </summary>
/// <param name="s">The current string.</param>
/// <param name="substring">The substring we are searching for.</param>
/// <param name="aggressiveSearch">Indicates whether or not the algorithm
/// should be aggressive in its search behavior (see Remarks). Default
/// behavior is non-aggressive.</param>
/// <remarks>This algorithm has two search modes - aggressive and
/// non-aggressive. When in aggressive search mode (aggressiveSearch =
/// true), the algorithm will try to match at every possible starting
/// character index within the string. When false, all subsequent
/// character indexes within a substring match will not be evaluated.
/// For example, if the string was 'abbbc' and we were searching for
/// the substring 'bb', then aggressive search would find 2 matches
/// with starting indexes of 1 and 2. Non aggressive search would find
/// just 1 match with starting index at 1. After the match was made,
/// the non aggressive search would attempt to make its next match
/// starting at index 3 instead of 2.</remarks>
/// <returns>The count of occurrences of the substring within the string.</returns>
public static int CountOccurrences(this string s, string substring,
    bool aggressiveSearch = false)
{
    // if s or substring is null or empty, substring cannot be found in s
    if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
        return 0;

    // if the length of substring is greater than the length of s,
    // substring cannot be found in s
    if (substring.Length > s.Length)
        return 0;

    var sChars = s.ToCharArray();
    var substringChars = substring.ToCharArray();
    var count = 0;
    var sCharsIndex = 0;

    // substring cannot start in s beyond following index
    var lastStartIndex = sChars.Length - substringChars.Length;

    while (sCharsIndex <= lastStartIndex)
    {
        if (sChars[sCharsIndex] == substringChars[0])
        {
            // potential match checking
            var match = true;
            var offset = 1;
            while (offset < substringChars.Length)
            {
                if (sChars[sCharsIndex + offset] != substringChars[offset])
                {
                    match = false;
                    break;
                }
                offset++;
            }
            if (match)
            {
                count++;
                // if aggressive, just advance to next char in s, otherwise,
                // skip past the match just found in s
                sCharsIndex += aggressiveSearch ? 1 : substringChars.Length;
            }
            else
            {
                // no match found, just move to next char in s
                sCharsIndex++;
            }
        }
        else
        {
            // no match at current index, move along
            sCharsIndex++;
        }
    }

    return count;
}

Here is our revised solution:

/// <summary>
/// Counts the number of occurrences of the specified substring within
/// the current string.
/// </summary>
/// <param name="s">The current string.</param>
/// <param name="substring">The substring we are searching for.</param>
/// <param name="aggressiveSearch">Indicates whether or not the algorithm
/// should be aggressive in its search behavior (see Remarks). Default
/// behavior is non-aggressive.</param>
/// <remarks>This algorithm has two search modes - aggressive and
/// non-aggressive. When in aggressive search mode (aggressiveSearch =
/// true), the algorithm will try to match at every possible starting
/// character index within the string. When false, all subsequent
/// character indexes within a substring match will not be evaluated.
/// For example, if the string was 'abbbc' and we were searching for
/// the substring 'bb', then aggressive search would find 2 matches
/// with starting indexes of 1 and 2. Non aggressive search would find
/// just 1 match with starting index at 1. After the match was made,
/// the non aggressive search would attempt to make its next match
/// starting at index 3 instead of 2.</remarks>
/// <returns>The count of occurrences of the substring within the string.</returns>
public static int CountOccurrences(this string s, string substring,
    bool aggressiveSearch = false)
{
    // if s or substring is null or empty, substring cannot be found in s
    if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
        return 0;

    // if the length of substring is greater than the length of s,
    // substring cannot be found in s
    if (substring.Length > s.Length)
        return 0;

    int count = 0, n = 0;
    while ((n = s.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1)
    {
        if (aggressiveSearch)
            n++;
        else
            n += substring.Length;
        count++;
    }

    return count;
}
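To see the two search modes side by side, here is a small stand-alone sketch using the 'abbbc' / 'bb' example from the remarks. It is written as an ordinary static local function because extension methods cannot be declared in top-level statements, and it uses StringComparison.Ordinal instead of InvariantCulture; both are deliberate deviations from the code above, made for illustration.

```csharp
using System;

// Stand-alone copy of the IndexOf-based loop (assumption: ordinal comparison).
static int CountOccurrences(string s, string substring, bool aggressiveSearch = false)
{
    if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring)) return 0;

    int count = 0, n = 0;
    while ((n = s.IndexOf(substring, n, StringComparison.Ordinal)) != -1)
    {
        // Aggressive: retry from the next char. Otherwise: skip past the match.
        n += aggressiveSearch ? 1 : substring.Length;
        count++;
    }
    return count;
}

Console.WriteLine(CountOccurrences("abbbc", "bb"));       // prints 1
Console.WriteLine(CountOccurrences("abbbc", "bb", true)); // prints 2
```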
string search = "/string";
var occurrences = Regex.Matches(search, @"\/").Count;

Each time the pattern is matched exactly (case sensitive) it is counted, and the number of occurrences is stored in the variable "occurrences".

My initial take gave me something like:

public static int CountOccurrences(string original, string substring)
{
    if (string.IsNullOrEmpty(substring))
        return 0;
    if (substring.Length == 1)
        return CountOccurrences(original, substring[0]);
    if (string.IsNullOrEmpty(original) ||
        substring.Length > original.Length)
        return 0;
    int substringCount = 0;
    for (int charIndex = 0; charIndex < original.Length; charIndex++)
    {
        for (int subCharIndex = 0, secondaryCharIndex = charIndex; subCharIndex < substring.Length && secondaryCharIndex < original.Length; subCharIndex++, secondaryCharIndex++)
        {
            if (substring[subCharIndex] != original[secondaryCharIndex])
                goto continueOuter;
        }
        if (charIndex + substring.Length > original.Length)
            break;
        charIndex += substring.Length - 1;
        substringCount++;
    continueOuter:
        ;
    }
    return substringCount;
}

public static int CountOccurrences(string original, char @char)
{
    if (string.IsNullOrEmpty(original))
        return 0;
    int substringCount = 0;
    for (int charIndex = 0; charIndex < original.Length; charIndex++)
        if (@char == original[charIndex])
            substringCount++;
    return substringCount;
}

The needle-in-a-haystack approach using replace and division yields 21+ seconds, whereas this takes about 15.2 seconds.

Edit: after adding a bit that adds substring.Length - 1 to charIndex (as it should), it's at 11.6 seconds.

Edit 2: I used a string that had 26 two-character strings; here are the times updated to the same sample texts:

Needle in a haystack (OP's version): 7.8 seconds

Suggested mechanism: 4.6 seconds.

Edit 3: After adding the single-character corner case, it went to 1.2 seconds.

Edit 4: For context: 50 million iterations were used.

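A nice String SubString counter in C# is this unexpectedly tricky fellow:

public static int CCount(String haystack, String needle)
{
    return haystack.Split(new[] { needle }, StringSplitOptions.None).Length - 1;
}

For the case of a string delimiter (not for the char case, as the subject says):

string source = "@@@once@@@upon@@@a@@@time@@@";
// Note: RemoveEmptyEntries drops the empty pieces before the first and after the last
// delimiter, so this yields 3 here; StringSplitOptions.None would count all 5 delimiters.
int count = source.Split(new[] { "@@@" }, StringSplitOptions.RemoveEmptyEntries).Length - 1;

The natural delimiter of the poster's original source value ("/once/upon/a/time/") is the char '/', and the responses do address that source.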

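I felt that we were lacking certain kinds of substring counting, like unsafe byte-by-byte comparisons. I put together the original poster's method and any methods I could think of.

These are the string extensions I made.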

namespace Example
{
    using System;
    using System.Text;

    public static class StringExtensions
    {
        public static int CountSubstr(this string str, string substr)
        {
            return (str.Length - str.Replace(substr, "").Length) / substr.Length;
        }

        public static int CountSubstr(this string str, char substr)
        {
            return (str.Length - str.Replace(substr.ToString(), "").Length);
        }

        public static int CountSubstr2(this string str, string substr)
        {
            int substrlen = substr.Length;
            int lastIndex = str.IndexOf(substr, 0, StringComparison.Ordinal);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + substrlen, StringComparison.Ordinal);
            }

            return count;
        }

        public static int CountSubstr2(this string str, char substr)
        {
            int lastIndex = str.IndexOf(substr, 0);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + 1);
            }

            return count;
        }

        public static int CountChar(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            for (int i = 0; i < length; ++i)
                if (str[i] == substr)
                    ++count;

            return count;
        }

        public static int CountChar2(this string str, char substr)
        {
            int count = 0;
            foreach (var c in str)
                if (c == substr)
                    ++count;

            return count;
        }

        public static unsafe int CountChar3(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = 0; i < length; ++i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountChar4(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = length - 1; i >= 0; --i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountSubstr3(this string str, string substr)
        {
            int length = str.Length;
            int substrlen = substr.Length;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = 0;

                    for (int i = 0; i < length; ++i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            ++n;
                            if (n == substrlen)
                            {
                                ++count;
                                n = 0;
                            }
                        }
                        else
                            n = 0;
                    }
                }
            }

            return count;
        }

        public static int CountSubstr3(this string str, char substr)
        {
            return CountSubstr3(str, substr.ToString());
        }

        public static unsafe int CountSubstr4(this string str, string substr)
        {
            int length = str.Length;
            int substrLastIndex = substr.Length - 1;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = substrLastIndex;

                    for (int i = length - 1; i >= 0; --i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            if (--n == -1)
                            {
                                ++count;
                                n = substrLastIndex;
                            }
                        }
                        else
                            n = substrLastIndex;
                    }
                }
            }

            return count;
        }

        public static int CountSubstr4(this string str, char substr)
        {
            return CountSubstr4(str, substr.ToString());
        }
    }
}

Followed by the test code...

// Stopwatch requires: using System.Diagnostics;
static void Main()
{
    const char matchA = '_';
    const string matchB = "and";
    const string matchC = "muchlongerword";
    const string testStrA = "_and_d_e_banna_i_o___pfasd__and_d_e_banna_i_o___pfasd_";
    const string testStrB = "and sdf and ans andeians andano ip and and sdf and ans andeians andano ip and";
    const string testStrC =
        "muchlongerword amuchlongerworsdfmuchlongerwordsdf jmuchlongerworijv muchlongerword sdmuchlongerword dsmuchlongerword";
    const int testSize = 1000000;
    Console.WriteLine(testStrA.CountSubstr('_'));
    Console.WriteLine(testStrA.CountSubstr2('_'));
    Console.WriteLine(testStrA.CountSubstr3('_'));
    Console.WriteLine(testStrA.CountSubstr4('_'));
    Console.WriteLine(testStrA.CountChar('_'));
    Console.WriteLine(testStrA.CountChar2('_'));
    Console.WriteLine(testStrA.CountChar3('_'));
    Console.WriteLine(testStrA.CountChar4('_'));
    Console.WriteLine(testStrB.CountSubstr("and"));
    Console.WriteLine(testStrB.CountSubstr2("and"));
    Console.WriteLine(testStrB.CountSubstr3("and"));
    Console.WriteLine(testStrB.CountSubstr4("and"));
    Console.WriteLine(testStrC.CountSubstr("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr2("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr3("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr4("muchlongerword"));
    var timer = new Stopwatch();
    timer.Start();
    for (int i = 0; i < testSize; ++i) testStrA.CountSubstr(matchA);
    timer.Stop();
    Console.WriteLine("CS1 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrB.CountSubstr(matchB);
    timer.Stop();
    Console.WriteLine("CS1 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrC.CountSubstr(matchC);
    timer.Stop();
    Console.WriteLine("CS1 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrA.CountSubstr2(matchA);
    timer.Stop();
    Console.WriteLine("CS2 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrB.CountSubstr2(matchB);
    timer.Stop();
    Console.WriteLine("CS2 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrC.CountSubstr2(matchC);
    timer.Stop();
    Console.WriteLine("CS2 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrA.CountSubstr3(matchA);
    timer.Stop();
    Console.WriteLine("CS3 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrB.CountSubstr3(matchB);
    timer.Stop();
    Console.WriteLine("CS3 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrC.CountSubstr3(matchC);
    timer.Stop();
    Console.WriteLine("CS3 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrA.CountSubstr4(matchA);
    timer.Stop();
    Console.WriteLine("CS4 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrB.CountSubstr4(matchB);
    timer.Stop();
    Console.WriteLine("CS4 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrC.CountSubstr4(matchC);
    timer.Stop();
    Console.WriteLine("CS4 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrA.CountChar(matchA);
    timer.Stop();
    Console.WriteLine("CC1 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrA.CountChar2(matchA);
    timer.Stop();
    Console.WriteLine("CC2 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrA.CountChar3(matchA);
    timer.Stop();
    Console.WriteLine("CC3 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i) testStrA.CountChar4(matchA);
    timer.Stop();
    Console.WriteLine("CC4 chr: " + timer.Elapsed.TotalMilliseconds + "ms");
}

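Results: CSX corresponds with CountSubstrX and CCX corresponds with CountCharX. "chr" searches a string for '_', "and" searches a string for "and", and "mlw" searches a string for "muchlongerword".

CS1 chr: 824.123ms
CS1 and: 586.1893ms
CS1 mlw: 486.5414ms
CS2 chr: 127.8941ms
CS2 and: 806.3918ms
CS2 mlw: 497.318ms
CS3 chr: 201.8896ms
CS3 and: 124.0675ms
CS3 mlw: 212.8341ms
CS4 chr: 81.5183ms
CS4 and: 92.0615ms
CS4 mlw: 116.2197ms
CC1 chr: 66.4078ms
CC2 chr: 64.0161ms
CC3 chr: 65.9013ms
CC4 chr: 65.8206ms

Lastly, I had a file with 3.6 million characters. It was "derp adfderdserp dfaerpderp deasderp" repeated 100,000 times. I searched for "derp" inside the file with the above methods 100 times; these are the results.

CS1Derp: 1501.3444ms
CS2Derp: 1585.797ms
CS3Derp: 376.0937ms
CS4Derp: 271.1663ms

So my 4th method is definitely the winner, but realistically, if searching a 3.6 million character file 100 times only took 1586ms in the worst case, then all of this is quite negligible.

By the way, I also scanned for the 'd' char in the 3.6 million character file with 100 runs of the CountSubstr and CountChar methods.

CS1  d : 2606.9513ms
CS2  d : 339.7942ms
CS3  d : 960.281ms
CS4  d : 233.3442ms
CC1  d : 302.4122ms
CC2  d : 280.7719ms
CC3  d : 299.1125ms
CC4  d : 292.9365ms

According to this, the original poster's method is very bad for single-character needles in a large haystack.

Note: All values were updated to Release build output. I accidentally forgot to build in Release mode the first time I posted this. Some of my statements have been amended.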

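**To count a char or a string**

string st = "asdfasdfasdfsadfasdf/asdfasdfas/dfsdfsdafsdfsd/fsadfasdf/dff";
int count = 0;
int location = -1;
// Search from just past the previous match, so a match at index 0 is also found;
// IndexOf returns -1 when there is no further match.
while ((location = st.IndexOf("/", location + 1)) >= 0)
{
    count++;
}
MessageBox.Show(count.ToString());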

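As of .NET 5 (.NET Core 2.1+ and .NET Standard 2.1) we have a new iteration speed king.

"Span<T>" https://learn.microsoft.com/en-us/dotnet/api/system.span-1?view=net-5.0

String has a built-in member, AsSpan(), that returns a ReadOnlySpan<char>:

int count = 0;
foreach (var c in source.AsSpan())
{
    if (c == '/')
        count++;
}

My tests show this to be 62% faster than a straight foreach. I also compared it with a for() loop on Span[i], as well as a few others posted here. Note that the reverse for() iteration on a String now seems to run slower than a straight foreach.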
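As a usage sketch, the span loop can be wrapped in a helper that accepts a ReadOnlySpan<char>, which also allows counting within part of a string without allocating a substring. The helper name CountChar is an assumption, and the example makes no claim about speed.

```csharp
using System;

// Hypothetical helper: accepts any read-only span of chars.
static int CountChar(ReadOnlySpan<char> text, char target)
{
    int count = 0;
    foreach (char c in text)
        if (c == target) count++;
    return count;
}

string source = "/once/upon/a/time/";
Console.WriteLine(CountChar(source, '/'));              // string converts implicitly; prints 5
Console.WriteLine(CountChar(source.AsSpan(0, 6), '/')); // only "/once/"; prints 2
```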

Starting test, 10000000 iterations
(base) foreach =   673 ms

fastest to slowest
foreach Span =   252 ms   62.6%
Span [i--]   =   282 ms   58.1%
Span [i++]   =   402 ms   40.3%
for [i++]    =   454 ms   32.5%
for [i--]    =   867 ms  -28.8%
Replace      =  1905 ms -183.1%
Split        =  2109 ms -213.4%
Linq.Count   =  3797 ms -464.2%

Update: December 2021, Visual Studio 2022, .NET 5 and 6

.NET 5
Starting test, 100000000 iterations set
(base) foreach =  7658 ms
fastest to slowest
foreach Span   =   3710 ms     51.6%
Span [i--]     =   3745 ms     51.1%
Span [i++]     =   3932 ms     48.7%
for [i++]      =   4593 ms     40.0%
for [i--]      =   7042 ms      8.0%
(base) foreach =   7658 ms      0.0%
Replace        =  18641 ms   -143.4%
Split          =  21469 ms   -180.3%
Linq           =  39726 ms   -418.8%
Regex Compiled = 128422 ms -1,577.0%
Regex          = 179603 ms -2,245.3%

.NET 6
Starting test, 100000000 iterations set
(base) foreach =  7343 ms
fastest to slowest
foreach Span   =   2918 ms     60.3%
for [i++]      =   2945 ms     59.9%
Span [i++]     =   3105 ms     57.7%
Span [i--]     =   5076 ms     30.9%
(base) foreach =   7343 ms      0.0%
for [i--]      =   8645 ms    -17.7%
Replace        =  18307 ms   -149.3%
Split          =  21440 ms   -192.0%
Linq           =  39354 ms   -435.9%
Regex Compiled = 114178 ms -1,454.9%
Regex          = 186493 ms -2,439.7%

I added more loops and threw in RegEx so we can see what a disaster it is to use over a lot of iterations. I think the for(++) loop comparison may have been optimized in .NET 6 to use Span internally, since it is almost the same speed as the foreach over a Span.

Code link

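Counting chars is quite different from counting strings. It also depends on whether you want to be able to check more than one. If you want to check the counts of various different chars, you can do this:

var charCounts = haystack.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());
var needleCount = charCounts.ContainsKey(needle) ? charCounts[needle] : 0;

Note 1: grouping into a dictionary is useful enough that it makes sense to write a GroupToDictionary extension method for it.

Note 2: it is also useful to have your own dictionary implementation that allows for default values, so that you automatically get 0 for nonexistent keys.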
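The two notes can be approximated without writing a custom dictionary type. In this sketch the helper CountChars stands in for the GroupToDictionary idea (the name is hypothetical), and Dictionary.TryGetValue supplies the 0 default, because it sets its out parameter to default(int) when the key is missing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper standing in for the "GroupToDictionary" idea in note 1.
static Dictionary<char, int> CountChars(string haystack) =>
    haystack.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());

var counts = CountChars("/once/upon/a/time/");

// Note 2 without a custom dictionary: a missing key leaves the out value at 0.
counts.TryGetValue('/', out int slashes);
counts.TryGetValue('z', out int zeds);
Console.WriteLine($"{slashes} {zeds}"); // prints "5 0"
```

Building the dictionary once pays off when several different chars are looked up in the same haystack; for a single char, a plain loop does less work.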

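Split (may) win over IndexOf (for strings).

The benchmark above seems to indicate that Richard Watson's is the fastest for strings, which is wrong (maybe the difference comes from our test data, but it seems strange anyway for the reasons below).

If we look a bit deeper into the implementation of these methods in .NET (for Luke H's and Richard Watson's methods):

  • IndexOf depends on the culture: it will try to retrieve/create a ReadOnlySpan, check whether it has to ignore case, etc., and then finally make the unsafe/native call.
  • Split is able to handle several separators and has some StringSplitOptions, and it has to create the string[] array and fill it with the split results (so it does some substrings too). Depending on the number of string occurrences, Split may be faster than IndexOf.

By the way, I made a simplified version of IndexOf (it could be faster if I used pointers and unsafe, but unchecked should be OK for most) which is at least 4 times faster in the benchmark below (ratios of 0.25 or lower against Richard's version).

Benchmark (source on GitHub)

Done by searching either a common word (the) or a small sentence in Shakespeare's Richard III.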

Method               | Mean       | Error     | StdDev    | Ratio
Richard_LongInLong   |  67.721 us | 1.0278 us | 0.9614 us | 1.00
Luke_LongInLong      |   1.960 us | 0.0381 us | 0.0637 us | 0.03
Fab_LongInLong       |   1.198 us | 0.0160 us | 0.0142 us | 0.02
---------------------|------------|-----------|-----------|------
Richard_ShortInLong  | 104.771 us | 2.8117 us | 7.9304 us | 1.00
Luke_ShortInLong     |   2.971 us | 0.0594 us | 0.0813 us | 0.03
Fab_ShortInLong      |   2.206 us | 0.0419 us | 0.0411 us | 0.02
---------------------|------------|-----------|-----------|------
Richard_ShortInShort |  115.53 ns |  1.359 ns |  1.135 ns | 1.00
Luke_ShortInShort    |   52.46 ns |  0.970 ns |  0.908 ns | 0.45
Fab_ShortInShort     |   28.47 ns |  0.552 ns |  0.542 ns | 0.25

public int GetOccurrences(string input, string needle)
{
    int count = 0;
    unchecked
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(needle))
        {
            return 0;
        }

        for (var i = 0; i < input.Length - needle.Length + 1; i++)
        {
            var c = input[i];
            if (c == needle[0])
            {
                for (var index = 0; index < needle.Length; index++)
                {
                    c = input[i + index];
                    var n = needle[index];

                    if (c != n)
                    {
                        break;
                    }
                    else if (index == needle.Length - 1)
                    {
                        count++;
                    }
                }
            }
        }
    }

    return count;
}