在 C # 中查找较大字符串中子字符串的所有位置

我有一个需要解析的大字符串,我需要找到 extract"(me,i-have lots. of]punctuation的所有实例,并将每个实例的索引存储到一个列表中。

假设这段字符串位于较大字符串的开头和中间,这两个字符串都会被找到,它们的索引将被添加到 List中。List将包含 0和另一个索引,不管它是什么。

我一直在玩,string.IndexOf差不多什么我在寻找,我已经写了一些代码-但它不工作,我一直无法找出究竟是什么错误:

List<int> inst = new List<int>();
int index = 0;
while (index < source.LastIndexOf("extract\"(me,i-have lots. of]punctuation", 0) + 39)
{
int src = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
inst.Add(src);
index = src + 40;
}
  • 名单
  • 大字符串

有更好的主意吗?

130414 次浏览

Here's an example extension method for it:

public static List<int> AllIndexesOf(this string str, string value) {
if (String.IsNullOrEmpty(value))
throw new ArgumentException("the string to find may not be empty", "value");
List<int> indexes = new List<int>();
for (int index = 0;; index += value.Length) {
index = str.IndexOf(value, index);
if (index == -1)
return indexes;
indexes.Add(index);
}
}

If you put this into a static class and import the namespace with using, it appears as a method on any string, and you can just do:

List<int> indexes = "fooStringfooBar".AllIndexesOf("foo");

For more information on extension methods, http://msdn.microsoft.com/en-us/library/bb383977.aspx

Also the same using an iterator:

public static IEnumerable<int> AllIndexesOf(this string str, string value) {
if (String.IsNullOrEmpty(value))
throw new ArgumentException("the string to find may not be empty", "value");
for (int index = 0;; index += value.Length) {
index = str.IndexOf(value, index);
if (index == -1)
break;
yield return index;
}
}

Based on the code I've used for finding multiple instances of a string within a larger string, your code would look like:

List<int> inst = new List<int>();
int index = 0;
while (index >=0)
{
index = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
inst.Add(index);
index++;
}
public List<int> GetPositions(string source, string searchString)
{
List<int> ret = new List<int>();
int len = searchString.Length;
int start = -len;
while (true)
{
start = source.IndexOf(searchString, start + len);
if (start == -1)
{
break;
}
else
{
ret.Add(start);
}
}
return ret;
}

Call it like this:

List<int> list = GetPositions("bob is a chowder head bob bob sldfjl", "bob");
// list will contain 0, 22, 26

Why don't you use the built in RegEx class:

public static IEnumerable<int> GetAllIndexes(this string source, string matchString)
{
matchString = Regex.Escape(matchString);
foreach (Match match in Regex.Matches(source, matchString))
{
yield return match.Index;
}
}

If you do need to reuse the expression then compile it and cache it somewhere. Change the matchString param to a Regex matchExpression in another overload for the reuse case.

using LINQ

public static IEnumerable<int> IndexOfAll(this string sourceString, string subString)
{
return Regex.Matches(sourceString, subString).Cast<Match>().Select(m => m.Index);
}

@csam is correct in theory, although his code will not complie and can be refractored to

public static IEnumerable<int> IndexOfAll(this string sourceString, string matchString)
{
matchString = Regex.Escape(matchString);
return from Match match in Regex.Matches(sourceString, matchString) select match.Index;
}

Polished version + case ignoring support:

public static int[] AllIndexesOf(string str, string substr, bool ignoreCase = false)
{
if (string.IsNullOrWhiteSpace(str) ||
string.IsNullOrWhiteSpace(substr))
{
throw new ArgumentException("String or substring is not specified.");
}


var indexes = new List<int>();
int index = 0;


while ((index = str.IndexOf(substr, index, ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal)) != -1)
{
indexes.Add(index++);
}


return indexes.ToArray();
}
public static Dictionary<string, IEnumerable<int>> GetWordsPositions(this string input, string[] Susbtrings)
{
Dictionary<string, IEnumerable<int>> WordsPositions = new Dictionary<string, IEnumerable<int>>();
IEnumerable<int> IndexOfAll = null;
foreach (string st in Susbtrings)
{
IndexOfAll = Regex.Matches(input, st).Cast<Match>().Select(m => m.Index);
WordsPositions.Add(st, IndexOfAll);


}
return WordsPositions;
}

Hi nice answer by @Matti Virkkunen

public static List<int> AllIndexesOf(this string str, string value) {
if (String.IsNullOrEmpty(value))
throw new ArgumentException("the string to find may not be empty", "value");
List<int> indexes = new List<int>();
for (int index = 0;; index += value.Length) {
index = str.IndexOf(value, index);
if (index == -1)
return indexes;
indexes.Add(index);
index--;
}
}

But this covers tests cases like AOOAOOA where substring

are AOOA and AOOA

Output 0 and 3

Without Regex, using string comparison type:

string search = "123aa456AA789bb9991AACAA";
string pattern = "AA";
Enumerable.Range(0, search.Length)
.Select(index => { return new { Index = index, Length = (index + pattern.Length) > search.Length ? search.Length - index : pattern.Length }; })
.Where(searchbit => searchbit.Length == pattern.Length && pattern.Equals(search.Substring(searchbit.Index, searchbit.Length),StringComparison.OrdinalIgnoreCase))
.Select(searchbit => searchbit.Index)

This returns {3,8,19,22}. Empty pattern would match all positions.

For multiple patterns:

string search = "123aa456AA789bb9991AACAA";
string[] patterns = new string[] { "aa", "99" };
patterns.SelectMany(pattern => Enumerable.Range(0, search.Length)
.Select(index => { return new { Index = index, Length = (index + pattern.Length) > search.Length ? search.Length - index : pattern.Length }; })
.Where(searchbit => searchbit.Length == pattern.Length && pattern.Equals(search.Substring(searchbit.Index, searchbit.Length), StringComparison.OrdinalIgnoreCase))
.Select(searchbit => searchbit.Index))

This returns {3, 8, 19, 22, 15, 16}

I noticed that at least two proposed solutions don't handle overlapping search hits. I didn't check the one marked with the green checkmark. Here is one that handles overlapping search hits:

    public static List<int> GetPositions(this string source, string searchString)
{
List<int> ret = new List<int>();
int len = searchString.Length;
int start = -1;
while (true)
{
start = source.IndexOf(searchString, start +1);
if (start == -1)
{
break;
}
else
{
ret.Add(start);
}
}
return ret;
}

I found this example and incorporated it into a function:

    public static int solution1(int A, int B)
{
// Check if A and B are in [0...999,999,999]
if ( (A >= 0 && A <= 999999999) && (B >= 0 && B <= 999999999))
{
if (A == 0 && B == 0)
{
return 0;
}
// Make sure A < B
if (A < B)
{
// Convert A and B to strings
string a = A.ToString();
string b = B.ToString();
int index = 0;


// See if A is a substring of B
if (b.Contains(a))
{
// Find index where A is
if (b.IndexOf(a) != -1)
{
while ((index = b.IndexOf(a, index)) != -1)
{
Console.WriteLine(A + " found at position " + index);
index++;
}
Console.ReadLine();
return b.IndexOf(a);
}
else
return -1;
}
else
{
Console.WriteLine(A + " is not in " + B + ".");
Console.ReadLine();


return -1;
}
}
else
{
Console.WriteLine(A + " must be less than " + B + ".");
// Console.ReadLine();


return -1;
}
}
else
{
Console.WriteLine("A or B is out of range.");
//Console.ReadLine();


return -1;
}
}


static void Main(string[] args)
{
int A = 53, B = 1953786;
int C = 78, D = 195378678;
int E = 57, F = 153786;


solution1(A, B);
solution1(C, D);
solution1(E, F);


Console.WriteLine();
}

Returns:

53 found at position 2

78 found at position 4
78 found at position 7

57 is not in 153786

It could be done in efficient time complexity using KMP algorithm in O(N + M) where N is the length of text and M is the length of the pattern.

This is the implementation and usage:

static class StringExtensions
{
public static IEnumerable<int> AllIndicesOf(this string text, string pattern)
{
if (string.IsNullOrEmpty(pattern))
{
throw new ArgumentNullException(nameof(pattern));
}
return Kmp(text, pattern);
}


private static IEnumerable<int> Kmp(string text, string pattern)
{
int M = pattern.Length;
int N = text.Length;


int[] lps = LongestPrefixSuffix(pattern);
int i = 0, j = 0;


while (i < N)
{
if (pattern[j] == text[i])
{
j++;
i++;
}
if (j == M)
{
yield return i - j;
j = lps[j - 1];
}


else if (i < N && pattern[j] != text[i])
{
if (j != 0)
{
j = lps[j - 1];
}
else
{
i++;
}
}
}
}


private static int[] LongestPrefixSuffix(string pattern)
{
int[] lps = new int[pattern.Length];
int length = 0;
int i = 1;


while (i < pattern.Length)
{
if (pattern[i] == pattern[length])
{
length++;
lps[i] = length;
i++;
}
else
{
if (length != 0)
{
length = lps[length - 1];
}
else
{
lps[i] = length;
i++;
}
}
}
return lps;
}

and this is an example of how to use it:

static void Main(string[] args)
{
string text = "this is a test";
string pattern = "is";
foreach (var index in text.AllIndicesOf(pattern))
{
Console.WriteLine(index); // 2 5
}
}

How is this alternative implementation?

 public static class MyExtensions
{
public static int HowMany(this string str, char needle)
{
int counter = 0;
int nextIndex = 0;
for (; nextIndex != -1; )
{
nextIndex = str.IndexOf(needle, nextIndex);
if (nextIndex != -1)
{
counter++;
//step over to the next char
nextIndex++;
}
}
return counter;
}
}

you can use linq to select and enumerate all elements, then find by any string:

I've created a class:

class Pontos
{
//index on string
public int Pos { get; set; }
//caractere
public string Caractere { get; set; }
}

And use like this:

int count = 0;


var pontos = texto.Select(y => new Pontos { Pos = count++, Caractere = y.ToString() }).Where(x=>x.Caractere == ".").ToList();

then: input string: enter image description here

output list: enter image description here

PS: SeForNumero is another field of my class, I need this for my own purposes, but is not necessary to this use.