String.Replace()与 StringBuilder

我有一个字符串,其中需要用字典中的值替换标记。必须尽可能高效。使用 string.place 执行循环只会消耗内存(记住,字符串是不可变的)。StringBuilder.替换()是否更好,因为它是设计来处理字符串操作的?

我希望避免正则快递的费用,但如果这样做会更有效率,那就这样吧。

注意: 我不关心代码的复杂性,只关心它运行的速度和它消耗的内存。

平均属性: 长度为255-1024个字符,字典中为15-30个键。

63411 次浏览

This may be of help: https://learn.microsoft.com/en-us/archive/blogs/debuggingtoolbox/comparing-regex-replace-string-replace-and-stringbuilder-replace-which-has-better-performance

The short answer appears to be that String.Replace is faster, although it may have a larger impact on your memory footprint / garbage collection overhead.

Would stringbuilder.replace be any better [than String.Replace]

Yes, a lot better. And if you can estimate an upper bound for the new string (it looks like you can) then it will probably be fast enough.

When you create it like:

  var sb = new StringBuilder(inputString, pessimisticEstimate);

then the StringBuilder will not have to re-allocate its buffer.

Yes, StringBuilder will give you both gain in speed and memory (basically because it won't create an instance of a string each time you will make a manipulation with it - StringBuilder always operates with the same object). Here is an MSDN link with some details.

Converting data from a String to a StringBuilder and back will take some time. If one is only performing a single replace operation, this time may not be recouped by the efficiency improvements inherent in StringBuilder. On the other hand, if one converts a string to a StringBuilder, then performs many Replace operations on it, and converts it back at the end, the StringBuilder approach is apt to be faster.

Rather than running 15-30 replace operations on the entire string, it might be more efficient to use something like a trie data structure to hold your dictionary. Then you can loop through your input string once to do all your searching/replacing.

Using RedGate Profiler using the following code

class Program
{
static string data = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
static Dictionary<string, string> values;


static void Main(string[] args)
{
Console.WriteLine("Data length: " + data.Length);
values = new Dictionary<string, string>()
{
{ "ab", "aa" },
{ "jk", "jj" },
{ "lm", "ll" },
{ "yz", "zz" },
{ "ef", "ff" },
{ "st", "uu" },
{ "op", "pp" },
{ "x", "y" }
};


StringReplace(data);
StringBuilderReplace1(data);
StringBuilderReplace2(new StringBuilder(data, data.Length * 2));


Console.ReadKey();
}


private static void StringReplace(string data)
{
foreach(string k in values.Keys)
{
data = data.Replace(k, values[k]);
}
}


private static void StringBuilderReplace1(string data)
{
StringBuilder sb = new StringBuilder(data, data.Length * 2);
foreach (string k in values.Keys)
{
sb.Replace(k, values[k]);
}
}


private static void StringBuilderReplace2(StringBuilder data)
{
foreach (string k in values.Keys)
{
data.Replace(k, values[k]);
}
}
}
  • String.Replace = 5.843ms
  • StringBuilder.Replace #1 = 4.059ms
  • Stringbuilder.Replace #2 = 0.461ms

String length = 1456

stringbuilder #1 creates the stringbuilder in the method while #2 does not so the performance difference will end up being the same most likely since you're just moving that work out of the method. If you start with a stringbuilder instead of a string then #2 might be the way to go instead.

As far as memory, using RedGateMemory profiler, there is nothing to worry about until you get into MANY replace operations in which stringbuilder is going to win overall.

It will depend a lot on how many of the markers are present in a given string on average.

Performance of searching for a key is likely to be similar between StringBuilder and String, but StringBuilder will win if you have to replace many markers in a single string.

If you only expect one or two markers per string on average, and your dictionary is small, I would just go for the String.Replace.

If there are many markers, you might want to define a custom syntax to identify markers - e.g. enclosing in braces with a suitable escaping rule for a literal brace. You can then implement a parsing algorithm that iterates through the characters of the string once, recognizing and replacing each marker that it finds. Or use a regex.

My two cents here, I just wrote couple of lines of code to test how each method performs and, as expected, result is "it depends".

For longer strings Regex seems to be performing better, for shorter strings, String.Replace it is. I can see that usage of StringBuilder.Replace is not very useful, and if wrongly used, it could be lethal in GC perspective (I tried to share one instance of StringBuilder).

Check my StringReplaceTests GitHub repo.

The problem with @DustinDavis' answer is that it recursively operates on the same string. Unless you're planning on doing a back-and-forth type of manipulation, you really should have separate objects for each manipulation case in this kind of test.

I decided to create my own test because I found some conflicting answers all over the Web, and I wanted to be completely sure. The program I am working on deals with a lot of text (files with tens of thousands of lines in some cases).

So here's a quick method you can copy and paste and see for yourself which is faster. You may have to create your own text file to test, but you can easily copy and paste text from anywhere and make a large enough file for yourself:

using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Windows;


void StringReplace_vs_StringBuilderReplace( string file, string word1, string word2 )
{
using( FileStream fileStream = new FileStream( file, FileMode.Open, FileAccess.Read ) )
using( StreamReader streamReader = new StreamReader( fileStream, Encoding.UTF8 ) )
{
string text = streamReader.ReadToEnd(),
@string = text;
StringBuilder @StringBuilder = new StringBuilder( text );
int iterations = 10000;


Stopwatch watch1 = new Stopwatch.StartNew();
for( int i = 0; i < iterations; i++ )
if( i % 2 == 0 ) @string = @string.Replace( word1, word2 );
else @string = @string.Replace( word2, word1 );
watch1.Stop();
double stringMilliseconds = watch1.ElapsedMilliseconds;


Stopwatch watch2 = new Stopwatch.StartNew();
for( int i = 0; i < iterations; i++ )
if( i % 2 == 0 ) @StringBuilder = @StringBuilder .Replace( word1, word2 );
else @StringBuilder = @StringBuilder .Replace( word2, word1 );
watch2.Stop();
double StringBuilderMilliseconds = watch1.ElapsedMilliseconds;


MessageBox.Show( string.Format( "string.Replace: {0}\nStringBuilder.Replace: {1}",
stringMilliseconds, StringBuilderMilliseconds ) );
}
}

I got that string.Replace() was faster by about 20% every time swapping out 8-10 letter words. Try it for yourself if you want your own empirical evidence.