Why are operators so much slower than method calls? (structs are slower only on older JITs)

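Intro: I write high-performance code in C#. Yes, I know C++ would give me better optimization, but I still choose to use C#. I do not wish to debate that choice. Rather, I'd like to hear from those who, like me, are trying to write high-performance code on the .NET Framework.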

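Questions:

  • Why is the operator in the code below slower than the equivalent method call?
  • Why is the method that takes two doubles in the code below faster than the equivalent method that takes a struct containing two doubles? (A: older JITs optimize structs poorly)
  • Is there a way to get the .NET JIT compiler to treat simple structs as efficiently as the members of the struct? (A: get a newer JIT)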

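What I think I know: The original .NET JIT compiler would not inline anything that involved a struct. That is bizarre, given that structs should only be used where you need small value types that ought to be optimized like built-ins, but it is true. Fortunately, in .NET 3.5 SP1 and .NET 2.0 SP2, they made some improvements to the JIT optimizer, including improvements to inlining, particularly for structs. (I am guessing they did that because otherwise the new Complex struct they were introducing would have performed horribly... so the Complex team was probably pounding on the JIT optimizer team.) So, any documentation written before .NET 3.5 SP1 is probably not too relevant to this issue.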

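What my testing shows: I have verified that I do have the newer JIT optimizer by checking that the file C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll has version >= 3053, so it should include those improvements to the JIT optimizer. However, even with that, both my timings and my reading of the disassembly show the following: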

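The JIT-produced code for passing a struct with two doubles is far less efficient than code that passes the two doubles directly.

The JIT-produced code for a struct method passes in 'this' far more efficiently than it passes a struct as an argument.

The JIT still inlines better if you pass two doubles rather than a struct with two doubles, even with the multiplier applied for clearly being inside a loop.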

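The timings: Actually, looking at the disassembly, I realize that most of the time in the loops is spent just fetching the test data out of the List. The difference between the four ways of making the same call is dramatically larger once you factor out the loop overhead and the data access. I get anywhere from 5x to 20x speedups from passing (double, double) instead of Element, and 10x to 40x from calling PlusEqual(double, double) instead of using the += operator. Wow. Sad.

Here's one set of timings: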

Populating List<Element> took 320ms.
The PlusEqual() method took 105ms.
The 'same' += operator took 131ms.
The 'same' -= operator took 139ms.
The PlusEqual(double, double) method took 68ms.
The do nothing loop took 66ms.
The ratio of operator with constructor to method is 124%.
The ratio of operator without constructor to method is 132%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 64%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 166%.
The ratio of operator without constructor to method is 187%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 5%.
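
For readers checking the numbers, the percentages above follow from truncating integer arithmetic on the millisecond totals. The following is a minimal sketch of that arithmetic only; the class name RatioSketch and the method PercentRatio are illustrative and not part of the original test.

```csharp
using System;

public static class RatioSketch
{
    // Integer percentage of a to b after subtracting a shared overhead from both,
    // mirroring the benchmark's `100L * x / y` arithmetic (integer division truncates).
    public static long PercentRatio(long aMs, long bMs, long overheadMs)
    {
        return 100L * (aMs - overheadMs) / (bMs - overheadMs);
    }

    public static void Main()
    {
        const long loopMs = 66; // the "do nothing" loop from the timings above

        Console.WriteLine(PercentRatio(131, 105, 0));      // += vs PlusEqual(Element), raw: 124
        Console.WriteLine(PercentRatio(131, 105, loopMs)); // same, overhead removed: 166
        Console.WriteLine(PercentRatio(68, 105, loopMs));  // PlusEqual(double, double): 5
    }
}
```

Note that the 5% figure is 2 ms divided by 39 ms, so at this scale a single millisecond of jitter in either measurement moves the adjusted ratio by several percentage points.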

The Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace OperatorVsMethod
{
    public struct Element
    {
        public double Left;
        public double Right;

        public Element(double left, double right)
        {
            this.Left = left;
            this.Right = right;
        }

        public static Element operator +(Element x, Element y)
        {
            return new Element(x.Left + y.Left, x.Right + y.Right);
        }

        // Deliberately performs addition: this is "+=" without the constructor call.
        public static Element operator -(Element x, Element y)
        {
            x.Left += y.Left;
            x.Right += y.Right;
            return x;
        }

        /// <summary>
        /// Like the += operator; but faster.
        /// </summary>
        public void PlusEqual(Element that)
        {
            this.Left += that.Left;
            this.Right += that.Right;
        }

        /// <summary>
        /// Like the += operator; but faster.
        /// </summary>
        public void PlusEqual(double thatLeft, double thatRight)
        {
            this.Left += thatLeft;
            this.Right += thatRight;
        }
    }

    [TestClass]
    public class UnitTest1
    {
        [TestMethod]
        public void TestMethod1()
        {
            Stopwatch stopwatch = new Stopwatch();

            // Populate a List of Elements to add together
            int seedSize = 4;
            List<double> doubles = new List<double>(seedSize);
            doubles.Add(2.5d);
            doubles.Add(100000d);
            doubles.Add(-0.5d);
            doubles.Add(-100002d);

            int size = 2500000 * seedSize;
            List<Element> elts = new List<Element>(size);

            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                int di = ii % seedSize;
                double d = doubles[di];
                elts.Add(new Element(d, d));
            }
            stopwatch.Stop();
            long populateMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of += operator (calls ctor)
            Element operatorCtorResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                operatorCtorResult += elts[ii];
            }
            stopwatch.Stop();
            long operatorCtorMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of -= operator (+= without ctor)
            Element operatorNoCtorResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                operatorNoCtorResult -= elts[ii];
            }
            stopwatch.Stop();
            long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of PlusEqual(Element) method
            Element plusEqualResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                plusEqualResult.PlusEqual(elts[ii]);
            }
            stopwatch.Stop();
            long plusEqualMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of PlusEqual(double, double) method
            Element plusEqualDDResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                Element elt = elts[ii];
                plusEqualDDResult.PlusEqual(elt.Left, elt.Right);
            }
            stopwatch.Stop();
            long plusEqualDDMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of doing nothing but accessing the Element
            Element doNothingResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                Element elt = elts[ii];
                double left = elt.Left;
                double right = elt.Right;
            }
            stopwatch.Stop();
            long doNothingMS = stopwatch.ElapsedMilliseconds;

            // Report results
            Assert.AreEqual(1d, operatorCtorResult.Left, "The operator += did not compute the right result!");
            Assert.AreEqual(1d, operatorNoCtorResult.Left, "The operator += did not compute the right result!");
            Assert.AreEqual(1d, plusEqualResult.Left, "The operator += did not compute the right result!");
            Assert.AreEqual(1d, plusEqualDDResult.Left, "The operator += did not compute the right result!");
            Assert.AreEqual(1d, doNothingResult.Left, "The operator += did not compute the right result!");

            // Report speeds
            Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);
            Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);
            Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);
            Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);
            Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);
            Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);

            // Compare speeds
            long percentageRatio = 100L * operatorCtorMS / plusEqualMS;
            Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
            percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
            Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
            percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
            Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);

            // Subtract the list-access overhead and compare again
            operatorCtorMS -= doNothingMS;
            operatorNoCtorMS -= doNothingMS;
            plusEqualMS -= doNothingMS;
            plusEqualDDMS -= doNothingMS;
            Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");
            percentageRatio = 100L * operatorCtorMS / plusEqualMS;
            Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
            percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
            Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
            percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
            Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
        }
    }
}

The IL: (i.e., what some of the above code gets compiled into)

public void PlusEqual(Element that)
{
00000000 push    ebp
00000001 mov     ebp,esp
00000003 push    edi
00000004 push    esi
00000005 push    ebx
00000006 sub     esp,30h
00000009 xor     eax,eax
0000000b mov     dword ptr [ebp-10h],eax
0000000e xor     eax,eax
00000010 mov     dword ptr [ebp-1Ch],eax
00000013 mov     dword ptr [ebp-3Ch],ecx
00000016 cmp     dword ptr ds:[04C87B7Ch],0
0000001d je     00000024
0000001f call    753081B1
00000024 nop
this.Left += that.Left;
00000025 mov     eax,dword ptr [ebp-3Ch]
00000028 fld     qword ptr [ebp+8]
0000002b fadd    qword ptr [eax]
0000002d fstp    qword ptr [eax]
this.Right += that.Right;
0000002f mov     eax,dword ptr [ebp-3Ch]
00000032 fld     qword ptr [ebp+10h]
00000035 fadd    qword ptr [eax+8]
00000038 fstp    qword ptr [eax+8]
}
0000003b nop
0000003c lea     esp,[ebp-0Ch]
0000003f pop     ebx
00000040 pop     esi
00000041 pop     edi
00000042 pop     ebp
00000043 ret     10h
public void PlusEqual(double thatLeft, double thatRight)
{
00000000 push    ebp
00000001 mov     ebp,esp
00000003 push    edi
00000004 push    esi
00000005 push    ebx
00000006 sub     esp,30h
00000009 xor     eax,eax
0000000b mov     dword ptr [ebp-10h],eax
0000000e xor     eax,eax
00000010 mov     dword ptr [ebp-1Ch],eax
00000013 mov     dword ptr [ebp-3Ch],ecx
00000016 cmp     dword ptr ds:[04C87B7Ch],0
0000001d je     00000024
0000001f call    75308159
00000024 nop
this.Left += thatLeft;
00000025 mov     eax,dword ptr [ebp-3Ch]
00000028 fld     qword ptr [ebp+10h]
0000002b fadd    qword ptr [eax]
0000002d fstp    qword ptr [eax]
this.Right += thatRight;
0000002f mov     eax,dword ptr [ebp-3Ch]
00000032 fld     qword ptr [ebp+8]
00000035 fadd    qword ptr [eax+8]
00000038 fstp    qword ptr [eax+8]
}
0000003b nop
0000003c lea     esp,[ebp-0Ch]
0000003f pop     ebx
00000040 pop     esi
00000041 pop     edi
00000042 pop     ebp
00000043 ret     10h

I would imagine that when you access members of the struct, it is in fact doing an extra operation to reach the member: the 'this' pointer plus an offset.

I'm having some difficulty replicating your results.

I took your code:

  • made it a standalone console application
  • built an optimized (release) build
  • increased the "size" factor from 2.5M to 10M
  • ran it from the command line (outside the IDE)

When I did so, I got the following timings which are far different from yours. For the avoidance of doubt, I'll post exactly the code I used.

Here are my timings

Populating List<Element> took 527ms.
The PlusEqual() method took 450ms.
The 'same' += operator took 386ms.
The 'same' -= operator took 446ms.
The PlusEqual(double, double) method took 413ms.
The do nothing loop took 229ms.
The ratio of operator with constructor to method is 85%.
The ratio of operator without constructor to method is 99%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 91%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 71%.
The ratio of operator without constructor to method is 98%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 83%.

And these are my edits to your code:

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace OperatorVsMethod
{
    public struct Element
    {
        public double Left;
        public double Right;

        public Element(double left, double right)
        {
            this.Left = left;
            this.Right = right;
        }

        public static Element operator +(Element x, Element y)
        {
            return new Element(x.Left + y.Left, x.Right + y.Right);
        }

        // Deliberately performs addition: this is "+=" without the constructor call.
        public static Element operator -(Element x, Element y)
        {
            x.Left += y.Left;
            x.Right += y.Right;
            return x;
        }

        /// <summary>
        /// Like the += operator; but faster.
        /// </summary>
        public void PlusEqual(Element that)
        {
            this.Left += that.Left;
            this.Right += that.Right;
        }

        /// <summary>
        /// Like the += operator; but faster.
        /// </summary>
        public void PlusEqual(double thatLeft, double thatRight)
        {
            this.Left += thatLeft;
            this.Right += thatRight;
        }
    }

    public class UnitTest1
    {
        public static void Main()
        {
            Stopwatch stopwatch = new Stopwatch();

            // Populate a List of Elements to add together
            int seedSize = 4;
            List<double> doubles = new List<double>(seedSize);
            doubles.Add(2.5d);
            doubles.Add(100000d);
            doubles.Add(-0.5d);
            doubles.Add(-100002d);

            int size = 10000000 * seedSize;
            List<Element> elts = new List<Element>(size);

            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                int di = ii % seedSize;
                double d = doubles[di];
                elts.Add(new Element(d, d));
            }
            stopwatch.Stop();
            long populateMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of += operator (calls ctor)
            Element operatorCtorResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                operatorCtorResult += elts[ii];
            }
            stopwatch.Stop();
            long operatorCtorMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of -= operator (+= without ctor)
            Element operatorNoCtorResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                operatorNoCtorResult -= elts[ii];
            }
            stopwatch.Stop();
            long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of PlusEqual(Element) method
            Element plusEqualResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                plusEqualResult.PlusEqual(elts[ii]);
            }
            stopwatch.Stop();
            long plusEqualMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of PlusEqual(double, double) method
            Element plusEqualDDResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                Element elt = elts[ii];
                plusEqualDDResult.PlusEqual(elt.Left, elt.Right);
            }
            stopwatch.Stop();
            long plusEqualDDMS = stopwatch.ElapsedMilliseconds;

            // Measure speed of doing nothing but accessing the Element
            Element doNothingResult = new Element(1d, 1d);
            stopwatch.Reset();
            stopwatch.Start();
            for (int ii = 0; ii < size; ++ii)
            {
                Element elt = elts[ii];
                double left = elt.Left;
                double right = elt.Right;
            }
            stopwatch.Stop();
            long doNothingMS = stopwatch.ElapsedMilliseconds;

            // Report speeds
            Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);
            Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);
            Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);
            Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);
            Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);
            Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);

            // Compare speeds
            long percentageRatio = 100L * operatorCtorMS / plusEqualMS;
            Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
            percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
            Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
            percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
            Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);

            // Subtract the list-access overhead and compare again
            operatorCtorMS -= doNothingMS;
            operatorNoCtorMS -= doNothingMS;
            plusEqualMS -= doNothingMS;
            plusEqualDDMS -= doNothingMS;
            Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");
            percentageRatio = 100L * operatorCtorMS / plusEqualMS;
            Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
            percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
            Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
            percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
            Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
        }
    }
}

Running .NET 4.0 here. I compiled with "Any CPU", targeting .NET 4.0 in release mode. Execution was from the command line. It ran in 64-bit mode. My timings are a bit different.

Populating List<Element> took 442ms.
The PlusEqual() method took 115ms.
The 'same' += operator took 201ms.
The 'same' -= operator took 200ms.
The PlusEqual(double, double) method took 129ms.
The do nothing loop took 93ms.
The ratio of operator with constructor to method is 174%.
The ratio of operator without constructor to method is 173%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 112%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 490%.
The ratio of operator without constructor to method is 486%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 163%.

In particular, PlusEqual(Element) is slightly faster than PlusEqual(double, double).

Whatever the problem is in .NET 3.5, it doesn't appear to exist in .NET 4.0.

Like @Corey Kosak, I just ran this code in VS 2010 Express as a simple Console App in Release mode. I get very different numbers. But I also have Fx4.5 installed, so these might not be the results for a clean Fx4.0.

Populating List<Element> took 435ms.
The PlusEqual() method took 109ms.
The 'same' += operator took 217ms.
The 'same' -= operator took 157ms.
The PlusEqual(double, double) method took 118ms.
The do nothing loop took 79ms.
The ratio of operator with constructor to method is 199%.
The ratio of operator without constructor to method is 144%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 108%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 460%.
The ratio of operator without constructor to method is 260%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 130%.

Edit: and now run from the command line. That does make a difference, and there is less variation in the numbers.
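
Since the run-to-run variation matters here, one common way to steady such measurements is to run each loop once untimed and then keep the best of several timed runs. This is only a sketch under that assumption; TimingSketch and BestOfMs are hypothetical names, and none of the timings in this thread were produced this way.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs `body` once untimed (so JIT compilation of the body is not measured),
    // then `repeats` more times under a Stopwatch, and returns the fastest run in ms.
    public static double BestOfMs(Action body, int repeats)
    {
        body(); // warm-up run, not timed

        double best = double.MaxValue;
        for (int i = 0; i < repeats; ++i)
        {
            Stopwatch sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    public static void Main()
    {
        double sum = 0;
        double ms = BestOfMs(() =>
        {
            for (int i = 0; i < 1000000; ++i) sum += i;
        }, 5);
        Console.WriteLine("Best of 5 runs: {0:F3} ms (sum = {1})", ms, sum);
    }
}
```

Using Elapsed.TotalMilliseconds instead of ElapsedMilliseconds keeps the fractional part, which helps when the differences being compared are only a few milliseconds.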

I'm getting very different results, much less dramatic. But I didn't use the test runner; I pasted the code into a console-mode app. The 5% result is ~87% in 32-bit mode and ~100% in 64-bit mode when I try it.

Alignment is critical for doubles, and the .NET runtime can only promise an alignment of 4 on a 32-bit machine. It looks to me like the test runner is starting the test methods with a stack address that is aligned to 4 instead of 8. The misalignment penalty gets very large when a double crosses a cache line boundary.

Not sure if this is relevant, but here are the numbers for .NET 4.0 64-bit on Windows 7 64-bit. My mscorwks.dll version is 2.0.50727.5446. I just pasted the code into LINQPad and ran it from there. Here's the result:

Populating List<Element> took 496ms.
The PlusEqual() method took 189ms.
The 'same' += operator took 295ms.
The 'same' -= operator took 358ms.
The PlusEqual(double, double) method took 148ms.
The do nothing loop took 103ms.
The ratio of operator with constructor to method is 156%.
The ratio of operator without constructor to method is 189%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 78%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 223%.
The ratio of operator without constructor to method is 296%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 52%.

Maybe instead of a List you should use a double[] with "well known" offsets and index increments?
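
One reading of that suggestion, as a rough sketch: store the pairs interleaved in a flat double[], with Left at offset 2*i and Right at offset 2*i + 1, and step the index by 2. The layout and the names ArraySketch and SumPairs are assumptions made for illustration; this has not been timed against the List<Element> version.

```csharp
using System;

public static class ArraySketch
{
    // Sums interleaved (left, right) pairs stored at offsets 2*i and 2*i + 1.
    public static void SumPairs(double[] data, out double left, out double right)
    {
        left = 0d;
        right = 0d;
        for (int i = 0; i + 1 < data.Length; i += 2)
        {
            left += data[i];
            right += data[i + 1];
        }
    }

    public static void Main()
    {
        // The four seed values from the question, each stored as a (left, right) pair.
        double[] data = { 2.5, 2.5, 100000, 100000, -0.5, -0.5, -100002, -100002 };

        double left, right;
        SumPairs(data, out left, out right);
        Console.WriteLine("{0} {1}", left, right); // prints "0 0"
    }
}
```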

In addition to the JIT compiler differences mentioned in other answers, another difference between a struct method call and a struct operator is that a struct method call passes 'this' as a ref parameter (and may be written to accept other parameters as ref parameters as well), while a struct operator passes all operands by value. The cost of passing a structure as a ref parameter is fixed, no matter how large the structure is, while the cost of passing a structure by value is proportional to its size. There is nothing wrong with using large structures (even hundreds of bytes) if one can avoid copying them unnecessarily; while unnecessary copies can often be prevented when using methods, they cannot be prevented when using operators.
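
A minimal sketch of that distinction, using a hypothetical Pair struct rather than the Element type from the question: the operator receives copies of both operands and returns a new value, while the instance method receives 'this' by reference and, here, takes its argument as a ref parameter as well.

```csharp
using System;

public struct Pair
{
    public double Left;
    public double Right;

    public Pair(double left, double right)
    {
        Left = left;
        Right = right;
    }

    // Operator: both operands are passed by value and a new value is returned.
    public static Pair operator +(Pair x, Pair y)
    {
        return new Pair(x.Left + y.Left, x.Right + y.Right);
    }

    // Instance method: 'this' is passed by reference, and so is 'that',
    // so neither struct is copied at the call site.
    public void Add(ref Pair that)
    {
        Left += that.Left;
        Right += that.Right;
    }
}

public static class RefSketch
{
    public static void Main()
    {
        Pair a = new Pair(1d, 1d);
        Pair b = new Pair(2d, 3d);

        a += b;       // operands copied in, result copied out: a is now (3, 4)
        a.Add(ref b); // no struct copies at the call site: a is now (5, 7)

        Console.WriteLine("{0} {1}", a.Left, a.Right); // prints "5 7"
    }
}
```

With a 16-byte struct like this the copies are cheap either way; the difference described above grows with the size of the structure.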