Terrible performance of SqlCommand async methods with large data

I'm running into some major SQL performance problems when using async calls.

I have created a database on a SQL Server 2016 instance located on our LAN (so not a LocalDB).

In that database I have a WorkingCopy table with two columns:

Id (nvarchar(255), PK)
Value (nvarchar(max))

DDL

CREATE TABLE [dbo].[Workingcopy]
(
    [Id] [nvarchar](255) NOT NULL,
    [Value] [nvarchar](max) NULL,

    CONSTRAINT [PK_Workingcopy]
        PRIMARY KEY CLUSTERED ([Id] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
              IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
              ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Into that table I have inserted a single record (Id = 'PerfUnitTest', with Value being a 1.5 MB string, a zip of a larger JSON dataset).
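For reference, the test row can be seeded from .NET along these lines (a sketch; the connection string and the generated payload are placeholders standing in for the real zipped JSON, and this obviously needs a live SQL Server to run):

```csharp
using System.Data;
using System.Data.SqlClient;

// Sketch: seed the 1.5 MB test row. Connection string and payload are placeholders.
string payload = new string('x', 1500000); // stands in for the ~1.5 MB zipped JSON

using (var connection = new SqlConnection("Server=myserver;Database=mydb;Integrated Security=true"))
{
    connection.Open();
    var insert = new SqlCommand(
        "INSERT INTO WorkingCopy (Id, Value) VALUES (@Id, @Value)", connection);
    insert.Parameters.Add("@Id", SqlDbType.NVarChar, 255).Value = "PerfUnitTest";
    insert.Parameters.Add("@Value", SqlDbType.NVarChar, -1).Value = payload; // -1 = nvarchar(max)
    insert.ExecuteNonQuery();
}
```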

Now, if I execute this query in SSMS:

SELECT [Value]
FROM [Workingcopy]
WHERE id = 'perfunittest'

I get the result immediately, and in SQL Server Profiler I see an execution time of about 20 ms. All normal.

When executing the query from .NET (4.6) code using a plain SqlConnection:

// at this point, the connection is already open
var command = new SqlCommand("SELECT Value FROM WorkingCopy WHERE Id = @Id", _connection);
command.Parameters.Add("@Id", SqlDbType.NVarChar, 255).Value = key;

string value = command.ExecuteScalar() as string;

it also executes in about 20-30 ms.

But when changing it to async code:

string value = await command.ExecuteScalarAsync() as string;

the execution time suddenly becomes 1800 ms! Also, in SQL Server Profiler, I see that the query execution duration is more than a second, even though Profiler reports exactly the same query as in the non-async version.

But it gets worse. If I play with the Packet Size in the connection string, I get the following results:

Packet size 32768: [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time: 450 ms

Packet size 4096: [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time: 3667 ms

Packet size 512: [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time: 30776 ms

30,000 ms!! That's more than 1000x slower than the non-async version. SQL Server Profiler reports a query execution time of more than 10 seconds. That doesn't even explain where the other 20 seconds went!

Then I switched back to the sync version and also played with the Packet Size; although it did affect the execution time a little, it was nowhere near as dramatic as with the async version.
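For completeness, the packet size was varied via the connection string, roughly like this (server and database names are placeholders; `Packet Size` is a standard SqlClient connection-string keyword):

```csharp
using System.Data.SqlClient;

// Packet Size is set in the connection string; values tried: 512, 4096, 32768.
var connectionString =
    "Server=myserver;Database=mydb;Integrated Security=true;Packet Size=4096";

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    // run the same ExecuteScalar / ExecuteScalarAsync timing test here
}
```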

As a side note, if I put only a small string (< 100 bytes) into Value, the async query executes just as fast as the sync version (1 or 2 ms).

I'm really baffled by this, especially since I'm using the built-in SqlConnection, not even an ORM. Also, when searching around, I couldn't find anything that would explain this behavior. Any ideas?


On a system without significant load, an async call has a slightly bigger overhead. While the I/O operation itself is asynchronous regardless, blocking can be faster than thread-pool task switching.

How much overhead? Let's look at your timing numbers. 30ms for a blocking call, 450ms for an asynchronous call. 32 kiB packet size means you need about fifty individual I/O operations. That means we have roughly 8ms of overhead on each packet, which corresponds pretty well with your measurements over different packet sizes. That doesn't sound like overhead just from being asynchronous, even though the asynchronous versions need to do a lot more work than the synchronous. It sounds like the synchronous version is (simplified) 1 request -> 50 responses, while the asynchronous version ends up being 1 request -> 1 response -> 1 request -> 1 response -> ..., paying the cost over and over again.
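As a back-of-the-envelope check on those numbers (assuming the 1.5 MB value and the timings from the question):

```csharp
using System;

// Rough sanity check of the per-packet overhead estimate.
double valueBytes = 1.5 * 1024 * 1024;   // ~1.5 MB payload
double packetBytes = 32 * 1024;          // 32 kiB packet size

double packets = valueBytes / packetBytes;              // 48 packets
double overheadPerPacketMs = (450.0 - 30.0) / packets;  // ~8.75 ms per packet

Console.WriteLine($"{packets:F0} packets, ~{overheadPerPacketMs:F2} ms overhead each");
```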

Going deeper. ExecuteReader works just as well as ExecuteReaderAsync. The next operation is Read followed by a GetFieldValue - and an interesting thing happens there. If either of the two is async, the whole operation is slow. So there's certainly something very different happening once you start making things truly asynchronous - a Read will be fast, and then the async GetFieldValueAsync will be slow, or you can start with the slow ReadAsync, and then both GetFieldValue and GetFieldValueAsync are fast. The first asynchronous read from the stream is slow, and the slowness depends entirely on the size of the whole row. If I add more rows of the same size, reading each row takes the same amount of time as if I only have one row, so it's obvious that the data is still being streamed row by row - it just seems to prefer to read the whole row at once once you start an asynchronous read. If I read the first row asynchronously, and the second synchronously - the second row being read will be fast again.
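The experiment above can be sketched like this (table and query are from the question; this needs a live SQL Server and an async method around it, so it's an illustration rather than a runnable benchmark):

```csharp
// Sketch of the mixed sync/async experiment: the FIRST truly asynchronous
// read of the row is the slow one, regardless of which call it is.
var command = new SqlCommand("SELECT Value FROM WorkingCopy WHERE Id = @Id", connection);
command.Parameters.Add("@Id", SqlDbType.NVarChar, 255).Value = "PerfUnitTest";

using (var reader = command.ExecuteReader()) // sync ExecuteReader is as fast as async
{
    reader.Read();                           // sync Read: fast
    var value = await reader.GetFieldValueAsync<string>(0); // first async read: slow

    // Swap the calls and it inverts: ReadAsync() becomes the slow one, and then
    // both GetFieldValue and GetFieldValueAsync on the same row are fast.
}
```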

So we can see that the problem is a big size of an individual row and/or column. It doesn't matter how much data you have in total - reading a million small rows asynchronously is just as fast as synchronously. But add just a single field that's too big to fit in a single packet, and you mysteriously incur a cost at asynchronously reading that data - as if each packet needed a separate request packet, and the server couldn't just send all the data at once. Using CommandBehavior.SequentialAccess does improve the performance as expected, but the massive gap between sync and async still exists.

The best performance I got was when doing the whole thing properly. That means using CommandBehavior.SequentialAccess, as well as streaming the data explicitly:

using (var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
{
    while (await reader.ReadAsync())
    {
        var data = await reader.GetTextReader(0).ReadToEndAsync();
    }
}

With this, the difference between sync and async becomes hard to measure, and changing the packet size no longer incurs the ridiculous overhead as before.

If you want good performance in edge cases, make sure to use the best tools available - in this case, stream large column data rather than relying on helpers like ExecuteScalar or GetFieldValue.