Performance of bcp/BULK INSERT vs. Table-Valued Parameters

I'm going to have to rewrite some fairly old code that uses SQL Server's BULK INSERT command because the schema has changed, and it occurred to me that maybe I should consider switching to a stored procedure with a TVP instead, but I'm wondering what effect that might have on performance.

Some background information that may help explain why I'm asking this question:

  • The data actually comes in via a Web service. The Web service writes a text file to a shared folder on the database server, which then performs a BULK INSERT. This process was originally implemented on SQL Server 2000, and at the time there was really no alternative other than chucking a few hundred INSERT statements at the server, which was in fact the original process and was a performance disaster.

  • The data is bulk inserted into a permanent staging table and then merged into a much larger table (after which it is deleted from the staging table).

  • The amount of data to insert is "large", but not "huge" - usually a few hundred rows, maybe 5-10k rows at most in rare cases. Therefore my gut feeling is that BULK INSERT being a minimally-logged operation won't make that big of a difference (but of course I'm not sure, hence the question).

  • The insert is actually part of a much larger pipelined batch process and needs to happen many times in succession; therefore performance is critical.

The reasons I would like to replace the BULK INSERT with a TVP are:

  • Writing the text file over NetBIOS probably already costs some time, and it's pretty horrifying from an architectural perspective.

  • I believe the staging table can (and should) be eliminated. The main reason it's there is that the inserted data needs to be used for several other updates at the same time as the insert, and it's far more costly to attempt those updates against the massive production table than to use an almost-empty staging table. With a TVP, the parameter basically is the staging table, and I can do anything I want with it before/after the main insert.

  • I could pretty much do away with the dupe-checking, the cleanup code, and all of the overhead associated with the bulk insert.

  • There's no need to worry about lock contention on the staging table or tempdb if the server gets several of these transactions at once (we try to avoid it, but it happens).

Obviously I'm going to profile this before putting anything into production, but I thought it would be a good idea to ask around first before I spend all that time, and see if anybody has any stern warnings to issue about using TVPs for this purpose.

So - for anyone who's cozy enough with SQL Server 2008 to have tried or at least investigated this, what's the verdict? For inserts of, let's say, a few hundred to a few thousand rows, happening on a fairly frequent basis, do TVPs cut the mustard? Is there a significant difference in performance compared to bulk inserts?


UPDATE: Now with 92% fewer question marks!

(AKA: Test Results)

The end result is now in production, after what felt like a 36-stage deployment process. Both solutions were put through extensive testing:

  • Ripping out the shared-folder code and using the SqlBulkCopy class directly;
  • Switching to a Stored Procedure with TVPs.

Just so readers can get an idea of what exactly was tested, and to dispel any doubts about the reliability of this data, here is a more detailed explanation of what the import process actually does:

  1. Start with a temporal data sequence that is ordinarily about 20-50 data points (although it can sometimes be as many as a few hundred);

  2. Do a whole bunch of crazy processing on it that is mostly database-independent. This process is parallelized, so about 8-10 of the sequences from (1) are being processed at the same time. Each parallel process generates 3 additional sequences.

  3. Take all 3 sequences plus the original sequence and combine them into a batch.

  4. Combine the batches from all 8-10 now-finished processing tasks into one big super-batch.

  5. Import it using either the BULK INSERT strategy (see the next step) or the TVP strategy (skip to step 8).

  6. Use the SqlBulkCopy class to dump the entire super-batch into 4 permanent staging tables.

  7. Run a Stored Procedure that (a) performs a series of aggregation steps on 2 of the tables, including several JOIN conditions, and then (b) performs a MERGE on 6 production tables using both the aggregated and non-aggregated data. (Finished)

    OR

  8. Generate 4 DataTable objects containing the data to be merged; 3 of them contain CLR types which, unfortunately, are not properly supported by ADO.NET TVPs, so they have to be shoved in as string representations, which hurts performance a bit.

  9. Feed the TVPs to a Stored Procedure, which does essentially the same processing as (7), but directly with the received tables. (Finished; a rough sketch of this TVP hand-off from .NET is shown just after this list.)
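
For illustration only: below is a minimal sketch of the kind of ADO.NET call that feeds a DataTable into a stored procedure as a TVP, as described in steps 8-9. The table type dbo.PointList, procedure dbo.usp_MergeSuperBatch, and column layout are hypothetical placeholders, not the actual objects used in this process.

    using System.Data;
    using System.Data.SqlClient;

    static void MergeViaTvp(DataTable points, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.usp_MergeSuperBatch", conn)) // hypothetical proc name
        {
            cmd.CommandType = CommandType.StoredProcedure;

            // The parameter is marked as Structured and mapped to a user-defined
            // table type created beforehand: CREATE TYPE dbo.PointList AS TABLE (...)
            var p = cmd.Parameters.AddWithValue("@Points", points);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.PointList"; // hypothetical table type name

            conn.Open();
            cmd.ExecuteNonQuery(); // the proc aggregates and MERGEs, as in step 7/9
        }
    }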

The results were reasonably close, but the TVP approach ultimately performed better on average, even when the data exceeded 1000 rows.

Note that this import process runs many thousands of times in succession, so it was very easy to get an average time simply by counting how many hours (yes, hours) it took to finish all of the merges.

Originally, an average merge took almost exactly 8 seconds to complete (under normal load). Removing the NetBIOS kludge and switching to SqlBulkCopy reduced that to almost exactly 7 seconds. Switching to TVPs further reduced the time to 5.2 seconds per batch. That's a 35% reduction in time per batch for a process whose running time is measured in hours - so not bad at all. It's also a ~25% improvement over SqlBulkCopy.

I'm actually fairly confident that the true improvement was significantly more than this. During testing it became apparent that the final merge was no longer the critical path; instead, the Web service doing all of the data processing was starting to buckle under the number of incoming requests. Neither CPU nor database I/O were ever really maxed out, and there was no significant locking activity. In some cases we were seeing a few idle seconds between successive merges. There was a slight gap when using SqlBulkCopy as well, but a much smaller one (half a second or so). But I suppose that will be a tale for another day.

Conclusion: Table-Valued Parameters really do perform better than BULK INSERT operations for complex import + transform processes operating on mid-sized data sets.


I'd like to add one other point, just to assuage any apprehension on the part of the folks who are pro-staging-tables. In a way, this entire service is one giant staging process. Every step of the process is heavily audited, so we don't need a staging table to determine why some particular merge failed (although in practice that almost never happens). All we have to do is set a debug flag in the service, and it will break into the debugger or dump its data to a file instead of the database.

In other words, we already have more than enough insight into the process and don't need the safety net of a staging table; the only reason we had the staging table in the first place was to avoid thrashing with all of the INSERT and UPDATE statements we would otherwise have had to use. In the original process, the staging data only lived in the staging table for fractions of a second anyway, so it added no value in maintenance/maintainability terms.

Also note that we have not replaced every single BULK INSERT operation with TVPs. Several operations that deal with larger amounts of data and/or don't need to do anything special with the data other than throw it at the DB still use SqlBulkCopy. I am not suggesting that TVPs are a performance panacea, only that they succeeded over SqlBulkCopy in this specific instance, which involves several transforms between the initial staging and the final merge.

So there you have it. Point goes to TToni for finding the most relevant link, but I appreciate the other responses as well. Thanks again!


I think I'd still stick with a bulk insert approach. You may find that tempdb still gets hit using a TVP with a reasonable number of rows. This is just my gut feeling; I can't say I've tested the performance of using TVPs (though I am interested in hearing others' input too).

You don't mention whether you use .NET, but the approach I've taken to optimise previous solutions was to do a bulk load of data using the SqlBulkCopy class - you don't need to write the data to a file first before loading; just give the SqlBulkCopy class (e.g.) a DataTable - that's the fastest way to insert data into the DB. 5-10K rows isn't much; I've used this for up to 750K rows. I suspect that in general, with a few hundred rows, it wouldn't make a vast difference whether you use a TVP. But scaling up would be limited, IMHO.
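
For illustration, here is a minimal sketch of that SqlBulkCopy pattern; the destination table name, batch size, and connection string are hypothetical placeholders, not values from the asker's process:

    using System.Data;
    using System.Data.SqlClient;

    static void BulkLoadStaging(DataTable batch, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.StagingTable"; // hypothetical staging table
                bulk.BatchSize = 5000;                          // optional tuning knob
                bulk.WriteToServer(batch);                      // no intermediate file needed
            }
        }
    }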

Perhaps the new MERGE functionality in SQL 2008 would benefit you?

Also, if your existing staging table is a single table that is used for each instance of this process and you're worried about contention etc, have you considered creating a new "temporary" but physical staging table each time, then dropping it when it's finished with?

Note you can optimize the loading into this staging table by populating it without any indexes, and then adding any required indexes once it is populated (FILLFACTOR=100 for optimal read performance, as at that point it will not be updated).

I don't really have experience with TVPs yet, but there is a nice performance comparison chart vs. BULK INSERT in MSDN here.

They say that BULK INSERT has a higher startup cost, but is faster thereafter. In a remote client scenario they draw the line at around 1000 rows (for "simple" server logic). Judging from their description, I would say you should be fine using TVPs. The performance hit - if any - is probably negligible, and the architectural benefits seem very good.

Edit: On a side note, you can avoid the server-local file and still use bulk copy by using the SqlBulkCopy object. Just populate a DataTable and feed it into the WriteToServer method of a SqlBulkCopy instance. Easy to use, and very fast.

Staging tables are good! Really, I wouldn't want to do it any other way. Why? Because data imports can change unexpectedly (and often in ways you can't foresee, like the time the columns were still called first name and last name but had the first-name data in the last-name column, to pick an example not at random). With a staging table it's easy to research the problem, because you can see exactly what data was in the columns the import handled; that's harder to do, I think, when you use an in-memory table. I know a lot of people who do imports for a living, as I do, and all of them recommend using staging tables. I suspect there is a reason for this.

Furthermore, applying a small schema change to a working process is easier and less time-consuming than redesigning the process. If it is working and no one is willing to pay for the hours to change it, then only fix what needs to be fixed because of the schema change. By changing the whole process, you introduce far more potential new bugs than by making a small change to an existing, tested, working process.

And just how are you going to do away with all the data cleanup tasks? You may be doing them differently, but they still need to be done. Again, changing the process the way you describe is very risky.

Personally it sounds to me like you are just offended by using older techniques rather than getting the chance to play with new toys. You seem to have no real basis for wanting to change other than bulk insert is so 2000.

The chart mentioned in the link provided in @TToni's answer needs to be taken in context. I am not sure how much actual research went into those recommendations (also note that the chart seems to be available only in the 2008 and 2008 R2 versions of that documentation).

On the other hand there is this whitepaper from the SQL Server Customer Advisory Team: Maximizing Throughput with TVP

I have been using TVPs since 2009 and have found, at least in my experience, that for anything other than a simple insert into a destination table with no additional logic (which is rarely ever the case), TVPs are typically the better option.

I tend to avoid staging tables, as data validation should be done at the app layer. By using TVPs, that is easily accommodated, and the TVP table variable in the stored procedure is, by its very nature, a localized staging table (hence no conflict with other processes running at the same time, as you get when using a real table for staging).

Regarding the testing done in the Question, I think it could be shown to be even faster than what was originally found:

  1. You should not be using a DataTable unless your application has use for it outside of sending the values to the TVP. Using the IEnumerable<SqlDataRecord> interface is faster and uses less memory, as you are not duplicating the collection in memory only to send it to the DB (a rough sketch of this pattern is shown after this list). I have this documented in the following places:
  2. TVPs are Table Variables and as such do not maintain statistics. Meaning, they report only having 1 row to the Query Optimizer. So, in your proc, either:
    • Use statement-level recompile on any queries using the TVP for anything other than a simple SELECT: OPTION (RECOMPILE)
    • Create a local temporary table (i.e. single #) and copy the contents of the TVP into the temp table
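
For illustration, here is a minimal sketch of point 1: streaming rows to a TVP via IEnumerable<SqlDataRecord> instead of materializing a DataTable. The table type dbo.PointList, procedure name, and columns are hypothetical placeholders. Point 2's advice belongs in the T-SQL of the procedure itself and is only noted in a comment here.

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server; // SqlDataRecord, SqlMetaData

    // Streams rows into the TVP one at a time, without building a DataTable first.
    static IEnumerable<SqlDataRecord> ToRecords(IEnumerable<KeyValuePair<int, decimal>> rows)
    {
        var meta = new[]
        {
            new SqlMetaData("SequenceId", SqlDbType.Int),
            new SqlMetaData("Value", SqlDbType.Decimal, 18, 4)
        };

        var record = new SqlDataRecord(meta);
        foreach (var row in rows)
        {
            record.SetInt32(0, row.Key);
            record.SetDecimal(1, row.Value);
            yield return record; // the single record instance is reused for each row
        }
    }

    static void MergeViaStreamedTvp(IEnumerable<KeyValuePair<int, decimal>> rows, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.usp_MergeSuperBatch", conn)) // hypothetical proc name
        {
            cmd.CommandType = CommandType.StoredProcedure;

            var p = cmd.Parameters.Add("@Points", SqlDbType.Structured);
            p.TypeName = "dbo.PointList";   // hypothetical table type name
            p.Value = ToRecords(rows);      // enumerated (streamed) as ADO.NET sends the TVP

            conn.Open();
            cmd.ExecuteNonQuery();
            // Inside the proc, either add OPTION (RECOMPILE) to statements that consume
            // @Points, or copy @Points into a local #temp table first (point 2 above).
        }
    }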