When should I call SaveChanges() when creating 1000s of Entity Framework objects?

I am running an import that will have 1,000 records on each run. Just looking to confirm my assumptions:

Which of the following makes the most sense:

  1. Run SaveChanges() after every AddToClassName() call.
  2. Run SaveChanges() after every n number of AddToClassName() calls.
  3. Run SaveChanges() after all of the AddToClassName() calls.

The first option is probably slow, right? Since it needs to analyze the EF objects in memory, generate the SQL, and so on.

I assume that the second option is the best of both worlds, since we can wrap a try catch around that SaveChanges() call and only lose n number of records at a time if one of them fails. Maybe store each batch in a List<>. If the SaveChanges() call succeeds, get rid of the list. If it fails, log the items.
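
Roughly what I have in mind for option 2 (just a sketch - Product, MapToProduct and LogFailedBatch are placeholders for my actual types and helpers):

var batch = new List<Product>();                     // Product is a placeholder entity type
foreach (var record in recordsToImport)
{
    var product = MapToProduct(record);              // hypothetical mapping helper
    context.AddToProduct(product);
    batch.Add(product);

    if (batch.Count == 100)                          // n = 100 in this sketch
    {
        try
        {
            context.SaveChanges();
        }
        catch (Exception ex)
        {
            LogFailedBatch(batch, ex);               // hypothetical logging helper
            // (the failed entities may also need to be detached from the context
            //  so they don't make every later SaveChanges() fail as well)
        }
        batch.Clear();
    }
}
// don't forget the leftover rows that didn't fill a whole batch
if (batch.Count > 0)
{
    try { context.SaveChanges(); }
    catch (Exception ex) { LogFailedBatch(batch, ex); }
}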

The last option would probably also be very slow, since every EF object would have to stay in memory until SaveChanges() is called. And if the save failed, nothing would be committed, right?


If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
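
A rough sketch of what that can look like, assuming the imported rows have already been loaded into a DataTable whose columns match the target table:

// Requires System.Data.SqlClient; dataTable and connectionString are placeholders.
using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.TestTable";   // your target table
    bulkCopy.BatchSize = 1000;
    bulkCopy.WriteToServer(dataTable);                  // all imported rows in one bulk operation
}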

I would test it first to be sure. Performance doesn't have to be that bad.

If you need to enter all rows in one transaction, call SaveChanges() after all of the AddToClassName() calls. If rows can be entered independently, save changes after every row. Database consistency is important.

I don't like the second option. It would be confusing for me (from the end user's perspective) if I made an import into the system and it declined 10 rows out of 1000, just because 1 was bad. You can try to import 10, and if that fails, try them one by one and then log.

Test whether it actually takes a long time. Don't write 'probably' - you don't know it yet. Only when it is actually a problem should you think about another solution (marc_s).

EDIT

I've done some tests (time in milliseconds):

10000 rows:

SaveChanges() after 1 row:18510,534
SaveChanges() after 100 rows:4350,3075
SaveChanges() after 10000 rows:5233,0635

50000 rows:

SaveChanges() after 1 row:78496,929
SaveChanges() after 500 rows:22302,2835
SaveChanges() after 50000 rows:24022,8765

So it is actually faster to commit after n rows than after all.

My recommendation is to:

  • SaveChanges() after n rows.
  • If one commit fails, try the rows one by one to find the faulty row (see the sketch below).
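
A rough sketch of that retry step (CreateTestItem and currentBatch are placeholders; the point is only that each retry gets its own fresh context):

try
{
    context.SaveChanges();                           // commit the whole batch of n rows
}
catch (Exception)
{
    // The batch failed - retry each row in its own, fresh context to find the bad one(s).
    // currentBatch is assumed to hold the raw source rows, so each retry builds a new entity.
    foreach (var row in currentBatch)
    {
        try
        {
            using (var retryContext = new CamelTrapEntities())
            {
                retryContext.AddToTestTable(CreateTestItem(row));   // hypothetical mapping helper
                retryContext.SaveChanges();
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine("Faulty row: " + ex.Message);         // log the row that cannot be saved
        }
    }
}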

Test classes:

TABLE:

CREATE TABLE [dbo].[TestTable](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [SomeInt] [int] NOT NULL,
    [SomeVarchar] [varchar](100) NOT NULL,
    [SomeOtherVarchar] [varchar](50) NOT NULL,
    [SomeOtherInt] [int] NULL,
    CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Class:

public class TestController : Controller
{
    //
    // GET: /Test/
    private readonly Random _rng = new Random();
    private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private string RandomString(int size)
    {
        var randomSize = _rng.Next(size);

        char[] buffer = new char[randomSize];

        for (int i = 0; i < randomSize; i++)
        {
            buffer[i] = _chars[_rng.Next(_chars.Length)];
        }
        return new string(buffer);
    }

    public ActionResult EFPerformance()
    {
        string result = "";

        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
        TruncateTable();

        return Content(result);
    }

    private void TruncateTable()
    {
        using (var context = new CamelTrapEntities())
        {
            var connection = ((EntityConnection)context.Connection).StoreConnection;
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = @"TRUNCATE TABLE TestTable";
            command.ExecuteNonQuery();
        }
    }

    private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
    {
        var startDate = DateTime.Now;

        using (var context = new CamelTrapEntities())
        {
            for (int i = 1; i <= noOfRows; ++i)
            {
                var testItem = new TestTable();
                testItem.SomeVarchar = RandomString(100);
                testItem.SomeOtherVarchar = RandomString(50);
                testItem.SomeInt = _rng.Next(10000);
                testItem.SomeOtherInt = _rng.Next(200000);
                context.AddToTestTable(testItem);

                if (i % commitAfterRows == 0) context.SaveChanges();
            }
        }

        var endDate = DateTime.Now;

        return endDate.Subtract(startDate);
    }
}

I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.

I found that much of the time spent in SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of the CPU cores and went from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13% (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk IO system, and CPU utilization of SQL Server was no higher than 15%.

By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.

I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.

This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
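
This isn't my exact code, only a rough sketch of the shape of that pipeline (SourceRow, MapToEntity, MyEntities, and the batch size are placeholders):

// Requires System.Collections.Concurrent and System.Threading.Tasks.
var batches = new BlockingCollection<List<SourceRow>>(boundedCapacity: 1);

// Producer: slice the input into batches and hand them off.
var producer = Task.Run(() =>
{
    var batch = new List<SourceRow>();
    foreach (var row in sourceRows)
    {
        batch.Add(row);
        if (batch.Count == batchSize)
        {
            batches.Add(batch);
            batch = new List<SourceRow>();
        }
    }
    if (batch.Count > 0) batches.Add(batch);
    batches.CompleteAdding();
});

// Consumers: (# of cores) - 1 threads, each saving a batch on its own context.
var consumers = Enumerable.Range(0, Environment.ProcessorCount - 1)
    .Select(_ => Task.Run(() =>
    {
        foreach (var batch in batches.GetConsumingEnumerable())
        {
            using (var context = new MyEntities())
            {
                foreach (var row in batch)
                    context.AddToMyTable(MapToEntity(row));   // hypothetical mapping
                context.SaveChanges();
            }
        }
    }))
    .ToList();

consumers.Add(producer);
Task.WaitAll(consumers.ToArray());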

Use a stored procedure.

  1. Create a user-defined table type in SQL Server.
  2. Create and populate an array of this type in your code (very fast).
  3. Pass the array to your stored procedure with one call (very fast).

I believe this would be the easiest and fastest way to do this.
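
This isn't code from the thread, just a rough sketch of that approach using a table-valued parameter, reusing a couple of the TestTable columns from the benchmark above; the type and procedure names are made up:

// Assumes something like the following already exists in the database:
//   CREATE TYPE dbo.TestTableType AS TABLE (SomeInt int, SomeVarchar varchar(100));
//   CREATE PROCEDURE dbo.ImportTestTable @rows dbo.TestTableType READONLY AS
//     INSERT INTO dbo.TestTable (SomeInt, SomeVarchar) SELECT SomeInt, SomeVarchar FROM @rows;
// Requires System.Data and System.Data.SqlClient; rowsToImport and connectionString are placeholders.

var table = new DataTable();
table.Columns.Add("SomeInt", typeof(int));
table.Columns.Add("SomeVarchar", typeof(string));
foreach (var row in rowsToImport)
    table.Rows.Add(row.SomeInt, row.SomeVarchar);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.ImportTestTable", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@rows", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.TestTableType";
    connection.Open();
    command.ExecuteNonQuery();                        // one round trip for all rows
}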

Sorry, I know this thread is old, but I think this could help other people with this problem.

I had the same problem, but there is a possibility to validate the changes before you commit them. My code looks like this and it is working fine. With chUser.LastUpdated I check whether it is a new entry or only a change, because it is not possible to reload an entry that is not in the database yet.

// Validate Changes
var invalidChanges = _userDatabase.GetValidationErrors();
foreach (var ch in invalidChanges)
{
    // Delete invalid User or Change
    var chUser = (db_User)ch.Entry.Entity;
    if (chUser.LastUpdated == null)
    {
        // Invalid, new User
        _userDatabase.db_User.Remove(chUser);
        Console.WriteLine("!Failed to create User: " + chUser.ContactUniqKey);
    }
    else
    {
        // Invalid Change of an Entry
        _userDatabase.Entry(chUser).Reload();
        Console.WriteLine("!Failed to update User: " + chUser.ContactUniqKey);
    }
}

_userDatabase.SaveChanges();