为什么 LINQJOIN 比 WHERE 链接快得多?

我最近升级到 VS 2010,我在玩 LINQ 到数据集。我在 ASP.NET WebApplication 的 HttpCache 中有一个用于授权的强类型数据集。

所以我想知道什么是检查用户是否被授权做某事的最快方法。给你是我的数据模型和其他一些信息,如果有人感兴趣的话。

我查了三种方法:

  1. 直接 资料库
  2. 使用 在哪里条件作为“ Join”的 LINQ 查询-语法
  3. 使用 加入语法的 LINQ 查询

以下是每个函数有1000个调用的结果:

1. 迭代:

  1. 4,2841519秒。
  2. 115,7796925秒。
  3. 2024749秒。

2. 迭代:

  1. 3,1954857秒。
  2. 8497047秒。
  3. 1,5783397秒。

3. 迭代:

  1. 27922143秒。
  2. 97,8713267秒。
  3. 1,8432163秒。

平均数:

  1. 数据库: 3,4239506333秒。
  2. 位置: 99,5404964秒。
  3. 加入: 1,815435秒。

为什么加入版本比 where 语法快得多,这使得它没有用处,虽然作为一个 LINQ 新手它似乎是最易读的。还是我在查询中遗漏了什么?

下面是 LINQ 查询,我跳过了数据库:

在哪里 :

Public Function hasAccessDS_Where(ByVal accessRule As String) As Boolean
Dim userID As Guid = DirectCast(Membership.GetUser.ProviderUserKey, Guid)
Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule, _
roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule, _
role In Authorization.dsAuth.aspnet_Roles, _
userRole In Authorization.dsAuth.aspnet_UsersInRoles _
Where accRule.idAccessRule = roleAccRule.fiAccessRule _
And roleAccRule.fiRole = role.RoleId _
And userRole.RoleId = role.RoleId _
And userRole.UserId = userID And accRule.RuleName.Contains(accessRule)
Select accRule.idAccessRule
Return query.Any
End Function

加入:

Public Function hasAccessDS_Join(ByVal accessRule As String) As Boolean
Dim userID As Guid = DirectCast(Membership.GetUser.ProviderUserKey, Guid)
Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule _
Join roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule _
On accRule.idAccessRule Equals roleAccRule.fiAccessRule _
Join role In Authorization.dsAuth.aspnet_Roles _
On role.RoleId Equals roleAccRule.fiRole _
Join userRole In Authorization.dsAuth.aspnet_UsersInRoles _
On userRole.RoleId Equals role.RoleId _
Where userRole.UserId = userID And accRule.RuleName.Contains(accessRule)
Select accRule.idAccessRule
Return query.Any
End Function

先谢谢你。


编辑 : 在对这两个查询进行了一些改进以获得更有意义的性能值之后,JOIN 的优势甚至比以前大了许多倍:

加入 :

Public Overloads Shared Function hasAccessDS_Join(ByVal userID As Guid, ByVal idAccessRule As Int32) As Boolean
Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule _
Join roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule _
On accRule.idAccessRule Equals roleAccRule.fiAccessRule _
Join role In Authorization.dsAuth.aspnet_Roles _
On role.RoleId Equals roleAccRule.fiRole _
Join userRole In Authorization.dsAuth.aspnet_UsersInRoles _
On userRole.RoleId Equals role.RoleId _
Where accRule.idAccessRule = idAccessRule And userRole.UserId = userID
Select role.RoleId
Return query.Any
End Function

在哪里 :

Public Overloads Shared Function hasAccessDS_Where(ByVal userID As Guid, ByVal idAccessRule As Int32) As Boolean
Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule, _
roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule, _
role In Authorization.dsAuth.aspnet_Roles, _
userRole In Authorization.dsAuth.aspnet_UsersInRoles _
Where accRule.idAccessRule = roleAccRule.fiAccessRule _
And roleAccRule.fiRole = role.RoleId _
And userRole.RoleId = role.RoleId _
And accRule.idAccessRule = idAccessRule And userRole.UserId = userID
Select role.RoleId
Return query.Any
End Function

1000次调用的结果(在更快的计算机上)

  1. 加入2号,在哪里

1. 迭代:

  1. 0,0713669秒。
  2. 12,7395299秒。

2. 迭代:

  1. 0,0492458秒。
  2. 12,3885925秒。

3. 迭代:

  1. 0,0501982秒。
  2. 13,3474216秒。

平均数:

  1. 加入: 0,0569367秒。
  2. 位置: 12,8251813秒。

加入要快225倍

结论: 避免 WHERE 指定关系并尽可能使用 JOIN (一般在 从 LINQ 到 DataSetLinq-To-Objects中明确)。

27302 次浏览

The Join is much faster, because the method knows how to combine the tables to reduce the result to the relevant combinations. When you use Where to specify the relation, it has to create every possible combination, and then test the condition to see which combinations are relevant.

The Join method can set up a hash table to use as an index to quicky zip two tables together, while the Where method runs after all the combinations are already created, so it can't use any tricks to reduce the combinations beforehand.

what you really need to know is the sql that was created for the two statements. There are a few ways of getting to it but the simplest is to use LinqPad. There are several buttons right above the query results that will change to the sql. That will give you a lot more information than anything else.

Great information you shared there though.

  1. Your first approach (SQL query in the DB) is quite efficient because the DB knows how to perform a join. But it doesn't really make sense to compare it with the other approaches, since they work directly in memory (Linq to DataSet)

  2. The query with multiple tables and a Where condition actually performs a cartesian product of all the tables, then filters the rows that satisfy the condition. This means the Where condition is evaluated for each combination of rows (n1 * n2 * n3 * n4)

  3. The Join operator takes the rows from the first tables, then takes only the rows with a matching key from the second table, then only the rows with a matching key from the third table, and so on. This is much more efficient, because it doesn't need to perform as many operations