完全外部连接

我有一份人们的身份证和名字的名单,还有一份人们的身份证和姓氏的名单。有些人没有名字,有些人没有姓;我想在这两个列表上做一个完整的外部连接。

下面列出:

ID  FirstName
--  ---------
1  John
2  Sue


ID  LastName
--  --------
1  Doe
3  Smith

应该生产:

ID  FirstName  LastName
--  ---------  --------
1  John       Doe
2  Sue
3             Smith

我是LINQ的新手(所以原谅我,如果我是跛脚的),并且已经找到了相当多的解决方案的“LINQ外部连接”,它们看起来都很相似,但似乎真的是离开外部连接。

到目前为止我的尝试是这样的:

private void OuterJoinTest()
{
List<FirstName> firstNames = new List<FirstName>();
firstNames.Add(new FirstName { ID = 1, Name = "John" });
firstNames.Add(new FirstName { ID = 2, Name = "Sue" });


List<LastName> lastNames = new List<LastName>();
lastNames.Add(new LastName { ID = 1, Name = "Doe" });
lastNames.Add(new LastName { ID = 3, Name = "Smith" });


var outerJoin = from first in firstNames
join last in lastNames
on first.ID equals last.ID
into temp
from last in temp.DefaultIfEmpty()
select new
{
id = first != null ? first.ID : last.ID,
firstname = first != null ? first.Name : string.Empty,
surname = last != null ? last.Name : string.Empty
};
}
}


public class FirstName
{
public int ID;


public string Name;
}


public class LastName
{
public int ID;


public string Name;
}

但结果是:

ID  FirstName  LastName
--  ---------  --------
1  John       Doe
2  Sue

我做错了什么?

160484 次浏览

正如您所发现的,Linq没有“外部连接”结构。您所能得到的最接近的是使用您所声明的查询的左外连接。为此,你可以添加任何没有在join中表示的姓氏列表元素:

outerJoin = outerJoin.Concat(lastNames.Select(l=>new
{
id = l.ID,
firstname = String.Empty,
surname = l.Name
}).Where(l=>!outerJoin.Any(o=>o.id == l.id)));

我不知道这是否适用于所有情况,但从逻辑上讲,这似乎是正确的。其思想是取一个左外连接和一个右外连接,然后取结果的并集。

var firstNames = new[]
{
new { ID = 1, Name = "John" },
new { ID = 2, Name = "Sue" },
};
var lastNames = new[]
{
new { ID = 1, Name = "Doe" },
new { ID = 3, Name = "Smith" },
};
var leftOuterJoin =
from first in firstNames
join last in lastNames on first.ID equals last.ID into temp
from last in temp.DefaultIfEmpty()
select new
{
first.ID,
FirstName = first.Name,
LastName = last?.Name,
};
var rightOuterJoin =
from last in lastNames
join first in firstNames on last.ID equals first.ID into temp
from first in temp.DefaultIfEmpty()
select new
{
last.ID,
FirstName = first?.Name,
LastName = last.Name,
};
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);

这就像写的一样,因为它是在LINQ to Objects中。如果LINQ to SQL或其他,查询处理器可能不支持安全导航或其他操作。你必须使用条件操作符有条件地获取值。

也就是说,

var leftOuterJoin =
from first in firstNames
join last in lastNames on first.ID equals last.ID into temp
from last in temp.DefaultIfEmpty()
select new
{
first.ID,
FirstName = first.Name,
LastName = last != null ? last.Name : default,
};
< p > 更新1:提供了一个真正广义的扩展方法FullOuterJoin
更新2:可选地接受自定义IEqualityComparer作为键类型
更新3:这个实现有最近成为MoreLinq的一部分 -谢谢大家!

编辑添加了FullOuterGroupJoin (ideone)。我重用了GetOuter<>实现,使其性能比它可能的要低一些,但我的目标是“高级”代码,而不是前沿优化,现在。

http://ideone.com/O36nWc上看到它

static void Main(string[] args)
{
var ax = new[] {
new { id = 1, name = "John" },
new { id = 2, name = "Sue" } };
var bx = new[] {
new { id = 1, surname = "Doe" },
new { id = 3, surname = "Smith" } };


ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
.ToList().ForEach(Console.WriteLine);
}

打印输出:

{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b =  }
{ a = , b = { id = 3, surname = Smith } }

你也可以提供默认值:http://ideone.com/kG4kqO

    ax.FullOuterJoin(
bx, a => a.id, b => b.id,
(a, b, id) => new { a.name, b.surname },
new { id = -1, name    = "(no firstname)" },
new { id = -2, surname = "(no surname)" }
)

印刷:

{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }

使用术语的解释:

连接是一个借用自关系数据库设计的术语:

  • 加入将重复a中的元素的次数与b 有相应的键中的元素相同(即:如果b为空,则什么都没有)。数据库行话称之为inner (equi)join
  • 外连接包含来自a的元素,其中没有对应的元素 >存在于b中。(即:即使b为空的结果)。李这通常被称为left join。< / >
  • 完全外连接包含来自a 以及b的记录,如果无对应元素存在于另一个。(即即使a为空的结果)

在RDBMS中看到的不是通常的东西是组join[1]:

  • A 组织加入,做的和上面描述的一样,不是为多个对应的b重复来自a的元素,而是具有相应键的记录。当您希望根据公共键枚举“已连接”记录时,这通常更方便。

也请参见GroupJoin,其中包含一些一般的背景解释。


[1](我相信Oracle和MSSQL对此有专有扩展)

完整代码

一个通用的“drop-in”扩展类

internal static class MyExtensions
{
internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> a,
IEnumerable<TB> b,
Func<TA, TKey> selectKeyA,
Func<TB, TKey> selectKeyB,
Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
IEqualityComparer<TKey> cmp = null)
{
cmp = cmp?? EqualityComparer<TKey>.Default;
var alookup = a.ToLookup(selectKeyA, cmp);
var blookup = b.ToLookup(selectKeyB, cmp);


var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
keys.UnionWith(blookup.Select(p => p.Key));


var join = from key in keys
let xa = alookup[key]
let xb = blookup[key]
select projection(xa, xb, key);


return join;
}


internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> a,
IEnumerable<TB> b,
Func<TA, TKey> selectKeyA,
Func<TB, TKey> selectKeyB,
Func<TA, TB, TKey, TResult> projection,
TA defaultA = default(TA),
TB defaultB = default(TB),
IEqualityComparer<TKey> cmp = null)
{
cmp = cmp?? EqualityComparer<TKey>.Default;
var alookup = a.ToLookup(selectKeyA, cmp);
var blookup = b.ToLookup(selectKeyB, cmp);


var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
keys.UnionWith(blookup.Select(p => p.Key));


var join = from key in keys
from xa in alookup[key].DefaultIfEmpty(defaultA)
from xb in blookup[key].DefaultIfEmpty(defaultB)
select projection(xa, xb, key);


return join;
}
}

下面是一个扩展方法:

public static IEnumerable<KeyValuePair<TLeft, TRight>> FullOuterJoin<TLeft, TRight>(this IEnumerable<TLeft> leftItems, Func<TLeft, object> leftIdSelector, IEnumerable<TRight> rightItems, Func<TRight, object> rightIdSelector)
{
var leftOuterJoin = from left in leftItems
join right in rightItems on leftIdSelector(left) equals rightIdSelector(right) into temp
from right in temp.DefaultIfEmpty()
select new { left, right };


var rightOuterJoin = from right in rightItems
join left in leftItems on rightIdSelector(right) equals leftIdSelector(left) into temp
from left in temp.DefaultIfEmpty()
select new { left, right };


var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);


return fullOuterJoin.Select(x => new KeyValuePair<TLeft, TRight>(x.left, x.right));
}

我猜@sehe的方法更强大,但在我更好地理解它之前,我发现自己跳过了@MichaelSander的扩展。我修改了它,以匹配在这里描述的内置Enumerable.Join()方法的语法和返回类型。我在@JeffMercado的解决方案下为@cadrell0的注释添加了“distinct”后缀。

public static class MyExtensions {


public static IEnumerable<TResult> FullJoinDistinct<TLeft, TRight, TKey, TResult> (
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector
) {


var leftJoin =
from left in leftItems
join right in rightItems
on leftKeySelector(left) equals rightKeySelector(right) into temp
from right in temp.DefaultIfEmpty()
select resultSelector(left, right);


var rightJoin =
from right in rightItems
join left in leftItems
on rightKeySelector(right) equals leftKeySelector(left) into temp
from left in temp.DefaultIfEmpty()
select resultSelector(left, right);


return leftJoin.Union(rightJoin);
}


}

在这个例子中,你可以这样使用它:

var test =
firstNames
.FullJoinDistinct(
lastNames,
f=> f.ID,
j=> j.ID,
(f,j)=> new {
ID = f == null ? j.ID : f.ID,
leftName = f == null ? null : f.Name,
rightName = j == null ? null : j.Name
}
);

在未来,随着我了解更多,我有一种感觉,我会迁移到@sehe的逻辑,因为它很受欢迎。但即使这样,我也必须小心,因为我觉得如果可行的话,至少有一个重载与现有的“.Join()”方法的语法匹配是很重要的,原因有两个:

  1. 方法的一致性有助于节省时间、避免错误和避免意外行为。
  2. 如果将来有一个开箱即用的“. fulljoin()”方法,我想它会尽量保持当前存在的“. join()”方法的语法。如果是这样,那么如果您想迁移到它,您可以简单地重命名函数,而不需要更改参数或担心不同的返回类型破坏您的代码。

对于泛型、扩展、Func语句和其他特性,我还是新手,所以当然欢迎反馈。

我没花很长时间就意识到我的代码有问题。我在LINQPad中做了一个. dump(),并查看返回类型。它只是IEnumerable,所以我试着匹配它。但是当我实际上在我的扩展上做了. where()或. select()时,我得到了一个错误:“系统集合。IEnumerable'不包含'Select'和…"的定义。因此,最后我能够匹配. join()的输入语法,但不能匹配返回行为。

编辑:将"TResult"添加到函数的返回类型中。在阅读微软的文章时错过了这一点,当然这是有道理的。有了这个修复,现在看来返回行为是符合我的目标毕竟。

我真的很讨厌这些linq表达式,这就是SQL存在的原因:

select isnull(fn.id, ln.id) as id, fn.firstname, ln.lastname
from firstnames fn
full join lastnames ln on ln.id=fn.id

在数据库中创建此sql视图并将其作为实体导入。

当然,(不同的)左连接和右连接的并集也可以做到这一点,但这是愚蠢的。

我喜欢她的回答,但它没有使用延迟执行(输入序列被调用ToLookup急切地枚举)。因此,在查看了LINQ-to-objects的.NET源代码后,我想到了这个:

public static class LinqExtensions
{
public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> left,
IEnumerable<TRight> right,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TKey, TResult> resultSelector,
IEqualityComparer<TKey> comparator = null,
TLeft defaultLeft = default(TLeft),
TRight defaultRight = default(TRight))
{
if (left == null) throw new ArgumentNullException("left");
if (right == null) throw new ArgumentNullException("right");
if (leftKeySelector == null) throw new ArgumentNullException("leftKeySelector");
if (rightKeySelector == null) throw new ArgumentNullException("rightKeySelector");
if (resultSelector == null) throw new ArgumentNullException("resultSelector");


comparator = comparator ?? EqualityComparer<TKey>.Default;
return FullOuterJoinIterator(left, right, leftKeySelector, rightKeySelector, resultSelector, comparator, defaultLeft, defaultRight);
}


internal static IEnumerable<TResult> FullOuterJoinIterator<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> left,
IEnumerable<TRight> right,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TKey, TResult> resultSelector,
IEqualityComparer<TKey> comparator,
TLeft defaultLeft,
TRight defaultRight)
{
var leftLookup = left.ToLookup(leftKeySelector, comparator);
var rightLookup = right.ToLookup(rightKeySelector, comparator);
var keys = leftLookup.Select(g => g.Key).Union(rightLookup.Select(g => g.Key), comparator);


foreach (var key in keys)
foreach (var leftValue in leftLookup[key].DefaultIfEmpty(defaultLeft))
foreach (var rightValue in rightLookup[key].DefaultIfEmpty(defaultRight))
yield return resultSelector(leftValue, rightValue, key);
}
}

该实现具有以下重要属性:

  • 延迟执行,在枚举输出序列之前不会枚举输入序列。
  • 每个输入序列只枚举一次。
  • 保留输入序列的顺序,在某种意义上,它将以左序列和右序列的顺序生成元组(对于不在左序列中出现的键)。

这些属性很重要,因为它们是那些刚接触FullOuterJoin但有LINQ经验的人所期望的。

我大约在6年前为一个应用程序编写了这个扩展类,并且从那时起一直在许多解决方案中使用它,没有任何问题。希望能有所帮助。

编辑:我注意到有些人可能不知道如何使用扩展类。

要使用此扩展类,只需在类中添加以下行引用其名称空间 使用joinext; < / p >

^这应该允许你在任何你碰巧使用的IEnumerable对象集合上看到扩展函数的智能感知。

希望这能有所帮助。如果仍然不清楚,请告诉我,我希望写一个关于如何使用它的示例。

下面是这个类:

namespace joinext
{
public static class JoinExtensions
{
public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector)
where TInner : class
where TOuter : class
{
var innerLookup = inner.ToLookup(innerKeySelector);
var outerLookup = outer.ToLookup(outerKeySelector);


var innerJoinItems = inner
.Where(innerItem => !outerLookup.Contains(innerKeySelector(innerItem)))
.Select(innerItem => resultSelector(null, innerItem));


return outer
.SelectMany(outerItem =>
{
var innerItems = innerLookup[outerKeySelector(outerItem)];


return innerItems.Any() ? innerItems : new TInner[] { null };
}, resultSelector)
.Concat(innerJoinItems);
}




public static IEnumerable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector)
{
return outer.GroupJoin(
inner,
outerKeySelector,
innerKeySelector,
(o, i) =>
new { o = o, i = i.DefaultIfEmpty() })
.SelectMany(m => m.i.Select(inn =>
resultSelector(m.o, inn)
));


}






public static IEnumerable<TResult> RightJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector)
{
return inner.GroupJoin(
outer,
innerKeySelector,
outerKeySelector,
(i, o) =>
new { i = i, o = o.DefaultIfEmpty() })
.SelectMany(m => m.o.Select(outt =>
resultSelector(outt, m.i)
));


}


}
}

在两个输入上执行内存流枚举,并为每一行调用选择器。如果在当前迭代中没有相关性,则其中一个选择器参数将为空.;

例子:

   var result = left.FullOuterJoin(
right,
x=>left.Key,
x=>right.Key,
(l,r) => new { LeftKey = l?.Key, RightKey=r?.Key });
  • 相关类型需要一个ic比较器,使用比较器。如果没有提供,则默认。

  • 要求'OrderBy'应用于输入枚举对象

    /// <summary>
    /// Performs a full outer join on two <see cref="IEnumerable{T}" />.
    /// </summary>
    /// <typeparam name="TLeft"></typeparam>
    /// <typeparam name="TValue"></typeparam>
    /// <typeparam name="TRight"></typeparam>
    /// <typeparam name="TResult"></typeparam>
    /// <param name="left"></param>
    /// <param name="right"></param>
    /// <param name="leftKeySelector"></param>
    /// <param name="rightKeySelector"></param>
    /// <param name="selector">Expression defining result type</param>
    /// <param name="keyComparer">A comparer if there is no default for the type</param>
    /// <returns></returns>
    [System.Diagnostics.DebuggerStepThrough]
    public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TValue, TResult>(
    this IEnumerable<TLeft> left,
    IEnumerable<TRight> right,
    Func<TLeft, TValue> leftKeySelector,
    Func<TRight, TValue> rightKeySelector,
    Func<TLeft, TRight, TResult> selector,
    IComparer<TValue> keyComparer = null)
    where TLeft: class
    where TRight: class
    where TValue : IComparable
    {
    
    
    keyComparer = keyComparer ?? Comparer<TValue>.Default;
    
    
    using (var enumLeft = left.OrderBy(leftKeySelector).GetEnumerator())
    using (var enumRight = right.OrderBy(rightKeySelector).GetEnumerator())
    {
    
    
    var hasLeft = enumLeft.MoveNext();
    var hasRight = enumRight.MoveNext();
    while (hasLeft || hasRight)
    {
    
    
    var currentLeft = enumLeft.Current;
    var valueLeft = hasLeft ? leftKeySelector(currentLeft) : default(TValue);
    
    
    var currentRight = enumRight.Current;
    var valueRight = hasRight ? rightKeySelector(currentRight) : default(TValue);
    
    
    int compare =
    !hasLeft ? 1
    : !hasRight ? -1
    : keyComparer.Compare(valueLeft, valueRight);
    
    
    switch (compare)
    {
    case 0:
    // The selector matches. An inner join is achieved
    yield return selector(currentLeft, currentRight);
    hasLeft = enumLeft.MoveNext();
    hasRight = enumRight.MoveNext();
    break;
    case -1:
    yield return selector(currentLeft, default(TRight));
    hasLeft = enumLeft.MoveNext();
    break;
    case 1:
    yield return selector(default(TLeft), currentRight);
    hasRight = enumRight.MoveNext();
    break;
    }
    }
    
    
    }
    
    
    }
    

我认为其中大部分都存在问题,包括公认的答案,因为它们与Linq相比IQueryable工作不太好,要么是由于做了太多的服务器往返和太多的数据返回,要么是做了太多的客户端执行。

对于IEnumerable,我不喜欢Sehe的答案或类似的答案,因为它占用了过多的内存(一个简单的10000000两个列表测试在我的32GB机器上运行Linqpad内存)。

另外,其他大多数方法实际上并没有实现正确的Full Outer Join,因为它们使用的是带有右连接的Union,而不是带有右反半连接的Concat,这不仅消除了结果中重复的内部连接行,而且消除了最初存在于左或右数据中的任何正确的副本。

所以这里是我的扩展,处理所有这些问题,生成SQL以及直接实现LINQ到SQL的连接,在服务器上执行,比其他枚举对象更快,内存更少:

public static class Ext {
public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {


return from left in leftItems
join right in rightItems on leftKeySelector(left) equals rightKeySelector(right) into temp
from right in temp.DefaultIfEmpty()
select resultSelector(left, right);
}


public static IEnumerable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {


return from right in rightItems
join left in leftItems on rightKeySelector(right) equals leftKeySelector(left) into temp
from left in temp.DefaultIfEmpty()
select resultSelector(left, right);
}


public static IEnumerable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {


return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}


public static IEnumerable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {


var hashLK = new HashSet<TKey>(from l in leftItems select leftKeySelector(l));
return rightItems.Where(r => !hashLK.Contains(rightKeySelector(r))).Select(r => resultSelector(default(TLeft),r));
}


public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector)  where TLeft : class {


return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}


private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;


public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {


var sampleAnonLR = new { left = default(TLeft), rightg = default(IEnumerable<TRight>) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
var parmC = Expression.Parameter(typeof(TRight), "c");
var argLeft = Expression.PropertyOrField(parmP, "left");
var newleftrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, parmC), parmP, parmC), sampleAnonLR, default(TRight), default(TResult));


return leftItems.AsQueryable().GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
}


public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {


var sampleAnonLR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
var parmC = Expression.Parameter(typeof(TLeft), "c");
var argRight = Expression.PropertyOrField(parmP, "right");
var newrightrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, parmC, argRight), parmP, parmC), sampleAnonLR, default(TLeft), default(TResult));


return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
}


public static IQueryable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {


return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}


private static Expression<Func<TP, TResult>> CastSBody<TP, TResult>(LambdaExpression ex, TP unusedP, TResult unusedRes) => (Expression<Func<TP, TResult>>)ex;


public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {


var sampleAnonLgR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(), "lgr");
var argLeft = Expression.Constant(default(TLeft), typeof(TLeft));
var argRight = Expression.PropertyOrField(parmLgR, "right");
var newrightrs = CastSBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, argRight), parmLgR), sampleAnonLgR, default(TResult));


return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
}


public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {


return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
}

右反半连接之间的区别在Linq to Objects或源代码中大多是没有意义的,但在最终的答案中,在服务器(SQL)端产生了差异,删除了不必要的JOIN

手工编码Expression来处理将Expression<Func<>>合并为lambda可以通过LinqKit进行改进,但如果语言/编译器为此添加了一些帮助就更好了。为了完整起见,包含了FullOuterJoinDistinctRightOuterJoin函数,但我还没有重新实现FullOuterGroupJoin

我为IEnumerable编写了一个完整的外部连接的另一个版本,用于键是可排序的情况,这比组合左外部连接和右反半连接快大约50%,至少在小型集合上是这样。它只对每个集合进行一次排序。

我还为一个使用EF的版本添加了另一个答案,用自定义展开替换了Invoke

我决定把这个作为一个单独的答案,因为我不确定它是否经过了足够的测试。这是FullOuterJoin方法的重新实现,本质上是使用ExpressionLINQKit Invoke/Expand的简化自定义版本,以便它可以工作在实体框架中。没有太多的解释,因为这和我之前的答案几乎一样。

public static class Ext {
private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;


public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {


// (lrg,r) => resultSelector(lrg.left, r)
var sampleAnonLR = new { left = default(TLeft), rightg = default(IEnumerable<TRight>) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "lrg");
var parmC = Expression.Parameter(typeof(TRight), "r");
var argLeft = Expression.PropertyOrField(parmP, "left");
var newleftrs = CastSMBody(Expression.Lambda(resultSelector.Apply(argLeft, parmC), parmP, parmC), sampleAnonLR, default(TRight), default(TResult));


return leftItems.GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
}


public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {


// (lgr,l) => resultSelector(l, lgr.right)
var sampleAnonLR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "lgr");
var parmC = Expression.Parameter(typeof(TLeft), "l");
var argRight = Expression.PropertyOrField(parmP, "right");
var newrightrs = CastSMBody(Expression.Lambda(resultSelector.Apply(parmC, argRight), parmP, parmC), sampleAnonLR, default(TLeft), default(TResult));


return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right })
.SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
}


private static Expression<Func<TParm, TResult>> CastSBody<TParm, TResult>(LambdaExpression ex, TParm unusedP, TResult unusedRes) => (Expression<Func<TParm, TResult>>)ex;


public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {


// newrightrs = lgr => resultSelector(default(TLeft), lgr.right)
var sampleAnonLgR = new { leftg = (IEnumerable<TLeft>)null, right = default(TRight) };
var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(), "lgr");
var argLeft = Expression.Constant(default(TLeft), typeof(TLeft));
var argRight = Expression.PropertyOrField(parmLgR, "right");
var newrightrs = CastSBody(Expression.Lambda(resultSelector.Apply(argLeft, argRight), parmLgR), sampleAnonLgR, default(TResult));


return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
}


public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector)  where TLeft : class where TRight : class where TResult : class {


return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}


public static Expression Apply(this LambdaExpression e, params Expression[] args) {
var b = e.Body;


foreach (var pa in e.Parameters.Cast<ParameterExpression>().Zip(args, (p, a) => (p, a))) {
b = b.Replace(pa.p, pa.a);
}


return b.PropagateNull();
}


public static Expression Replace(this Expression orig, Expression from, Expression to) => new ReplaceVisitor(from, to).Visit(orig);
public class ReplaceVisitor : System.Linq.Expressions.ExpressionVisitor {
public readonly Expression from;
public readonly Expression to;


public ReplaceVisitor(Expression _from, Expression _to) {
from = _from;
to = _to;
}


public override Expression Visit(Expression node) => node == from ? to : base.Visit(node);
}


public static Expression PropagateNull(this Expression orig) => new NullVisitor().Visit(orig);
public class NullVisitor : System.Linq.Expressions.ExpressionVisitor {
public override Expression Visit(Expression node) {
if (node is MemberExpression nme && nme.Expression is ConstantExpression nce && nce.Value == null)
return Expression.Constant(null, nce.Type.GetMember(nme.Member.Name).Single().GetMemberType());
else
return base.Visit(node);
}
}


public static Type GetMemberType(this MemberInfo member) {
switch (member) {
case FieldInfo mfi:
return mfi.FieldType;
case PropertyInfo mpi:
return mpi.PropertyType;
case EventInfo mei:
return mei.EventHandlerType;
default:
throw new ArgumentException("MemberInfo must be if type FieldInfo, PropertyInfo or EventInfo", nameof(member));
}
}
}

两个或多个表的完全外部连接: 首先提取你想要加入的列

var DatesA = from A in db.T1 select A.Date;
var DatesB = from B in db.T2 select B.Date;
var DatesC = from C in db.T3 select C.Date;


var Dates = DatesA.Union(DatesB).Union(DatesC);

然后在提取的列和主表之间使用左外连接。

var Full_Outer_Join =


(from A in Dates
join B in db.T1
on A equals B.Date into AB


from ab in AB.DefaultIfEmpty()
join C in db.T2
on A equals C.Date into ABC


from abc in ABC.DefaultIfEmpty()
join D in db.T3
on A equals D.Date into ABCD


from abcd in ABCD.DefaultIfEmpty()
select new { A, ab, abc, abcd })
.AsEnumerable();

对于键在两个枚举对象中都是唯一的情况,我的干净解决方案:

 private static IEnumerable<TResult> FullOuterJoin<Ta, Tb, TKey, TResult>(
IEnumerable<Ta> a, IEnumerable<Tb> b,
Func<Ta, TKey> key_a, Func<Tb, TKey> key_b,
Func<Ta, Tb, TResult> selector)
{
var alookup = a.ToLookup(key_a);
var blookup = b.ToLookup(key_b);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key));
keys.UnionWith(blookup.Select(p => p.Key));
return keys.Select(key => selector(alookup[key].FirstOrDefault(), blookup[key].FirstOrDefault()));
}

所以

    var ax = new[] {
new { id = 1, first_name = "ali" },
new { id = 2, first_name = "mohammad" } };
var bx = new[] {
new { id = 1, last_name = "rezaei" },
new { id = 3, last_name = "kazemi" } };


var list = FullOuterJoin(ax, bx, a => a.id, b => b.id, (a, b) => "f: " + a?.first_name + " l: " + b?.last_name).ToArray();

输出:

f: ali l: rezaei
f: mohammad l:
f:  l: kazemi

我认为LINQ join子句并不是这个问题的正确解决方案,因为join子句的目的并不是按照这个任务解决方案所需的方式来积累数据。合并创建的独立集合的代码变得太复杂了,也许这对于学习目的来说是可以的,但对于真正的应用程序来说不是。解决这个问题的方法之一是下面的代码:

class Program
{
static void Main(string[] args)
{
List<FirstName> firstNames = new List<FirstName>();
firstNames.Add(new FirstName { ID = 1, Name = "John" });
firstNames.Add(new FirstName { ID = 2, Name = "Sue" });


List<LastName> lastNames = new List<LastName>();
lastNames.Add(new LastName { ID = 1, Name = "Doe" });
lastNames.Add(new LastName { ID = 3, Name = "Smith" });


HashSet<int> ids = new HashSet<int>();
foreach (var name in firstNames)
{
ids.Add(name.ID);
}
foreach (var name in lastNames)
{
ids.Add(name.ID);
}
List<FullName> fullNames = new List<FullName>();
foreach (int id in ids)
{
FullName fullName = new FullName();
fullName.ID = id;
FirstName firstName = firstNames.Find(f => f.ID == id);
fullName.FirstName = firstName != null ? firstName.Name : string.Empty;
LastName lastName = lastNames.Find(l => l.ID == id);
fullName.LastName = lastName != null ? lastName.Name : string.Empty;
fullNames.Add(fullName);
}
}
}
public class FirstName
{
public int ID;


public string Name;
}


public class LastName
{
public int ID;


public string Name;
}
class FullName
{
public int ID;


public string FirstName;


public string LastName;
}

如果真正的集合对于HashSet的形成很大,而不是foreach循环可以使用下面的代码:

List<int> firstIds = firstNames.Select(f => f.ID).ToList();
List<int> LastIds = lastNames.Select(l => l.ID).ToList();
HashSet<int> ids = new HashSet<int>(firstIds.Union(LastIds));//Only unique IDs will be included in HashSet

谢谢大家的有趣帖子!

我修改了代码,因为在我的情况下我需要

  • 一个个性化连接谓词
  • 一个个性化结合鲜明的比较者

这是我修改过的代码(不好意思,用VB写的)

    Module MyExtensions
<Extension()>
Friend Function FullOuterJoin(Of TA, TB, TResult)(ByVal a As IEnumerable(Of TA), ByVal b As IEnumerable(Of TB), ByVal joinPredicate As Func(Of TA, TB, Boolean), ByVal projection As Func(Of TA, TB, TResult), ByVal comparer As IEqualityComparer(Of TResult)) As IEnumerable(Of TResult)
Dim joinL =
From xa In a
From xb In b.Where(Function(x) joinPredicate(xa, x)).DefaultIfEmpty()
Select projection(xa, xb)
Dim joinR =
From xb In b
From xa In a.Where(Function(x) joinPredicate(x, xb)).DefaultIfEmpty()
Select projection(xa, xb)
Return joinL.Union(joinR, comparer)
End Function
End Module


Dim fullOuterJoin = lefts.FullOuterJoin(
rights,
Function(left, right) left.Code = right.Code And (left.Amount [...] Or left.Description.Contains [...]),
Function(left, right) New CompareResult(left, right),
New MyEqualityComparer
)


Public Class MyEqualityComparer
Implements IEqualityComparer(Of CompareResult)


Private Function GetMsg(obj As CompareResult) As String
Dim msg As String = ""
msg &= obj.Code & "_"
[...]
Return msg
End Function


Public Overloads Function Equals(x As CompareResult, y As CompareResult) As Boolean Implements IEqualityComparer(Of CompareResult).Equals
Return Me.GetMsg(x) = Me.GetMsg(y)
End Function


Public Overloads Function GetHashCode(obj As CompareResult) As Integer Implements IEqualityComparer(Of CompareResult).GetHashCode
Return Me.GetMsg(obj).GetHashCode
End Function
End Class

这是另一个完整的外部连接

由于对其他命题的简单性和可读性不太满意,我最后得出了这样的结论:

它没有快速的自命(在2020m CPU上加入1000 * 1000大约800毫秒:2.4ghz / 2核)。对我来说,它只是一个紧凑而随意的完全外部连接。

它的工作原理与SQL FULL OUTER JOIN相同(重复保存)

欢呼;-)

using System;
using System.Collections.Generic;
using System.Linq;
namespace NS
{
public static class DataReunion
{
public static List<Tuple<T1, T2>> FullJoin<T1, T2, TKey>(List<T1> List1, Func<T1, TKey> KeyFunc1, List<T2> List2, Func<T2, TKey> KeyFunc2)
{
List<Tuple<T1, T2>> result = new List<Tuple<T1, T2>>();


Tuple<TKey, T1>[] identifiedList1 = List1.Select(_ => Tuple.Create(KeyFunc1(_), _)).OrderBy(_ => _.Item1).ToArray();
Tuple<TKey, T2>[] identifiedList2 = List2.Select(_ => Tuple.Create(KeyFunc2(_), _)).OrderBy(_ => _.Item1).ToArray();


identifiedList1.Where(_ => !identifiedList2.Select(__ => __.Item1).Contains(_.Item1)).ToList().ForEach(_ => {
result.Add(Tuple.Create<T1, T2>(_.Item2, default(T2)));
});


result.AddRange(
identifiedList1.Join(identifiedList2, left => left.Item1, right => right.Item1, (left, right) => Tuple.Create<T1, T2>(left.Item2, right.Item2)).ToList()
);


identifiedList2.Where(_ => !identifiedList1.Select(__ => __.Item1).Contains(_.Item1)).ToList().ForEach(_ => {
result.Add(Tuple.Create<T1, T2>(default(T1), _.Item2));
});


return result;
}
}
}

这个想法是

  1. 基于提供的关键函数生成器构建id
  2. 处理仅剩下的项
  3. 流程内部连接
  4. 只处理正确的项目

下面是一个与之相关的简单测试:

在结束处放置断点,以手动验证它的行为是否符合预期

using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using NS;


namespace Tests
{
[TestClass]
public class DataReunionTest
{
[TestMethod]
public void Test()
{
List<Tuple<Int32, Int32, String>> A = new List<Tuple<Int32, Int32, String>>();
List<Tuple<Int32, Int32, String>> B = new List<Tuple<Int32, Int32, String>>();


Random rnd = new Random();


/* Comment the testing block you do not want to run
/* Solution to test a wide range of keys*/


for (int i = 0; i < 500; i += 1)
{
A.Add(Tuple.Create(rnd.Next(1, 101), rnd.Next(1, 101), "A"));
B.Add(Tuple.Create(rnd.Next(1, 101), rnd.Next(1, 101), "B"));
}


/* Solution for essential testing*/


A.Add(Tuple.Create(1, 2, "B11"));
A.Add(Tuple.Create(1, 2, "B12"));
A.Add(Tuple.Create(1, 3, "C11"));
A.Add(Tuple.Create(1, 3, "C12"));
A.Add(Tuple.Create(1, 3, "C13"));
A.Add(Tuple.Create(1, 4, "D1"));


B.Add(Tuple.Create(1, 1, "A21"));
B.Add(Tuple.Create(1, 1, "A22"));
B.Add(Tuple.Create(1, 1, "A23"));
B.Add(Tuple.Create(1, 2, "B21"));
B.Add(Tuple.Create(1, 2, "B22"));
B.Add(Tuple.Create(1, 2, "B23"));
B.Add(Tuple.Create(1, 3, "C2"));
B.Add(Tuple.Create(1, 5, "E2"));


Func<Tuple<Int32, Int32, String>, Tuple<Int32, Int32>> key = (_) => Tuple.Create(_.Item1, _.Item2);


var watch = System.Diagnostics.Stopwatch.StartNew();
var res = DataReunion.FullJoin(A, key, B, key);
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
String aser = JToken.FromObject(res).ToString(Formatting.Indented);
Console.Write(elapsedMs);
}
}