Finding duplicate records in MongoDB

小开

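You can find the list of duplicate names with the following aggregation pipeline:

Group all records that have the same name.
Match the groups that contain more than one record.
Then group again to collect all the duplicate names into a single array.

Code: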

db.collection.aggregate([
{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},
{$match:{"count":{$gt:1}}},
{$project:{"name":1,"_id":0}},
{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},
{$project:{"_id":0,"duplicateNames":1}}
])

Output:

{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }
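To make the three steps easier to follow, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code: the sample documents are made up, and a `Map` stands in for the `$group` stage. It only illustrates what each stage contributes to the final array.

```javascript
// Hypothetical sample documents; only the `name` field matters here.
const docs = [
  { _id: 1, name: "ksqn291" },
  { _id: 2, name: "ksqn291" },
  { _id: 3, name: "unique" },
  { _id: 4, name: "ksqn29123213Test" },
  { _id: 5, name: "ksqn29123213Test" },
];

// Step 1 ($group): count how many documents share each name.
const counts = new Map();
for (const doc of docs) {
  counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
}

// Step 2 ($match): keep only the names that occur more than once.
// Step 3 (second $group with $push): collect them into one array.
const duplicateNames = [...counts]
  .filter(([, count]) => count > 1)
  .map(([name]) => name);

console.log({ duplicateNames });
```

Note that a `Map` keeps insertion order, whereas MongoDB does not promise any particular order for `$group` output, so the order of names in the real result may differ.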

小开

Best answer

Aggregate on name and keep the names whose count is greater than 1:

db.collection.aggregate([
{"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
{"$project": {"name" : "$_id", "_id" : 0} }
]);

To sort the results from most to fewest duplicates:

db.collection.aggregate([
{"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
{"$sort": {"count" : -1} },
{"$project": {"name" : "$_id", "_id" : 0} }
]);

To use this with a column name other than "name", change "$name" to "$column_name".
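If you need this for several fields, one option is to build the pipeline from the field name. The helper below is a hypothetical sketch (`duplicatePipeline` is my own name, not a MongoDB API), and it assumes a simple top-level field name.

```javascript
// Hypothetical helper: build the duplicate-finding pipeline for any field.
function duplicatePipeline(fieldName) {
  return [
    // Group by the chosen field and count the documents per value.
    { $group: { _id: "$" + fieldName, count: { $sum: 1 } } },
    // Drop the null group and every value that occurs only once.
    { $match: { _id: { $ne: null }, count: { $gt: 1 } } },
    // Most duplicated values first.
    { $sort: { count: -1 } },
    // Rename _id back to the original field name.
    { $project: { [fieldName]: "$_id", _id: 0 } },
  ];
}

// Print the generated pipeline so it can be inspected or pasted into the shell.
console.log(JSON.stringify(duplicatePipeline("email"), null, 2));
```

In the mongo shell this would be passed straight through, for example `db.collection.aggregate(duplicatePipeline("email"))`.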

小开

db.getCollection('orders').aggregate([
{$group: {
_id: {name: "$name"},
uniqueIds: {$addToSet: "$_id"},
count: {$sum: 1}
}
},
{$match: {
count: {"$gt": 1}
}
}
])

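The first $group stage groups the documents by the name field.

It also collects the unique ids of each group and counts its documents. If count is greater than 1, the name is duplicated within the collection, so the $match stage keeps that group.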
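To show the shape of the documents this pipeline returns, here is a small in-memory sketch in plain JavaScript. The `orders` data is invented for illustration, and the loop only imitates `$addToSet` and `$sum`; it is not how MongoDB implements them.

```javascript
// Hypothetical sample data standing in for the `orders` collection.
const orders = [
  { _id: "a1", name: "widget" },
  { _id: "a2", name: "gadget" },
  { _id: "a3", name: "widget" },
];

const groups = new Map();
for (const { _id, name } of orders) {
  const group = groups.get(name) || { _id: { name }, uniqueIds: [], count: 0 };
  // Imitates $addToSet: only add the id if it is not already present.
  if (!group.uniqueIds.includes(_id)) group.uniqueIds.push(_id);
  // Imitates { $sum: 1 }: one more document in this group.
  group.count += 1;
  groups.set(name, group);
}

// Imitates { $match: { count: { $gt: 1 } } }.
const duplicates = [...groups.values()].filter((group) => group.count > 1);

console.log(JSON.stringify(duplicates));
// prints [{"_id":{"name":"widget"},"uniqueIds":["a1","a3"],"count":2}]
```

The `uniqueIds` array is what makes this variant handy: it tells you exactly which documents share the name. In MongoDB the order of elements produced by `$addToSet` is unspecified, so do not rely on it.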

小开

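If you have a large database and the name attribute is present in only some of the documents, the answer given by anhic may be very inefficient.

To make it more efficient, you can add a $match at the start of the aggregation.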

db.collection.aggregate([
{"$match": {"name" :{ "$ne" : null } } },
{"$group" : {"_id": "$name", "count": { "$sum": 1 } } },
{"$match": {"count" : {"$gt": 1} } },
{"$project": {"name" : "$_id", "_id" : 0} }
])

小开

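In case somebody is looking for a duplicates query with an extra "$and"-style where clause, such as "and where someOtherField is true":

The trick is to start with that other $match, because after the grouping the original fields are no longer available.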

// Do a first match before the grouping
{ $match: { "someOtherField": true }},
{ $group: {
_id: { name: "$name" },
count: { $sum: 1 }
}},
{ $match: { count: { $gte: 2 } }},

I searched for a long time to find this notation; I hope it helps somebody with the same problem.
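Here is a small plain-JavaScript sketch of why the order matters. It is only an in-memory imitation with made-up documents; `groupByName` is a hypothetical stand-in for the `$group` stage above.

```javascript
// Hypothetical sample documents.
const docs = [
  { name: "a", someOtherField: true },
  { name: "a", someOtherField: true },
  { name: "b", someOtherField: false },
  { name: "b", someOtherField: false },
];

// Imitates { $group: { _id: { name: "$name" }, count: { $sum: 1 } } }.
function groupByName(input) {
  const counts = new Map();
  for (const doc of input) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }
  return [...counts].map(([name, count]) => ({ _id: { name }, count }));
}

// Filter first, then group: only documents with someOtherField === true are counted.
const filteredFirst = groupByName(docs.filter((doc) => doc.someOtherField === true))
  .filter((group) => group.count >= 2);

// Group first: the grouped documents carry only _id and count,
// so a later filter on someOtherField has nothing left to test.
const groupedFirst = groupByName(docs);
const fieldSurvives = groupedFirst.some((group) => "someOtherField" in group);

console.log(JSON.stringify(filteredFirst)); // prints [{"_id":{"name":"a"},"count":2}]
console.log(fieldSurvives); // prints false
```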

小开

If you need to see all the duplicated rows:

db.collection.aggregate([
{"$group" : { "_id": "$name", "count": { "$sum": 1 },"data": { "$push": "$$ROOT" }}},
{"$unwind": "$data"},
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }
]);

小开

This is how it can be done in MongoDB Compass.

小开

Another option is to use the $sortByCount stage.

db.collection.aggregate([
{ $sortByCount: '$name' }
])

It is a combination of $group and $sort.

The $sortByCount stage is equivalent to the following $group + $sort sequence:
{ $group: { _id: <expression>, count: { $sum: 1 } } },
{ $sort: { count: -1 } }
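One thing worth keeping in mind: $sortByCount only counts and sorts, it does not filter. Values that occur once are still listed, at the bottom. The plain-JavaScript sketch below imitates the equivalent $group + $sort sequence on made-up data; `sortByCount` here is a hypothetical local function, not the MongoDB stage itself.

```javascript
// Hypothetical in-memory imitation of $sortByCount on a single field.
function sortByCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1); // $group with $sum: 1
  }
  return [...counts]
    .map(([value, count]) => ({ _id: value, count }))
    .sort((a, b) => b.count - a.count); // $sort: { count: -1 }
}

const result = sortByCount(
  [{ name: "x" }, { name: "y" }, { name: "y" }, { name: "z" }, { name: "y" }, { name: "x" }],
  "name"
);

// The single-occurrence value "z" is still present, so filter it out
// if only duplicates are wanted (the job of a $match on count).
const duplicatesOnly = result.filter((group) => group.count > 1);

console.log(JSON.stringify(result));
// prints [{"_id":"y","count":3},{"_id":"x","count":2},{"_id":"z","count":1}]
console.log(JSON.stringify(duplicatesOnly));
// prints [{"_id":"y","count":3},{"_id":"x","count":2}]
```

In a real pipeline the same effect should come from appending `{ $match: { count: { $gt: 1 } } }` after the `$sortByCount` stage.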

小开

Update: this works every time!

db.users.aggregate([
// Group by the key and compute the number of documents that match the key
{
$group: {
_id: "$username",  // or if you want to use multiple fields _id: { a: "$FirstName", b: "$LastName" }
count: { $sum: 1 }
}
},
// Filter group having more than 1 item, which means that at least 2 documents have the same key
{
$match: {
count: { $gt: 1 }
}
}
])
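For the multiple-field case mentioned in the comment above, the compound `_id` can be built from a list of field names. This is a hypothetical helper of my own (`compoundKeyPipeline` is not part of MongoDB), and it uses the field names themselves as the keys inside `_id` rather than `a` and `b`.

```javascript
// Hypothetical helper: build a duplicate-finding pipeline keyed on several fields.
function compoundKeyPipeline(fields) {
  // Produces e.g. { FirstName: "$FirstName", LastName: "$LastName" }.
  const key = {};
  for (const field of fields) {
    key[field] = "$" + field;
  }
  return [
    // Documents are duplicates only if every listed field matches.
    { $group: { _id: key, count: { $sum: 1 } } },
    // Keep the groups with at least two documents.
    { $match: { count: { $gt: 1 } } },
  ];
}

console.log(JSON.stringify(compoundKeyPipeline(["FirstName", "LastName"])));
```

In the shell it would be used as `db.users.aggregate(compoundKeyPipeline(["FirstName", "LastName"]))`.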

==========

This aggregation also worked for me...

db.collection.aggregate([
{"$group" : { "_id": "$username", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
{"$project": {"username" : "$_id", "_id" : 0} }
]);

You can also try $sortByCount

db.collection.aggregate([
{ $sortByCount: '$username' }
])

小开

Using $sortByCount
Searching for duplicates in MongoDB Compass [screenshot]: https://i.stack.imgur.com/L85QV.png

小开

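Sometimes you want to find duplicates regardless of case, for example when you want to create a case-insensitive index. In that case you can use this aggregation pipeline: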

db.collection.aggregate([
{'$group': {'_id': {'$toLower': '$name'}, 'count': { '$sum': 1 }, 'duplicates': { '$push': '$$ROOT' } } },
{'$match': { 'count': { '$gt': 1 } } }
]);

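Explanation:

Group by name, but convert it to lowercase first, and push the docs into the duplicates array.
Match the groups that contain more than one record (the duplicates).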
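The same idea as an in-memory sketch in plain JavaScript, with invented documents. It assumes every document has a string `name`, and JavaScript's `toLowerCase` may treat non-ASCII characters differently from `$toLower`, so take it as an illustration of the grouping only.

```javascript
// Hypothetical sample documents that differ only by case.
const docs = [
  { _id: 1, name: "Alice" },
  { _id: 2, name: "alice" },
  { _id: 3, name: "Bob" },
];

const groups = new Map();
for (const doc of docs) {
  const key = doc.name.toLowerCase(); // stands in for $toLower
  const group = groups.get(key) || { _id: key, count: 0, duplicates: [] };
  group.count += 1;           // stands in for { $sum: 1 }
  group.duplicates.push(doc); // stands in for { $push: "$$ROOT" }
  groups.set(key, group);
}

// Keep only the groups with more than one document.
const result = [...groups.values()].filter((group) => group.count > 1);

console.log(JSON.stringify(result));
// prints [{"_id":"alice","count":2,"duplicates":[{"_id":1,"name":"Alice"},{"_id":2,"name":"alice"}]}]
```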