Find duplicate records in MySQL

I want to pull out duplicate records in a MySQL database. This can be done with:

SELECT address, count(id) as cnt FROM list
GROUP BY address HAVING cnt > 1

Which results in:

100 MAIN ST    2

I would like to pull it so that it shows each row that is a duplicate. Something like:

JIM    JONES    100 MAIN ST
JOHN   SMITH    100 MAIN ST

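Any thoughts on how this can be done? I'm trying to avoid running the first query and then looking up the duplicates with a second query in the code.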


Not going to be very efficient, but it should work:

SELECT *
FROM list AS o
WHERE (SELECT COUNT(*)
       FROM list AS i
       WHERE i.address = o.address) > 1;

The key is to rewrite this query so that it can be used as a subquery.

SELECT firstname,
lastname,
list.address
FROM list
INNER JOIN (SELECT address
FROM   list
GROUP  BY address
HAVING COUNT(id) > 1) dup
ON list.address = dup.address;
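
To try the join above on a throwaway table, something like the following sketch may help. Only the column names (id, firstname, lastname, address) come from the question; the column types and the third sample row are assumptions made up for illustration.

```sql
-- Hypothetical schema: the types are guesses, only the column names are from the question.
CREATE TABLE list (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    firstname VARCHAR(50),
    lastname  VARCHAR(50),
    address   VARCHAR(100)
);

INSERT INTO list (firstname, lastname, address) VALUES
    ('JIM',  'JONES', '100 MAIN ST'),
    ('JOHN', 'SMITH', '100 MAIN ST'),
    ('JANE', 'DOE',   '200 OAK AVE');  -- made-up row with a unique address

-- Join every row to the set of addresses that occur more than once.
SELECT l.firstname, l.lastname, l.address
FROM list AS l
INNER JOIN (
    SELECT address
    FROM list
    GROUP BY address
    HAVING COUNT(id) > 1
) AS dup ON l.address = dup.address
ORDER BY l.address, l.id;
```

With this sample data the query should return only the two 100 MAIN ST rows.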

This will select duplicates in one table pass, no subqueries.

SELECT  *
FROM    (
SELECT  ao.*, (@r := @r + 1) AS rn
FROM    (
SELECT  @_address := 'N'
) vars,
(
SELECT  *
FROM
list a
ORDER BY
address, id
) ao
WHERE   CASE WHEN @_address <> address THEN @r := 0 ELSE 0 END IS NOT NULL
AND (@_address := address ) IS NOT NULL
) aoo
WHERE   rn > 1

This query actually emulates ROW_NUMBER() present in Oracle and SQL Server.
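
On MySQL 8.0 or later, window functions are built in, so the user-variable emulation should no longer be necessary. A minimal sketch, assuming MySQL 8.0+ and the list table from the question. Note one difference: as far as I can tell, the query above skips the first row of each address group (rn > 1), while this one returns every row whose address occurs more than once.

```sql
-- Assumes MySQL 8.0+ (window functions) and the question's `list` table.
SELECT firstname, lastname, address
FROM (
    SELECT l.*,
           COUNT(*) OVER (PARTITION BY address) AS cnt  -- rows sharing this address
    FROM list AS l
) AS counted
WHERE cnt > 1
ORDER BY address, id;
```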

See the article in my blog for details.

SELECT *
FROM (SELECT address, COUNT(id) AS cnt
      FROM list
      GROUP BY address
      HAVING COUNT(id) > 1) AS dup;

Why not just INNER JOIN the table with itself?

SELECT a.firstname, a.lastname, a.address
FROM list a
INNER JOIN list b ON a.address = b.address
WHERE a.id <> b.id

A DISTINCT is needed if an address can exist more than twice.
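
A sketch of the same self-join with DISTINCT added, again assuming the list table from the question. Including a.id in the select list is my own choice: it keeps two different people who happen to share both a name and an address from collapsing into one row.

```sql
-- With three or more rows per address, each row matches several partners,
-- so DISTINCT is used to report it only once.
SELECT DISTINCT a.id, a.firstname, a.lastname, a.address
FROM list AS a
INNER JOIN list AS b
        ON a.address = b.address
       AND a.id <> b.id          -- do not pair a row with itself
ORDER BY a.address, a.id;
```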

SELECT firstname, lastname, address FROM list
WHERE address IN
(SELECT address FROM list
GROUP BY address
HAVING count(*) > 1)

select `cityname` from `codcities` group by `cityname` having count(*)>=2

This is a query similar to the one you asked for; it works 200% and it's simple too. Enjoy!!!

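Finding duplicate addresses is much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case...

I work at SmartyStreets, where we deal with address validation, de-duplication and similar problems, and I've seen a lot of diverse challenges with problems like this one.

There are several third-party services which will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US addresses) has certain guidelines to make these standard, but only a handful of vendors are certified to perform such operations.

So, I would recommend that the best solution for you is to export the table into a CSV file, for instance, and submit it to a capable list processor. One such is LiveAddress, which will have it done for you automatically in a few seconds to a few minutes. It will flag duplicate rows with a new field called "Duplicate" and a value of Y in it.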

Find duplicate users by email address with this query...

SELECT users.name, users.uid, users.mail, from_unixtime(created)
FROM users
INNER JOIN (
SELECT mail
FROM users
GROUP BY mail
HAVING count(mail) > 1
) dupes ON users.mail = dupes.mail
ORDER BY users.mail;

SELECT date FROM logs group by date having count(*) >= 2

Another solution would be to use table aliases, like so:

SELECT p1.id, p2.id, p1.address
FROM list AS p1, list AS p2
WHERE p1.address = p2.address
AND p1.id != p2.id

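All you're really doing in this case is taking the original list table, creating two pretend tables from it, p1 and p2, and then performing a join on the address column (line 3). The 4th line makes sure a record is not matched with itself, so the same record does not show up as its own duplicate ("duplicate duplicates").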

select * from table_name t1 inner join (select distinct <attribute list> from table_name as temp)t2 where t1.attribute_name = t2.attribute_name

For your table it would be something like

select * from list l1 inner join (select distinct address from list as list2)l2 where l1.address=l2.address

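This query will give you all the distinct address entries in your list table... I am not sure how this will work if you have any primary key values for name, etc.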

Fastest duplicate removal query procedure:

/* create temp table with one primary column id */
INSERT INTO temp(id) SELECT MIN(id) FROM list GROUP BY (isbn) HAVING COUNT(*)>1;
DELETE FROM list WHERE id IN (SELECT id FROM temp);
DELETE FROM temp;
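
The procedure above only describes the helper table in a comment. A possible definition is sketched below; the temp name and the single id column come from that comment, while the INT type and the use of a TEMPORARY table are assumptions. Note that each pass deletes only the lowest id per duplicated isbn, so groups with three or more rows need the three statements run again until the check at the end returns nothing.

```sql
-- Assumed definition of the helper table mentioned in the comment above.
CREATE TEMPORARY TABLE temp (id INT PRIMARY KEY);

-- After each INSERT / DELETE / DELETE pass, check whether duplicates remain.
SELECT isbn, COUNT(*) AS cnt
FROM list
GROUP BY isbn
HAVING COUNT(*) > 1;
```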

SELECT t.*,
       (SELECT COUNT(*) FROM city AS tt WHERE tt.name = t.name) AS count
FROM `city` AS t
WHERE (SELECT COUNT(*) FROM city AS tt WHERE tt.name = t.name) > 1
ORDER BY count DESC

Replace city with your table. Replace name with your field name.

We can also find duplicates that depend on more than one field. For those cases you can use the format below.

SELECT COUNT(*), column1, column2
FROM tablename
GROUP BY column1, column2
HAVING COUNT(*)>1;
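
The grouped query lists each duplicated combination once. To see the full rows behind each combination, one option is to join the table back to that grouped result. A sketch using the same placeholder names (tablename, column1, column2):

```sql
-- Placeholder names as in the answer above; substitute your own table and columns.
SELECT t.*
FROM tablename AS t
INNER JOIN (
    SELECT column1, column2
    FROM tablename
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
) AS dup
   ON t.column1 = dup.column1   -- use <=> instead of = if the columns can be NULL
  AND t.column2 = dup.column2
ORDER BY t.column1, t.column2;
```

GROUP BY puts NULL values into one group, but = never matches NULL, which is why MySQL's null-safe operator <=> may be needed for nullable columns.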

Personally, this query solved my problem:

SELECT `SUB_ID`, COUNT(SRV_KW_ID) as subscriptions FROM `SUB_SUBSCR` group by SUB_ID, SRV_KW_ID HAVING subscriptions > 1;

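What this script does is show all the subscriber IDs that exist more than once in the table, along with the number of duplicates found.

These are the table columns: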

| Field         | Type        | Null | Key | Default | Extra          |
| SUB_SUBSCR_ID | int(11)     | NO   | PRI | NULL    | auto_increment |
| MSI_ALIAS     | varchar(64) | YES  | UNI | NULL    |                |
| SUB_ID        | int(11)     | NO   | MUL | NULL    |                |
| SRV_KW_ID     | int(11)     | NO   | MUL | NULL    |                |

Hope it helps you as well!

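I tried the best answer chosen for this question, but it confused me somewhat. I actually needed this on just a single field of my table. The following example from this link worked out very well for me: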

SELECT COUNT(*) c,title FROM `data` GROUP BY title HAVING c > 1;

select address from list where address = any (select address from (select address, count(id) cnt from list group by address having cnt > 1 ) as t1) order by address

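The inner sub-query returns the rows with duplicate addresses, then the outer sub-query returns the address column for the addresses that have duplicates. The outer sub-query must return only one column because it is used as the operand of the '= any' operator.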

Powerlord's answer is indeed the best, and I would recommend one more change: use LIMIT to make sure the DB does not get overloaded:

SELECT firstname, lastname, list.address FROM list
INNER JOIN (SELECT address FROM list
GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address
LIMIT 10

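It is a good habit to use LIMIT when there is no WHERE and when making joins. Start with a small value, check how heavy the query is, and then increase the limit.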

This will also show you how many duplicates there are, and it will order the results without joins:

SELECT  `Language` , id, COUNT( id ) AS how_many
FROM  `languages`
GROUP BY  `Language`
HAVING how_many >=2
ORDER BY how_many DESC

Find duplicate records:

Suppose we have a table Student with the columns:
student_id int
student_name varchar

Records:
+------------+---------------------+
| student_id | student_name        |
+------------+---------------------+
|        101 | usman               |
|        101 | usman               |
|        101 | usman               |
|        102 | usmanyaqoob         |
|        103 | muhammadusmanyaqoob |
|        103 | muhammadusmanyaqoob |
+------------+---------------------+


Now we want to see the duplicate records. Use this query:

select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;


+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+

Wouldn't this be easier:

SELECT *
FROM tc_tariff_groups
GROUP BY group_id
HAVING COUNT(group_id) >1

?

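To quickly see the duplicate rows you can run a single simple query.

Here I am querying the table and listing all duplicate rows with the same user_id, market_place and sku: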

select user_id, market_place,sku, count(id)as totals from sku_analytics group by user_id, market_place,sku having count(id)>1;

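To delete the duplicate rows you have to decide which row you want to delete, e.g. the one with the lower id (usually older) or maybe one picked by some other date information. In my case I just want to delete the lower id, since the newer id holds the latest information.

First double check that the right records will be deleted. Here I am selecting the records among the duplicates that will be deleted (by unique id).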

select a.user_id, a.market_place,a.sku from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;

Then I run the delete query to delete the dupes:

delete a from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;

Backup, double check, verify, verify the backup, then execute.
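
One way that backup-and-verify routine could look in SQL is sketched below. It assumes the table uses a transactional engine such as InnoDB, and sku_analytics_backup is a hypothetical name. The copy is made before the transaction starts because CREATE TABLE causes an implicit commit.

```sql
-- Hypothetical backup table; CREATE TABLE ... AS SELECT copies rows but not indexes.
CREATE TABLE sku_analytics_backup AS
SELECT * FROM sku_analytics;

START TRANSACTION;

-- Same delete as above: drop the lower id of each duplicated (user_id, market_place, sku).
DELETE a
FROM sku_analytics AS a
INNER JOIN sku_analytics AS b
        ON a.id < b.id
       AND a.user_id = b.user_id
       AND a.market_place = b.market_place
       AND a.sku = b.sku;

-- Verify: this should now return no rows.
SELECT user_id, market_place, sku, COUNT(id) AS totals
FROM sku_analytics
GROUP BY user_id, market_place, sku
HAVING COUNT(id) > 1;

COMMIT;  -- or ROLLBACK; if the check above looks wrong
```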

I use the following:

SELECT * FROM mytable
WHERE id IN (
SELECT id FROM mytable
GROUP BY column1, column2, column3
HAVING count(*) > 1
)
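
A caveat on the query above: it selects id, which is neither grouped nor aggregated. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) the subquery is rejected, and where it is accepted it yields only one id per group. A possible alternative, sketched with the same placeholder names, compares the grouped columns as a tuple instead:

```sql
-- Placeholder names (mytable, column1..column3) as in the answer above.
SELECT m.*
FROM mytable AS m
WHERE (m.column1, m.column2, m.column3) IN (
    SELECT column1, column2, column3
    FROM mytable
    GROUP BY column1, column2, column3
    HAVING COUNT(*) > 1
);
```

This returns every row of each duplicated combination, not just one row per group.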

SELECT * FROM bookings
WHERE DATE(created_at) = '2022-01-11'
AND code IN (
SELECT code FROM bookings
GROUP BY code
HAVING COUNT(code) > 1
)
ORDER BY id DESC

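Most of the answers here don't cope with the case where you have more than one duplicate result and/or more than one column to check for duplicates. When you are in such a case, you can use this query to get all the duplicate ids: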

SELECT address, email, COUNT(*) AS QUANTITY_DUPLICATES, GROUP_CONCAT(id) AS ID_DUPLICATES
FROM list
GROUP BY address, email
HAVING COUNT(*)>1;


If you want to list every result as a single row, you need a more complex query. This is the one I found to work:

CREATE TEMPORARY TABLE IF NOT EXISTS temptable AS (
SELECT GROUP_CONCAT(id) AS ID_DUPLICATES
FROM list
GROUP BY address, email
HAVING COUNT(*)>1
);
SELECT d.*
FROM list AS d, temptable AS t
WHERE FIND_IN_SET(d.id, t.ID_DUPLICATES)
ORDER BY d.id;
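
One caveat with the GROUP_CONCAT approach: its result is truncated at group_concat_max_len, which defaults to 1024, so a very large group can silently lose ids. A sketch of raising the limit for the current session before building the temporary table; the value 1000000 is an arbitrary choice for illustration.

```sql
-- Check the current limit (the default is 1024).
SELECT @@SESSION.group_concat_max_len;

-- Raise it for this session only; 1000000 is an arbitrary illustrative value.
SET SESSION group_concat_max_len = 1000000;
```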


SELECT id, count(*) as c
FROM `list`
GROUP BY id HAVING c > 1

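This will return your ids with the number of times each is duplicated, or nothing, in which case you have no duplicated ids.

Change the id in the GROUP BY (e.g. to address) and it will return the number of times an address is duplicated, identified by the first id found with that address.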

SELECT id, count(*) as c
FROM `list`
GROUP BY address HAVING c > 1
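
Selecting a bare id while grouping by address only works when ONLY_FULL_GROUP_BY is off, and which id comes back is not guaranteed. If the intent is "the first id with that address", a variant with MIN(id) states it explicitly. A sketch, assuming the same list table; first_id is just an illustrative alias:

```sql
-- MIN(id) makes "the first id with that address" explicit and valid under ONLY_FULL_GROUP_BY.
SELECT MIN(id) AS first_id, address, COUNT(*) AS c
FROM `list`
GROUP BY address
HAVING c > 1;
```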

I hope it helps. Enjoy ;)

Would go with something like this:

SELECT  t1.firstname, t1.lastname, t1.address FROM list  t1
INNER JOIN  list t2
WHERE
t1.id < t2.id AND
t1.address = t2.address;