Finding duplicate values in a SQL table

It's easy to find duplicates on one field:

SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com

This query will give us John, Sam, Tom, Tom, because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

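The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove/change the duplicates, so I need to find them first.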

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING
    COUNT(*) > 1

Simply group on both of the columns.
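
As a quick check of the two-column grouping, here is a minimal sketch that runs it against the question's sample data. It assumes SQLite through Python's standard sqlite3 module and an in-memory database; the column types are assumptions, and only the query itself comes from the answer.

```python
import sqlite3

# In-memory SQLite database holding the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Group on both columns; only (name, email) pairs seen more than once survive HAVING.
dupes = conn.execute(
    """
    SELECT name, email, COUNT(*)
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)  # [('Tom', 'asd@asd.com', 2)]
```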

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY, but this has changed with the idea of "functional dependency":

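In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint that describes the relationship between attributes in a relation.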

Support for this is not consistent across databases.

Try this:

SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )

Try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name, email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name, email
HAVING COUNT(*) > 1

Output:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

If you want the IDs of the dups, use this:

SELECT
    y.id, y.name, y.email
FROM @YourTable y
INNER JOIN (SELECT
                name, email, COUNT(*) AS CountOf
            FROM @YourTable
            GROUP BY name, email
            HAVING COUNT(*) > 1
           ) dt ON y.name = dt.name AND y.email = dt.email

Output:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)
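
The join-back step can be tried end to end with a small sketch. It assumes SQLite via Python's sqlite3 module; because SQLite has no T-SQL table variables, a regular table named your_table (a hypothetical name) stands in for @YourTable, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

# SQLite has no T-SQL table variables, so a regular table stands in for @YourTable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "John", "John-email"),
        (2, "John", "John-email"),
        (3, "fred", "John-email"),
        (4, "fred", "fred-email"),
        (5, "sam", "sam-email"),
        (6, "sam", "sam-email"),
    ],
)

# Join each row back to the list of duplicated (name, email) pairs.
rows = conn.execute(
    """
    SELECT y.id, y.name, y.email
    FROM your_table y
    INNER JOIN (
        SELECT name, email
        FROM your_table
        GROUP BY name, email
        HAVING COUNT(*) > 1
    ) dt ON y.name = dt.name AND y.email = dt.email
    ORDER BY y.id
    """
).fetchall()
for row in rows:
    print(row)
```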

To delete the duplicates, try:

DELETE d
FROM @YourTable d
INNER JOIN (SELECT
                y.id, y.name, y.email,
                ROW_NUMBER() OVER (PARTITION BY y.name, y.email ORDER BY y.name, y.email, y.id) AS RowRank
            FROM @YourTable y
            INNER JOIN (SELECT
                            name, email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name, email
                        HAVING COUNT(*) > 1
                       ) dt ON y.name = dt.name AND y.email = dt.email
           ) dt2 ON d.id = dt2.id
WHERE dt2.RowRank != 1

SELECT * FROM @YourTable

Output:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)
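
SQLite does not accept the T-SQL DELETE ... FROM ... INNER JOIN form, so this sketch swaps in a DELETE with an IN subquery over the same ROW_NUMBER() ranking. It assumes Python's sqlite3 module with SQLite 3.25 or later (needed for window functions) and the hypothetical table name your_table. The extra join to the grouped subquery is dropped here, because a row without duplicates always gets rank 1 and is never deleted anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "John", "John-email"),
        (2, "John", "John-email"),
        (3, "fred", "John-email"),
        (4, "fred", "fred-email"),
        (5, "sam", "sam-email"),
        (6, "sam", "sam-email"),
    ],
)

# Number the rows inside each (name, email) group by id and delete every row
# ranked after the first, which keeps the lowest id of each group.
with conn:
    conn.execute(
        """
        DELETE FROM your_table
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS row_rank
                FROM your_table
            )
            WHERE row_rank > 1
        )
        """
    )

remaining = conn.execute("SELECT id, name, email FROM your_table ORDER BY id").fetchall()
print(remaining)
```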

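In contrast to the other answers, this lets you view the whole records, with all their columns, if there are any duplicates. In the PARTITION BY part of the row_number function, choose the columns that should be unique (i.e. the ones you are checking for duplicates).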

SELECT  *
FROM    (SELECT a.*,
                Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
         FROM   Customers AS a) AS b
WHERE   r > 1;

When you want to select all duplicate records with all their fields, you can write it like this:

CREATE TABLE test (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    c1      integer,
    c2      text,
    d       date DEFAULT now(),
    v       text
);

INSERT INTO test (c1, c2, v) VALUES
(1, 'a', 'Select'),
(1, 'a', 'ALL'),
(1, 'a', 'multiple'),
(1, 'a', 'records'),
(2, 'b', 'in columns'),
(2, 'b', 'c1 and c2'),
(3, 'c', '.');

SELECT * FROM test ORDER BY 1;

SELECT  *
FROM    test
WHERE   (c1, c2) IN (SELECT c1, c2
                     FROM   test
                     GROUP  BY 1, 2
                     HAVING count(*) > 1)
ORDER   BY 1;

Tested in PostgreSQL.
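
The row-value comparison (c1, c2) IN (SELECT ...) also works outside PostgreSQL. Here is a minimal sketch assuming Python's sqlite3 module with SQLite 3.15 or later (the release that added row values). The date column is omitted because DEFAULT now() is PostgreSQL-specific, and GROUP BY names the columns instead of using positions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified version of the PostgreSQL table; SQLite assigns id automatically.
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, c1 INTEGER, c2 TEXT, v TEXT)")
conn.executemany(
    "INSERT INTO test (c1, c2, v) VALUES (?, ?, ?)",
    [
        (1, "a", "Select"),
        (1, "a", "ALL"),
        (1, "a", "multiple"),
        (1, "a", "records"),
        (2, "b", "in columns"),
        (2, "b", "c1 and c2"),
        (3, "c", "."),
    ],
)

# Row-value comparison: keep every row whose (c1, c2) pair occurs more than once.
rows = conn.execute(
    """
    SELECT *
    FROM test
    WHERE (c1, c2) IN (SELECT c1, c2
                       FROM test
                       GROUP BY c1, c2
                       HAVING COUNT(*) > 1)
    ORDER BY id
    """
).fetchall()
for row in rows:
    print(row)
```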

If you use Oracle, this way is preferable:

create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', 'asd@asd.com');
insert into my_users values (2, 'Sam', 'asd@asd.com');
insert into my_users values (3, 'Tom', 'asd@asd.com');
insert into my_users values (4, 'Bob', 'bob@asd.com');
insert into my_users values (5, 'Tom', 'asd@asd.com');
commit;
select *
from my_users
where rowid not in (select min(rowid) from my_users group by name, email);

If you want to check whether your table has duplicate rows, I use the queries below:

create table my_table(id int, name varchar(100), email varchar(100));
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (2, 'Aman', 'aman@rms.com');
insert into my_table values (3, 'Tom', 'tom@rms.com');
insert into my_table values (4, 'Raj', 'raj@rms.com');

Select COUNT(1) As Total_Rows from my_table
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc
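
The total-versus-distinct comparison can be sketched like this, assuming Python's sqlite3 module and the sample rows above. A gap between the two counts only says that exact duplicate rows exist, not which rows they are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "shekh", "shekh@rms.com"),
        (1, "shekh", "shekh@rms.com"),
        (2, "Aman", "aman@rms.com"),
        (3, "Tom", "tom@rms.com"),
        (4, "Raj", "raj@rms.com"),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM my_table)"
).fetchone()[0]

# Any difference between the two counts means some rows are exact duplicates.
print(total_rows, distinct_rows)  # 5 4
print("has duplicates:", total_rows > distinct_rows)
```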

Try this code:

WITH CTE AS
(
    SELECT Id, Name, Age, Comments,
           RN = ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY ccn)
    FROM ccnmaster
)
SELECT * FROM CTE
WHERE RN > 1 -- RN > 1 marks the second and later copies of each (Name, Age)

select emp.ename, emp.empno, dept.loc
from emp
inner join dept
        on dept.deptno = emp.deptno
inner join (select ename, count(*)
            from emp
            group by ename, deptno
            having count(*) > 1) t
        on emp.ename = t.ename
order by emp.ename
/

How can we count the duplicated values, i.e. values repeated 2 or more times? Just count them, not group-wise.

select COUNT(distinct col_01) from Table_01
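
If the aim is the number of values that appear two or more times, rather than the number of distinct values, one hedged sketch is to count the groups that survive HAVING. It assumes Python's sqlite3 module; the Table_01/col_01 names come from the answer above, but the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_01 (col_01 TEXT)")
# Hypothetical values: 'a' appears twice, 'c' three times.
conn.executemany(
    "INSERT INTO Table_01 VALUES (?)",
    [("a",), ("a",), ("b",), ("c",), ("c",), ("c",), ("d",)],
)

# Number of distinct values that occur two or more times.
duplicated_values = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT col_01
          FROM Table_01
          GROUP BY col_01
          HAVING COUNT(*) > 1)
    """
).fetchone()[0]

# Number of surplus rows, i.e. rows that would go if each value were kept once.
surplus_rows = conn.execute(
    "SELECT COUNT(*) - COUNT(DISTINCT col_01) FROM Table_01"
).fetchone()[0]

print(duplicated_values, surplus_rows)  # 2 3
```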

If you want to find duplicate data (by one or more criteria) and select the actual rows:

with MYCTE as (
    SELECT DuplicateKey1
          ,DuplicateKey2 --optional
          ,count(*) X
    FROM MyTable
    group by DuplicateKey1, DuplicateKey2
    having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1 = cte.DuplicateKey1
   AND E.DuplicateKey2 = cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt

http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/

SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;

I think this will work fine for finding duplicate values in a particular column.

SELECT name, email
FROM users
WHERE email in
    (SELECT email FROM users
     GROUP BY email
     HAVING COUNT(*) > 1)

A bit late to the party, but I found a really cool workaround to find all the duplicate IDs:

SELECT email,
       GROUP_CONCAT(id)
FROM   users
GROUP  BY email
HAVING COUNT(email) > 1;
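
Here is a sketch of the same query in SQLite, which also provides GROUP_CONCAT, run through Python's sqlite3 module on the question's sample data. SQLite does not guarantee the order of the concatenated ids, so the code sorts them before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

rows = conn.execute(
    """
    SELECT email, GROUP_CONCAT(id)
    FROM users
    GROUP BY email
    HAVING COUNT(email) > 1
    """
).fetchall()

for email, id_list in rows:
    # The default separator is a comma; the order inside the list is not guaranteed.
    ids = sorted(int(x) for x in id_list.split(","))
    print(email, ids)  # asd@asd.com [1, 2, 3, 5]
```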

If you want to delete the duplicates, here's a much simpler way than having to find even/odd rows in a triple sub-select:

SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id

And so, to delete:

DELETE FROM users
WHERE id IN (
    SELECT u.id /*, u.name, u.email*/
    FROM users u, users u2
    WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)

Easier to read and understand, IMHO.

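Note: because u.id > u2.id matches every row that has a duplicate with a lower id, one run should already remove all but the lowest-id row of each (name, email) group; if rows remain on your engine, run the statement again until nothing is deleted.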
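
To check how many passes the self-join delete needs, here is a sketch assuming Python's sqlite3 module and hypothetical sample data with three copies of one (name, email) pair. Some engines (MySQL, for example) refuse a subquery that reads from the table being deleted from, so the statement may need adapting there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
# Hypothetical data with three copies of the same (name, email) pair.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "john@asd.com"),
        (2, "Tom", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Tom", "asd@asd.com"),
    ],
)

# A single run: ids 3 and 4 both have a lower-id twin (id 2), so both go at once.
with conn:
    cur = conn.execute(
        """
        DELETE FROM users
        WHERE id IN (
            SELECT u.id
            FROM users u, users u2
            WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
        )
        """
    )

remaining = conn.execute("SELECT id FROM users ORDER BY id").fetchall()
print("deleted:", cur.rowcount)  # deleted: 2
print(remaining)  # [(1,), (2,)]
```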

This should also work; maybe give it a try.

Select * from Users a
where EXISTS (Select * from Users b
              where (a.name = b.name
                     OR a.email = b.email)
                and a.ID != b.id)

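This works especially well in your case if the duplicates you are looking for have some kind of prefix or general change, e.g. a new domain in the email address. Then you can use replace() on these columns.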

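Here's something simple I came up with. It uses a common table expression (CTE) and a partition window (I think these features are available in SQL Server 2008 and later).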

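This example finds all students with a duplicate name and dob. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.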

with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
    SELECT StudentId, FirstName, LastName, DateOfBirth as DOB,
           SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
    FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
ORDER BY DOB, LName

SELECT * FROM users u
where rowid = (select max(rowid) from users u1 where u.email = u1.email);

select name, email,
       case
           when ROW_NUMBER() over (partition by name, email order by name) > 1 then 'Yes'
           else 'No'
       end "duplicated ?"
from users

select id, name, COUNT(*) from user group by Id, Name having COUNT(*) > 1

Using a CTE, we can also find duplicate values like this:

with MyCTE
as
(
    select Name, EmailId, ROW_NUMBER() over (PARTITION BY EmailId order by id) as Duplicate
    from [Employees]
)
select * from MyCTE where Duplicate > 1

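This selects/deletes all duplicate records except one record from each group of duplicates. So the delete leaves all unique records plus one record from each group of duplicates.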

Select duplicates:

SELECT *
FROM table
WHERE id NOT IN (SELECT MIN(id)
                 FROM table
                 GROUP BY column1, column2);

Delete duplicates:

DELETE FROM table
WHERE id NOT IN (SELECT MIN(id)
                 FROM table
                 GROUP BY column1, column2);

Be careful with a large number of records: this may cause performance problems.
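
The MIN(id) pattern can be sketched with the question's sample data, assuming Python's sqlite3 module and an id column with no NULLs (NOT IN returns no rows at all if the subquery yields a NULL). The table and column names follow the question rather than the generic table/column1/column2 above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Constant SQL fragment (no user input): the id to keep in each (name, email) group.
keep_first = "SELECT MIN(id) FROM users GROUP BY name, email"

# Select every row that is not the lowest id of its group.
extra = conn.execute(f"SELECT * FROM users WHERE id NOT IN ({keep_first})").fetchall()
print(extra)  # [(5, 'Tom', 'asd@asd.com')]

# Delete those same rows.
with conn:
    conn.execute(f"DELETE FROM users WHERE id NOT IN ({keep_first})")

remaining = conn.execute("SELECT id FROM users ORDER BY id").fetchall()
print(remaining)  # [(1,), (2,), (3,), (4,)]
```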

SELECT column_name, COUNT(*) FROM TABLE_NAME GROUP BY column_name HAVING COUNT(*) > 1;

How to get the duplicate records in a table:

SELECT COUNT(EmpCode), EmpCode FROM tbl_Employees
WHERE Status = 1
GROUP BY EmpCode HAVING COUNT(EmpCode) > 1

We can use aggregate functions here, as shown below:

create table #TableB (id_account int, data int, [date] date)
insert into #TableB values
(1, -50, '10/20/2018'),
(1, 20, '10/09/2018'),
(2, -900, '10/01/2018'),
(1, 20, '09/25/2018'),
(1, -100, '08/01/2018')

SELECT id_account, data, COUNT(*)
FROM #TableB
GROUP BY id_account, data
HAVING COUNT(id_account) > 1
drop table #TableB

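Here the two fields id_account and data are used together with COUNT(*), so the query returns every combination of values in those two columns that appears more than once.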

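For some reason, we forgot to add constraints to a SQL Server table, and the front-end application inserted records that are duplicated in all columns. We can then use the queries below to remove the duplicates from the table.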

SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
TRUNCATE TABLE #OriginalTable
INSERT INTO #OriginalTable SELECT * FROM #TemNewTable
DROP TABLE #TemNewTable

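Here, we copy all distinct records of the original table into a new table and delete all records from the original table. Then we insert the distinct rows from the new table back into the original table and drop the new table.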
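
The copy-distinct-then-reload approach can be sketched in SQLite via Python's sqlite3 module. SQLite has neither TRUNCATE TABLE nor #temp tables, so this assumes DELETE FROM and CREATE TEMP TABLE ... AS SELECT stand in for them; the table and column names are hypothetical. Wrapping the steps in `with conn:` rolls back the DELETE if a later step fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")],
)

# SQLite has no TRUNCATE or #temp tables, so DELETE FROM and a TEMP table stand in.
with conn:
    conn.execute("CREATE TEMP TABLE tem_new_table AS SELECT DISTINCT * FROM original_table")
    conn.execute("DELETE FROM original_table")
    conn.execute("INSERT INTO original_table SELECT * FROM tem_new_table")
    conn.execute("DROP TABLE tem_new_table")

result = conn.execute("SELECT * FROM original_table ORDER BY id").fetchall()
print(result)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```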

Delete records with duplicate names:

;WITH CTE AS(
SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM     @YourTable)
DELETE FROM CTE WHERE T > 1

Check for duplicate records in a table:

select * from users s
where rowid < any (select rowid from users k where s.name = k.name and s.email = k.email);

select * from users s
where rowid not in (select max(rowid) from users k where s.name = k.name and s.email = k.email);

Delete duplicate records from a table:

delete from users s
where rowid < any (select rowid from users k where s.name = k.name and s.email = k.email);

delete from users s
where rowid not in (select max(rowid) from users k where s.name = k.name and s.email = k.email);

You can use the SELECT DISTINCT keyword to get rid of duplicates. You can also filter by name and put everyone with that name into a table.

You may want to try this:

SELECT NAME, EMAIL, COUNT(*)
FROM USERS
GROUP BY 1, 2
HAVING COUNT(*) > 1

SELECT name, email, COUNT(email)
FROM users
WHERE email IN (SELECT email
                FROM users
                GROUP BY email
                HAVING COUNT(email) > 1)
GROUP BY name, email

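The exact code will differ depending on whether you want to find duplicate rows too, or only different ids with the same email and name. If id is a primary key or otherwise has a unique constraint, this distinction doesn't exist, but the question doesn't specify this. In the former case, you can use the code given in several other answers: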

SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1

In the latter case, you would use:

SELECT name, email, COUNT(DISTINCT id)
FROM users
GROUP BY name, email
HAVING COUNT(DISTINCT id) > 1
ORDER BY COUNT(DISTINCT id) DESC
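
The difference between COUNT(*) and COUNT(DISTINCT id) only shows up when whole rows are duplicated. Here is a sketch assuming Python's sqlite3 module and hypothetical data where id is not a key: Tom's row was inserted twice with the same id, while Sam appears under two different ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key, so an id can repeat when a whole row was inserted twice.
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Tom", "tom@asd.com"),
        (1, "Tom", "tom@asd.com"),  # exact duplicate row
        (2, "Sam", "sam@asd.com"),
        (3, "Sam", "sam@asd.com"),  # same name/email, different id
    ],
)

# Former case: any repeated (name, email), including exact duplicate rows.
duplicate_rows = conn.execute(
    """
    SELECT name, email, COUNT(*)
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()

# Latter case: only (name, email) pairs that appear under more than one id.
distinct_ids = conn.execute(
    """
    SELECT name, email, COUNT(DISTINCT id)
    FROM users
    GROUP BY name, email
    HAVING COUNT(DISTINCT id) > 1
    ORDER BY COUNT(DISTINCT id) DESC
    """
).fetchall()

print(duplicate_rows)  # [('Sam', 'sam@asd.com', 2), ('Tom', 'tom@asd.com', 2)]
print(distinct_ids)    # [('Sam', 'sam@asd.com', 2)]
```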

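The most important thing here is to have the fastest query. The ids of the duplicates should also be identified. A self join is a good option, but for a faster query it's better to first find the rows that have duplicates and then join them with the original table to find the ids of the duplicated rows. Finally, order by any column except id so that duplicated rows sit next to each other.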

SELECT u.*
FROM users AS u
JOIN (SELECT username, email
      FROM users
      GROUP BY username, email
      HAVING COUNT(*) > 1) AS w
ON u.username = w.username AND u.email = w.email
ORDER BY u.email;

Another easy way is to try an analytic function:

SELECT * from
(SELECT name, email,
COUNT(name) OVER (PARTITION BY name, email) cnt
FROM users)
WHERE cnt >1;
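
Here is a sketch of the analytic-function version, assuming Python's sqlite3 module with SQLite 3.25 or later. The id column and the outer ORDER BY are added so the output is easy to check. SQLite accepts the unaliased derived table, but some engines (SQL Server, for example) require an alias on it. The same windowed-count idea underlies the SUM(1) OVER (...) student example earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# The window count attaches the group size to every row without collapsing rows.
rows = conn.execute(
    """
    SELECT *
    FROM (SELECT id, name, email,
                 COUNT(name) OVER (PARTITION BY name, email) AS cnt
          FROM users)
    WHERE cnt > 1
    ORDER BY id
    """
).fetchall()
print(rows)  # [(3, 'Tom', 'asd@asd.com', 2), (5, 'Tom', 'asd@asd.com', 2)]
```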

Table structure:

ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com

Solution 1:

SELECT DISTINCT t1.*
FROM users t1
INNER JOIN users t2
    ON t1.name = t2.name
   AND t1.email = t2.email
WHERE t1.id > t2.id

Solution 2:

SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1

If you use Microsoft Access, this way works:

CREATE TABLE users (id int, name varchar(10), email varchar(50));
INSERT INTO users VALUES (1, 'John', 'asd@asd.com');
INSERT INTO users VALUES (2, 'Sam', 'asd@asd.com');
INSERT INTO users VALUES (3, 'Tom', 'asd@asd.com');
INSERT INTO users VALUES (4, 'Bob', 'bob@asd.com');
INSERT INTO users VALUES (5, 'Tom', 'asd@asd.com');
SELECT name, email, COUNT(*) AS CountOf
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1;
DELETE *
FROM users
WHERE id IN (SELECT u1.id
             FROM users u1, users u2
             WHERE u1.name = u2.name AND u1.email = u2.email AND u1.id > u2.id);

Thanks to Tancrede Chazallet for the delete code.

Please try:

SELECT UserID, COUNT(UserID)
FROM dbo.User
GROUP BY UserID
HAVING COUNT(UserID) > 1

You can use the following query, which I use:

select * FROM TABLENAME
WHERE PrimaryColumnID NOT IN (SELECT MAX(PrimaryColumnID)
                              FROM TABLENAME
                              GROUP BY AnyColumnID);

I think this will help you:

SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1

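This question has already been answered very well in all the answers above. But I'd like to list all the possible ways to do it, which may help readers understand the different approaches and choose the one that best fits their needs, since this is one of the most common SQL problems developers run into in different business use cases, and sometimes in interviews.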

Creating sample data

I'll start by setting up some sample data taken from the question.

Create table NewTable (id int, name varchar(10), email varchar(50))
INSERT NewTable VALUES (1,'John','asd@asd.com')
INSERT NewTable VALUES (2,'Sam','asd@asd.com')
INSERT NewTable VALUES (3,'Tom','asd@asd.com')
INSERT NewTable VALUES (4,'Bob','bob@asd.com')
INSERT NewTable VALUES (5,'Tom','asd@asd.com')


1. Using the GROUP BY clause

SELECT
    name, email, COUNT(*) AS Occurrence
FROM NewTable
GROUP BY name, email
HAVING COUNT(*) > 1


How it works:

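  • The GROUP BY clause groups the rows by the values in the name and email columns.
  • Then, the COUNT() function returns the number of occurrences of each (name, email) group.
  • Then, the HAVING clause keeps only the duplicate groups, i.e. groups with more than one occurrence.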

2. Using a CTE:

To return the entire row for each duplicate, you can join the result of the query above with the NewTable table using a common table expression (CTE):

WITH cte AS (
    SELECT
        name,
        email,
        COUNT(*) occurrences
    FROM NewTable
    GROUP BY
        name,
        email
    HAVING COUNT(*) > 1
)
SELECT
    t1.Id,
    t1.name,
    t1.email
FROM NewTable t1
    INNER JOIN cte ON
        cte.name = t1.name AND
        cte.email = t1.email
ORDER BY
    t1.name,
    t1.email;


3. Using the ROW_NUMBER() function

WITH cte AS (
    SELECT
        name,
        email,
        ROW_NUMBER() OVER (
            PARTITION BY name, email
            ORDER BY name, email) rownum
    FROM NewTable t1
)
SELECT *
FROM cte
WHERE rownum > 1;


How it works:

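  • ROW_NUMBER() distributes the rows of the NewTable table into partitions by the values in the name and email columns. Duplicate rows have repeated values in the name and email columns, but different row numbers.
  • The outer query removes the first row in each group, so only the extra copies (rownum > 1) are returned.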
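
The partition numbering described above can be seen directly with a small sketch, assuming Python's sqlite3 module with SQLite 3.25 or later and the question's sample data. It orders each partition by id rather than by name, email, so which copy gets row number 1 is deterministic; ordering by the partition columns themselves leaves that choice up to the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NewTable (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO NewTable VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Number rows within each (name, email) partition, lowest id first.
numbering = """
    SELECT id, name, email,
           ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS rownum
    FROM NewTable
"""

# Every row with its number: only the second Tom gets rownum 2.
all_rows = conn.execute(numbering + " ORDER BY id").fetchall()
for row in all_rows:
    print(row)

# The outer filter keeps the second and later rows of each partition.
extras = conn.execute(f"WITH cte AS ({numbering}) SELECT * FROM cte WHERE rownum > 1").fetchall()
print(extras)  # [(5, 'Tom', 'asd@asd.com', 2)]
```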

Well, now I believe you have a reasonable idea of how to find duplicates and how to apply the logic to find them in all possible scenarios. Thanks!

Try this:

DECLARE @myTable TABLE (id INT, name VARCHAR(10), email VARCHAR(50));
INSERT @myTable VALUES (1, 'John', 'John-email');
INSERT @myTable VALUES (2, 'John', 'John-email');
INSERT @myTable VALUES (3, 'fred', 'John-email');
INSERT @myTable VALUES (4, 'fred', 'fred-email');
INSERT @myTable VALUES (5, 'sam', 'sam-email');
INSERT @myTable VALUES (6, 'sam', 'sam-email');

WITH cte
AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rowNum,
           *
    FROM @myTable)
SELECT c1.id,
       c1.name,
       c1.email
FROM cte AS c1
WHERE 1 < (SELECT COUNT(c2.rowNum)
           FROM cte AS c2
           WHERE c1.name = c2.name
                 AND c1.email = c2.email);