Finding duplicate rows in SQL Server

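I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement that grabs all of these and the number of dupes, but also returns the ids associated with each organization.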

A statement like:

SELECT     orgName, COUNT(*) AS dupes
FROM         organizations
GROUP BY orgName
HAVING      (COUNT(*) > 1)

will return something like

orgName        | dupes
ABC Corp       | 7
Foo Federation | 5
Widget Company | 2

But I'd also like to grab their ids. Is there any way to do this? Maybe something like

orgName        | dupeCount | id
ABC Corp       | 1         | 34
ABC Corp       | 2         | 5
...
Widget Company | 1         | 10
Widget Company | 2         | 2

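The reason is that there is also a separate table of users that link to these organizations, and I would like to unify them (that is, remove the dupes so the users link to the same organization instead of to dupe orgs). I would like to do part of this manually so I don't screw anything up, but I still need a statement that returns the ids of all the dupe orgs so I can go through the list of users.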

select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
SELECT orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

You can run the following query to find the duplicates together with max(id), and then delete those rows.

SELECT orgName, COUNT(*) AS dupes, MAX(id) AS maxId
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)

But you will have to run this query several times, because each run returns only one id per duplicated orgName.
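As a rough sketch of what "several times" could look like in practice, the loop below repeats the MAX(id) delete until no duplicated orgName is left. It assumes that id is unique per row and that nothing else still references the rows being removed (the question's user table would have to be repointed first), so treat it as an illustration to try on a copy of the data rather than a finished script.

```sql
-- Sketch: repeat the MAX(id) delete until every orgName is unique.
-- Assumes id is unique and no other table still references the deleted rows.
DECLARE @deleted INT;
SET @deleted = 1;

WHILE @deleted > 0
BEGIN
    -- Each pass removes the highest id of every orgName that still has dupes.
    DELETE FROM organizations
    WHERE id IN (
        SELECT MAX(id)
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    );

    SET @deleted = @@ROWCOUNT;  -- 0 once no duplicated orgName remains
END
```

Each pass deletes at most one row per orgName, so an organization with seven copies needs six passes; the row with the lowest id is the one that survives.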

You can do it like this:

SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName

If you want to return just the records that can be deleted (leaving one of each), you can use:

SELECT
id, orgName
FROM (
SELECT
orgName, id,
ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
FROM organizations
) AS d
WHERE intRow != 1

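Since the question is about repointing users before deleting anything, it may help to see each removable row next to the id that would be kept. The sketch below is one way to do that, assuming SQL Server 2005 or later (window aggregates) and assuming the row with the lowest id is the one to keep, which matches ORDER BY id in the query above. The column aliases dupeId and keepId are made up for illustration.

```sql
-- Sketch: pair every removable duplicate with the id that would survive.
-- Assumes the lowest id per orgName is the row to keep.
SELECT d.id AS dupeId, d.keepId, d.orgName
FROM (
    SELECT id, orgName,
           MIN(id) OVER (PARTITION BY orgName) AS keepId
    FROM organizations
) AS d
WHERE d.id <> d.keepId
ORDER BY d.orgName, d.id;
```

The dupeId/keepId pairs can then be used, manually or in an UPDATE, to move the linked users over to keepId before the dupeId rows are deleted.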
Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:

SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id

select orgname, count(*) as dupes, id
from organizations
where orgname in (
select orgname
from organizations
group by orgname
having (count(*) > 1)
)
group by orgname, id

The solution marked as correct didn't work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql

SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id

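One caveat with the self-join above: when a value occurs three or more times, each row matches several partners and is returned more than once. A possible variant, sketched here with the same placeholder names (myTable, repeatedCol, id), uses EXISTS so that every duplicated row comes back exactly once.

```sql
-- Sketch: list each row whose repeatedCol value also appears on another row.
-- EXISTS only tests for a partner row, so no row is returned twice.
SELECT n1.*
FROM myTable n1
WHERE EXISTS (
    SELECT 1
    FROM myTable n2
    WHERE n2.repeatedCol = n1.repeatedCol
      AND n2.id <> n1.id
);
```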
You have several ways to select duplicate rows.

For my solutions, first consider this table as an example

CREATE TABLE #Employee
(
ID          INT,
FIRST_NAME  NVARCHAR(100),
LAST_NAME   NVARCHAR(300)
)


INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );

First solution:

SELECT DISTINCT *
FROM   #Employee;


WITH #DeleteEmployee AS (
SELECT ROW_NUMBER()
OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
RNUM
FROM   #Employee
)


SELECT *
FROM   #DeleteEmployee
WHERE  RNUM > 1


SELECT DISTINCT *
FROM   #Employee

Second solution: use an identity field

SELECT DISTINCT *
FROM   #Employee;


ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)


SELECT *
FROM   #Employee
WHERE  UNIQ_ID < (
SELECT MAX(UNIQ_ID)
FROM   #Employee a2
WHERE  #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)


ALTER TABLE #Employee DROP COLUMN UNIQ_ID


SELECT DISTINCT *
FROM   #Employee

At the end of either solution, run this command

DROP TABLE #Employee

You can try this; it should work well for you

 WITH CTE AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations
)
select * from CTE where RN>1
go

Try this

SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;

I think I know what you need. I had to mix the answers, and I think I got the solution he wanted:

select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

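With MAX(id) you get the id of the duplicate along with the id of the original, which is what he asked for:

id, org name, duplicate count (left out in this case)
duplicate id, duplicate org name, duplicate count (left out again because it does not help in this case)

The only sad thing is that you get it laid out in this form

id, name, dupid, name

Hope it still helps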

Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
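From organizations )tbl Where Rownum>1

So the records with Rownum > 1 will be the duplicate records in the table. PARTITION BY first groups the records, then numbers them within each group by giving them serial numbers. Every row after the first in a group therefore has Rownum > 1, and those are the duplicates that can be deleted.

select column_name, count(column_name)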
from table_name
group by column_name
having count (column_name) > 1;

Src: https://stackoverflow.com/a/59242/1465252

select a.orgName,b.duplicate, a.id
from organizations a
inner join (
SELECT orgName, COUNT(*) AS duplicate
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) b on a.orgName = b.orgName
group by a.orgName, b.duplicate, a.id

If you want to delete the duplicates:

WITH CTE AS(
SELECT orgName,id,
RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
FROM organizations
)
DELETE FROM CTE WHERE RN > 1

select * from [Employees]

Finding duplicate records

1) Using a CTE

with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte

2) Using GROUP BY

select Name,EmailId,COUNT(name) as Duplicate from  [Employees] group by Name,EmailId

Suppose we have a table 'Student' with two columns:

  • student_id int
  • student_name varchar

    Records:
    +------------+---------------------+
    | student_id | student_name        |
    +------------+---------------------+
    |        101 | usman               |
    |        101 | usman               |
    |        101 | usman               |
    |        102 | usmanyaqoob         |
    |        103 | muhammadusmanyaqoob |
    |        103 | muhammadusmanyaqoob |
    +------------+---------------------+
    

Now we want to see the duplicate records. Use this query:

select student_name,student_id ,count(*) c from student group by student_id,student_name having count(*)>1;

+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+

I have a better option for getting the duplicate records in a table

SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname

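The result of the above query shows all the duplicate names, each with its unique student id, and the number of times the name occurs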


 /*To get duplicate data in table */


SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1
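GROUP BY EmpCode HAVING COUNT(EmpCode) > 1

I use two methods to find duplicate rows. The first is the best-known one, using GROUP BY with HAVING. The second is using a CTE (Common Table Expression).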

As @RedFilter mentioned, that way is also correct. Many times I find the CTE method is useful for me as well.

WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName)
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations   e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1

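In the example above, we collect the result by using ROW_NUMBER and PARTITION BY to number the repeated occurrences of each orgName. We then apply a WHERE clause to select only the rows whose repeat count is greater than 1. All of the results are collected in the CTE and joined with the organizations table.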
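A related sketch, not part of the answer above: assuming SQL Server 2005 or later, the same window-function idea can return the layout the question asked for (orgName, a count, and the id) without joining back to the table. The aliases dupeCount and dupeNumber are chosen here for illustration; dupeCount is the group size and dupeNumber is the running number within the group.

```sql
-- Sketch: one row per duplicated organization, with its id,
-- the size of its duplicate group, and its position in that group.
SELECT d.orgName, d.dupeCount, d.dupeNumber, d.id
FROM (
    SELECT orgName, id,
           COUNT(*) OVER (PARTITION BY orgName) AS dupeCount,
           ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS dupeNumber
    FROM dbo.organizations
) AS d
WHERE d.dupeCount > 1
ORDER BY d.orgName, d.dupeNumber;
```

Filtering on dupeCount > 1 keeps every member of a duplicate group, including the first one; filtering on dupeNumber > 1 instead would list only the rows that could be removed.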

Source: CodoBee