How can I delete duplicate rows in SQL Server?

How can I delete duplicate rows when no unique row id exists?

My table is:

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicates are removed:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I have tried a few queries, but I think they depend on having a row id, because I don't get the desired result. For example:

DELETE
FROM table
WHERE col1 IN (
SELECT id
FROM table
GROUP BY id
HAVING (COUNT(col1) > 1)
)

I like CTEs and ROW_NUMBER because the combination lets us see which rows will be deleted (or updated): just change DELETE FROM CTE... to SELECT * FROM CTE:

WITH CTE AS(
SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1
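
To illustrate the preview step, here is a minimal sketch. It assumes a throwaway temp table named #Table1 (a hypothetical stand-in for dbo.Table1) loaded with the sample data from the question. The CTE is first run as a SELECT, then as a DELETE inside a transaction that is rolled back, so nothing is lost while experimenting:

```sql
-- Hypothetical demo table; #Table1 stands in for dbo.Table1.
CREATE TABLE #Table1 (col1 VARCHAR(10), col2 INT, col3 INT, col4 INT, col5 INT, col6 INT, col7 INT);
INSERT INTO #Table1 VALUES
    ('john',  1, 1, 1, 1, 1, 1),
    ('john',  1, 1, 1, 1, 1, 1),
    ('sally', 2, 2, 2, 2, 2, 2),
    ('sally', 2, 2, 2, 2, 2, 2);

-- Preview: rows with RN > 1 are the ones the DELETE would remove.
WITH CTE AS (
    SELECT col1, col2, col3, col4, col5, col6, col7,
           RN = ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1)
    FROM #Table1
)
SELECT * FROM CTE WHERE RN > 1;

BEGIN TRANSACTION;

WITH CTE AS (
    SELECT col1, col2, col3, col4, col5, col6, col7,
           RN = ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1)
    FROM #Table1
)
DELETE FROM CTE WHERE RN > 1;

SELECT * FROM #Table1;   -- inspect what is left

ROLLBACK TRANSACTION;    -- switch to COMMIT once the result looks right
DROP TABLE #Table1;
```

Deleting through the CTE works here because the CTE reads from a single table, so each CTE row maps to exactly one underlying row.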

DEMO (the result is different; I assume that is due to a typo on your part).

COL1    COL2    COL3    COL4    COL5    COL6    COL7
john    1        1       1       1       1       1
sally   2        2       2       2       2       2

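This example determines duplicates by the single column col1 because of the PARTITION BY col1. If you want to include multiple columns, simply add them to the PARTITION BY: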

ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)

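Microsoft has a very neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444

In brief, here is the easiest way to delete duplicates when you have just a few rows to delete: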

SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;

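myprimarykey is the identifier for the row.

I set rowcount to 1 because I only had two rows that were duplicated. If I had had 3 rows duplicated, I would have set rowcount to 2 so that it deletes the first two it sees and leaves only one in table t1.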
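
As a sketch of the same idea, DELETE TOP (n) limits the number of rows removed without changing a session setting. Microsoft's documentation recommends TOP over SET ROWCOUNT for DELETE, INSERT and UPDATE, since SET ROWCOUNT is slated to stop affecting those statements in a future release. The temp table #t1 and its values below are made up for illustration:

```sql
-- Hypothetical demo table with no key constraint, so duplicates are possible.
CREATE TABLE #t1 (myprimarykey INT, payload VARCHAR(10));
INSERT INTO #t1 VALUES (1, 'a'), (1, 'a'), (1, 'a'), (2, 'b');

-- Three copies of myprimarykey = 1 exist, so delete two of them.
DELETE TOP (2) FROM #t1 WHERE myprimarykey = 1;

SELECT * FROM #t1;   -- one row for key 1 and one row for key 2 remain
DROP TABLE #t1;
```

If you do use SET ROWCOUNT, run SET ROWCOUNT 0 afterwards; otherwise the limit stays in effect for the rest of the session.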

Another way of removing duplicated rows without losing information is the following:

-- In each group of duplicates, delete the row(s) whose field_kept is shortest
DELETE t1
FROM dublicated_table t1
JOIN (
    SELECT t2.dublicated_field,
           MIN(LEN(t2.field_kept)) AS min_field_kept
    FROM dublicated_table t2 WITH (NOLOCK)
    GROUP BY t2.dublicated_field
    HAVING COUNT(*) > 1
) t3
    ON t1.dublicated_field = t3.dublicated_field
   AND LEN(t1.field_kept) = t3.min_field_kept

DELETE from search
where id not in (
select min(id) from search
group by url
having count(*)=1
union
SELECT min(id) FROM search
group by url
having count(*) > 1
)

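If you have no references, such as foreign keys, you can do this. I do it a lot when testing proofs of concept and the test data gets duplicated.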

SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]
INTO [newTable]
FROM [oldTable]

Go into the Object Explorer and delete the old table.

Rename the new table with the old table's name.
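
The drop and rename can also be scripted instead of done through Object Explorer. The following is only a sketch: dbo.oldTable and dbo.newTable are hypothetical names, the two-column layout is invented for brevity, and it is meant to be run in a scratch database.

```sql
-- Hypothetical demo table standing in for [oldTable].
CREATE TABLE dbo.oldTable (col1 VARCHAR(10), col2 INT);
INSERT INTO dbo.oldTable VALUES ('john', 1), ('john', 1), ('sally', 2), ('sally', 2);

BEGIN TRANSACTION;

-- Copy one instance of every row into a new table.
SELECT DISTINCT col1, col2
INTO dbo.newTable
FROM dbo.oldTable;

-- Replace the old table with the de-duplicated copy.
DROP TABLE dbo.oldTable;
EXEC sp_rename 'dbo.newTable', 'oldTable';   -- the new name is given without the schema

COMMIT TRANSACTION;

SELECT * FROM dbo.oldTable;   -- two rows: john and sally
```

Keep in mind that SELECT ... INTO does not carry over indexes, constraints or triggers, so those would have to be recreated on the new table.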

I prefer a CTE for deleting duplicate rows from a SQL Server table.

I strongly recommend following this article: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

Keeping the original

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)
DELETE FROM CTE WHERE RN<>1

Without keeping the original

WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
 
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)

Please see the way of deleting below.

Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)

This creates a sample table named @table and loads it with the given data.

Delete  aliasName from (
Select  *,
ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From    @table) aliasName
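Where   rowNumber > 1

Select * from @table

Note: If you list all the columns in the PARTITION BY part, then the ORDER BY does not have much significance.

I know the question was asked three years ago, and my answer is another version of what Tim has posted, but I am posting it just in case it helps anyone.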

with myCTE
as
(
select productName,ROW_NUMBER() over(PARTITION BY productName order by slno) as Duplicate from productDetails
)
Delete from myCTE where Duplicate>1

-- this query will keep only one instance of a duplicate record.
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3-- based on what? --can be multiple columns
ORDER BY ( SELECT 0)) RN
FROM   Mytable)
delete  FROM cte
WHERE  RN > 1

Without using a CTE and ROW_NUMBER(), you can delete the records using just GROUP BY and the MAX function. Here is an example:

DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

With reference to https://support.microsoft.com/en-us/help/139444/how-to-remove-duplicate-rows-from-a-table-in-sql-server

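The idea of removing duplicates involves

  • a) protecting those rows that are not duplicates
  • b) retaining one of the many rows that qualify together as duplicates.

Step by step

  • 1) First identify the rows that satisfy the definition of a duplicate and insert them into a temp table, say #tableAll.
  • 2) Select the non-duplicate (single) rows, or the distinct rows, into a temp table, say #tableUnique.
  • 3) Delete from the source table, joining #tableAll, to delete the duplicates.
  • 4) Insert into the source table all the rows from #tableUnique.
  • 5) Drop #tableAll and #tableUnique.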
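
The steps above can be sketched as follows. This is one possible reading of them, not the exact script from the Microsoft article: #source and its two columns are hypothetical, #tableAll holds every copy of each duplicated row, and #tableUnique holds one copy of each.

```sql
-- Hypothetical source table with duplicates; 'mike' is not duplicated.
CREATE TABLE #source (col1 VARCHAR(10), col2 INT);
INSERT INTO #source VALUES ('john', 1), ('john', 1), ('sally', 2), ('sally', 2), ('mike', 3);

-- 1) Every row that belongs to a duplicated group.
SELECT s.col1, s.col2
INTO #tableAll
FROM #source s
JOIN (SELECT col1, col2
      FROM #source
      GROUP BY col1, col2
      HAVING COUNT(*) > 1) d
    ON s.col1 = d.col1 AND s.col2 = d.col2;

-- 2) One copy of each duplicated row.
SELECT DISTINCT col1, col2
INTO #tableUnique
FROM #tableAll;

BEGIN TRANSACTION;

-- 3) Remove all copies of the duplicated rows from the source.
DELETE s
FROM #source s
JOIN #tableAll a
    ON s.col1 = a.col1 AND s.col2 = a.col2;

-- 4) Put a single copy of each back.
INSERT INTO #source (col1, col2)
SELECT col1, col2 FROM #tableUnique;

COMMIT TRANSACTION;

-- 5) Clean up the working tables.
DROP TABLE #tableAll;
DROP TABLE #tableUnique;

SELECT * FROM #source;   -- john, sally and mike: one row each
DROP TABLE #source;
```

The equality joins do not match NULL to NULL, so columns that can contain NULL would need a NULL-aware comparison.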

If you are able to add a column to the table temporarily, this is a solution that worked for me:

ALTER TABLE dbo.DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY(1,1)

Then perform a DELETE using a combination of MIN and GROUP BY:

DELETE b
FROM dbo.DUPPEDTABLE b
WHERE b.RowID NOT IN (
SELECT MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE
);

Verify that the DELETE performed correctly:

SELECT a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE, COUNT(*)--MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE
ORDER BY COUNT(*) DESC

The result should have no rows with a count greater than 1. Finally, remove the RowID column:

ALTER TABLE dbo.DUPPEDTABLE DROP COLUMN RowID;

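Oh wow, I feel so stupid reading all these answers; they read like experts' answers, with all the CTEs, temp tables and so on.

All I did to get it working was aggregate the ID column using MAX.

DELETE FROM table WHERE id IN (
SELECT MAX(id) FROM table GROUP BY col1 HAVING ( COUNT(col1) > 1 )
)

Note: You might need to run it multiple times to remove all the duplicates, because each run deletes only one row from each set of duplicates.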
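
Running the statement repeatedly can be automated with a loop that stops once a pass deletes nothing. This is a minimal sketch on a hypothetical temp table #t with an identity column id; it assumes duplicates are defined by col1 alone:

```sql
-- Hypothetical demo table: ids 1-3 are 'john', ids 4-5 are 'sally'.
CREATE TABLE #t (id INT IDENTITY(1,1), col1 VARCHAR(10));
INSERT INTO #t (col1) VALUES ('john'), ('john'), ('john'), ('sally'), ('sally');

DECLARE @deleted INT = 1;

WHILE @deleted > 0
BEGIN
    -- Each pass removes the highest id from every group that still has duplicates.
    DELETE FROM #t
    WHERE id IN (SELECT MAX(id) FROM #t GROUP BY col1 HAVING COUNT(*) > 1);

    SET @deleted = @@ROWCOUNT;   -- 0 once no group has more than one row
END

SELECT * FROM #t;   -- ids 1 (john) and 4 (sally) remain
DROP TABLE #t;
```

The loop keeps the lowest id of each group, which is the same end state as a single DELETE ... WHERE id NOT IN (SELECT MIN(id) ... GROUP BY col1).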

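After trying the solutions suggested above, which work for small and medium tables, I can suggest this solution for very large tables, since it runs in iterations.
  1. Drop all dependency views of LargeSourceTable.
  2. You can find the dependencies using SQL Server Management Studio: right-click the table and click "View Dependencies".
  3. Rename the table; the script below reads the renamed table as LargeSourceTable_TEMP.
  4. Create LargeSourceTable again, but now add a primary key on all the columns that define a duplicate, with WITH (IGNORE_DUP_KEY = ON).
  5. For example:

    CREATE TABLE [dbo].[LargeSourceTable]
    (
        ID INT IDENTITY(1,1),
        [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL,
        [Column1] CHAR (36) NOT NULL,
        [Column2] NVARCHAR (100) NOT NULL,
        [Column3] CHAR (36) NOT NULL,
        PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON)
    );

  6. Recreate, for the newly created table, the views you dropped in the first step.

  7. Now run the SQL script below. It reports progress after each page of 1,000,000 rows; you can change the number of rows per page to see progress more often.

  8. Note that I set IDENTITY_INSERT on and off because one of the columns contains an auto-increment id, which I copy as well.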

SET IDENTITY_INSERT LargeSourceTable ON
DECLARE @PageNumber AS INT, @RowspPage AS INT
DECLARE @TotalRows AS INT
DECLARE @dt VARCHAR(19)
SET @PageNumber = 0
SET @RowspPage = 1000000
-- (a statement is missing here in the source; @TotalRows must be assigned before the loop starts)

While ((@PageNumber - 1) * @RowspPage < @TotalRows )
Begin
begin transaction tran_inner
; with cte as
(
SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
OFFSET ((@PageNumber) * @RowspPage) ROWS
FETCH NEXT @RowspPage ROWS ONLY
)
INSERT INTO LargeSourceTable
(
ID
,[CreateDate]
,[Column1]
,[Column2]
,[Column3]
)
select
ID
,[CreateDate]
,[Column1]
,[Column2]
,[Column3]
from cte

commit transaction tran_inner

PRINT 'Page: ' + convert(varchar(10), @PageNumber)
PRINT 'Transferred: ' + convert(varchar(20), @PageNumber * @RowspPage)
PRINT 'Of: ' + convert(varchar(20), @TotalRows)

SELECT @dt = convert(varchar(19), getdate(), 121)
RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
SET @PageNumber = @PageNumber + 1
End

SET IDENTITY_INSERT LargeSourceTable OFF
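
The whole approach relies on IGNORE_DUP_KEY silently discarding rows that would violate the primary key. A small sketch of that behaviour, using a hypothetical temp table #dedup with two of the columns from the example above:

```sql
-- Hypothetical demo table; the primary key defines what counts as a duplicate.
CREATE TABLE #dedup (
    Column1 CHAR(36) NOT NULL,
    Column2 NVARCHAR(100) NOT NULL,
    PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON)
);

-- The second ('a', 'x') row is discarded with the warning
-- "Duplicate key was ignored." instead of failing the whole INSERT.
INSERT INTO #dedup (Column1, Column2)
VALUES ('a', N'x'), ('a', N'x'), ('b', N'y');

SELECT COUNT(*) AS kept FROM #dedup;   -- 2
DROP TABLE #dedup;
```

Without IGNORE_DUP_KEY = ON the same INSERT would fail with a primary key violation and insert nothing.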

You need to group the duplicate records by the relevant field(s), then keep one record from each group and delete the rest. For example:

DELETE prg.Person WHERE Id IN (
SELECT dublicateRow.Id FROM
(
select MIN(Id) MinId, NationalCode
from  prg.Person group by NationalCode  having count(NationalCode ) > 1
) GroupSelect
JOIN  prg.Person dublicateRow ON dublicateRow.NationalCode = GroupSelect.NationalCode
WHERE dublicateRow.Id <> GroupSelect.MinId)

Try using:

SELECT linkorder
,Row_Number() OVER (
PARTITION BY linkorder ORDER BY linkorder DESC
) AS RowNum
FROM u_links

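This can be done in many ways in SQL Server. The simplest way is to insert the distinct rows from the table with duplicates into a new temporary table, then delete all the data from the table with duplicates, and then insert all the data back from the temporary table, which has no duplicates, as shown below:

select distinct * into #tmp From table
delete from table
insert into table
select * from #tmp
drop table #tmp

select * from table

Delete duplicate rows using a Common Table Expression (CTE)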

With CTE_Duplicates as
(select id,name , row_number()
over(partition by id,name order by id,name ) rownumber  from table  )
delete from CTE_Duplicates where rownumber!=1

DECLARE @TB TABLE(NAME VARCHAR(100));
INSERT INTO @TB VALUES ('Red'),('Red'),('Green'),('Blue'),('White'),('White')
--**Delete by Rank**
;WITH CTE AS(SELECT NAME,DENSE_RANK() OVER (PARTITION BY NAME ORDER BY NEWID()) ID FROM @TB)
DELETE FROM CTE WHERE ID>1
SELECT NAME FROM @TB;
--**Delete by Row Number**
;WITH CTE AS(SELECT NAME,ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB)
DELETE FROM CTE WHERE ID>1;
SELECT NAME FROM @TB;

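Deleting duplicates from a huge table (several million records) might take a long time. I suggest doing a bulk insert of the selected rows into a temp table rather than deleting.

--REWRITING YOUR CODE(TAKE NOTE OF THE 3RD LINE)
WITH CTE AS(SELECT NAME,ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB)
SELECT * INTO #unique_records FROM CTE
WHERE ID =1;

There are two solutions in MySQL:

Delete duplicate rows using a DELETE with INNER JOIN statement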

DELETE t1 FROM contacts t1
INNER JOIN contacts t2
WHERE
t1.id < t2.id AND
t1.email = t2.email;

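This query references the contacts table twice; therefore, it uses the table aliases t1 and t2.

The output is:

Query OK, 4 rows affected (0.10 sec)

In case you want to delete duplicate rows and keep the lowest id, you can use the following statement: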

DELETE c1 FROM contacts c1
INNER JOIN contacts c2
WHERE
c1.id > c2.id AND
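c1.email = c2.email;

Delete duplicate rows using an intermediate table

The following shows the steps for removing duplicate rows using an intermediate table:

1. Create a new table whose structure is the same as the original table from which you want to remove duplicate rows.

2. Insert the distinct rows from the original table into the intermediate table.

3. Drop the original table and rename the intermediate table to the original table's name.

Step 1. Create a new table whose structure is the same as the original table: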

CREATE TABLE source_copy LIKE source;

Step 2. Insert the distinct rows from the original table into the new table:

INSERT INTO source_copy
SELECT * FROM source
GROUP BY col; -- column that has duplicate values

Step 3. Drop the original table and rename the intermediate table to the original table's name:

DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;


DELETE FROM TBL1  WHERE ID  IN
(SELECT ID FROM TBL1  a WHERE ID!=
(select MAX(ID) from TBL1  where DUPVAL=a.DUPVAL
group by DUPVAL
having count(DUPVAL)>1))

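This deletes all duplicates except the very first one in each group (the one with the minimum ID).

It should work equally well in other SQL servers, such as Postgres: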

DELETE FROM table
WHERE id NOT IN (
select min(id) from table
group by col1, col2, col3, col4, col5, col6, col7
)

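This might help in your case

DELETE t1 FROM table t1 INNER JOIN table t2 ON t1.id > t2.id AND t1.col1 = t2.col1

To delete the duplicate rows from a table in SQL Server, follow these steps:

  1. Find the duplicate rows using the GROUP BY clause or the ROW_NUMBER() function.
  2. Use the DELETE statement to remove the duplicate rows.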
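
Step 1 with GROUP BY can be sketched as follows. The table variable @contacts and its three rows are made up for illustration; the column names follow the contacts example used in this answer:

```sql
-- Hypothetical sample data: Kim appears twice, Pilar once.
DECLARE @contacts TABLE (
    contact_id INT IDENTITY(1,1),
    first_name NVARCHAR(100),
    last_name  NVARCHAR(100),
    email      NVARCHAR(255)
);

INSERT INTO @contacts (first_name, last_name, email) VALUES
    (N'Kim',   N'Abercrombie', N'kim.abercrombie@example.com'),
    (N'Kim',   N'Abercrombie', N'kim.abercrombie@example.com'),
    (N'Pilar', N'Ackerman',    N'pilar.ackerman@example.com');

-- One row per duplicated combination, with the number of copies.
SELECT first_name, last_name, email, COUNT(*) AS occurrences
FROM @contacts
GROUP BY first_name, last_name, email
HAVING COUNT(*) > 1;
-- returns Kim Abercrombie with occurrences = 2
```

The GROUP BY form only reports which combinations are duplicated; the ROW_NUMBER() form is the one that can single out the individual rows to delete.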

Setting up a sample table

DROP TABLE IF EXISTS contacts;


CREATE TABLE contacts(
contact_id INT IDENTITY(1,1) PRIMARY KEY,
first_name NVARCHAR(100) NOT NULL,
last_name NVARCHAR(100) NOT NULL,
email NVARCHAR(255) NOT NULL,
);

Inserting values

INSERT INTO contacts
(first_name,last_name,email)
VALUES
('Syed','Abbas','syed.abbas@example.com'),
('Catherine','Abel','catherine.abel@example.com'),
('Kim','Abercrombie','kim.abercrombie@example.com'),
('Kim','Abercrombie','kim.abercrombie@example.com'),
('Kim','Abercrombie','kim.abercrombie@example.com'),
('Hazem','Abolrous','hazem.abolrous@example.com'),
('Hazem','Abolrous','hazem.abolrous@example.com'),
('Humberto','Acevedo','humberto.acevedo@example.com'),
('Humberto','Acevedo','humberto.acevedo@example.com'),
('Pilar','Ackerman','pilar.ackerman@example.com');

Query

    SELECT
contact_id,
first_name,
last_name,
email
FROM
contacts;

Delete duplicate rows from a table

   WITH cte AS (
SELECT
contact_id,
first_name,
last_name,
email,
ROW_NUMBER() OVER (
PARTITION BY
first_name,
last_name,
email
ORDER BY
first_name,
last_name,
email
) row_num
FROM
contacts
)
DELETE FROM cte
WHERE row_num > 1;

The duplicate records should now be deleted.

DELETE p1 FROM Person p1,
Person p2
WHERE
p1.Email = p2.Email AND p1.Id > p2.Id