如何比较两个表相同的数据内容?

我有两个表 TableATableB,它们有相同的列格式,这意味着两个表都有列

A B C D E F

其中 AB是主键。

如何编写一个 SQL 查询来检查 TableATableB(它们有相同的主键)是否在每一列中包含完全相同的值?

这意味着这两个表具有完全相同的数据。

589790 次浏览

根据 DBMS 使用的 SQL 的风格,您应该能够使用“ MINUS”或“ EXCEPT”。

select * from tableA
minus
select * from tableB

如果查询不返回任何行,则数据完全相同。

节食佛有一个很好的答案。在没有 MINUS 或 EXCEPT 的情况下,一种选择是在所有表之间进行联合,对所有列进行分组,并确保所有表中都有两个:

SELECT col1, col2, col3
FROM
(SELECT * FROM tableA
UNION ALL
SELECT * FROM tableB) data
GROUP BY col1, col2, col3
HAVING count(*)!=2

在 MySQL 中,不支持“减号”,并且考虑到性能,这是一个快速

query:
SELECT
t1.id,
t1.id
FROM t1 inner join t2 using (id) where concat(t1.C, t1.D, ...)<>concat(t2.C, t2.D, ...)

使用关系运算符:

SELECT * FROM TableA
UNION
SELECT * FROM TableB
EXCEPT
SELECT * FROM TableA
INTERSECT
SELECT * FROM TableB;

将 Oracle 的 EXCEPT更改为 MINUS

稍微挑剔一点: 以上依赖于操作符的优先级,其中根据 SQL 标准是实现依赖的,所以 YMMV。它适用于 SQLServer,其优先级为:

  1. Expressions in parentheses
  2. INTERSECT
  3. 从左到右评估 EXCEPTUNION

为了完成这一任务,使用“除法”方法存储了一个进程,比较了两个表,并给出了同一个表的 ADD、 DEL、 GAP 3个错误状态的结果 表必须有相同的 PK,你声明2个表和字段来比较1或两个表

像这样使用 ps_TableGap 'tbl1','Tbl2','fld1,fld2,fld3','fld4'fld5'fld6' (optional)

/****** Object:  StoredProcedure [dbo].[ps_TableGap]    Script Date: 10/03/2013 16:03:44 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO


-- =============================================
-- Author:       Arnaud ALLAVENA
-- Create date: 03.10.2013
-- Description: Compare tables
-- =============================================
create PROCEDURE [dbo].[ps_TableGap]
-- Add the parameters for the stored procedure here
@Tbl1 as varchar(100),@Tbl2 as varchar(100),@Fld1 as varchar(1000), @Fld2 as varchar(1000)= ''
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.


SET NOCOUNT ON;
--Variables
--@Tbl1 = table 1
--@Tbl2 = table 2
--@Fld1 = Fields to compare from table 1
--@Fld2 Fields to compare from table 2
Declare @SQL varchar(8000)= '' --SQL statements
Declare @nLoop int = 1 --loop counter
Declare @Pk varchar(1000)= '' --primary key(s)
Declare @Pk1 varchar(1000)= '' --first field of primary key
declare @strTmp varchar(50) = '' --returns value in Pk determination
declare @FldTmp varchar (1000) = '' --temporarily fields for alias calculation


--If @Fld2 empty we take @Fld1
--fields rules: fields to be compare must be in same order and type - always returns Gap
If @Fld2 = '' Set @Fld2 = @Fld1


--Change @Fld2 with Alias prefix xxx become _xxx
while charindex(',',@Fld2)>0
begin
Set @FldTmp = @FldTmp + (select substring(@Fld2,1,charindex(',',@Fld2)-1) + ' as _' + substring(@Fld2,1,charindex(',',@Fld2)-1) + ',')
Set @Fld2 = (select ltrim(right(@Fld2,len(@Fld2)-charindex(',',@Fld2))))
end
Set @FldTmp = @FldTmp + @Fld2 + ' as _' + @Fld2
Set @Fld2 = @FldTmp


--Determinate primary key jointure
--rule: same pk in both tables
Set @nLoop = 1
Set @SQL = 'Declare crsr cursor for select COLUMN_NAME from INFORMATION_SCHEMA.KEY_COLUMN_USAGE where TABLE_NAME = '''
+ @Tbl1 + ''' or TABLE_SCHEMA + ''.'' + TABLE_NAME = ''' + @Tbl1 +  ''' or TABLE_CATALOG + ''.'' + TABLE_SCHEMA + ''.'' + TABLE_NAME = ''' + @Tbl1
+ ''' order by ORDINAL_POSITION'
exec(@SQL)
open crsr
fetch next from crsr into @strTmp
while @@fetch_status = 0
begin
if @nLoop = 1
begin
Set @Pk = 's.' + @strTmp + ' = b._' + @strTmp
Set @Pk1 = @strTmp
set @nLoop = @nLoop + 1
end
Else
Set @Pk = @Pk + ' and s.' + @strTmp + ' = b._' + @strTmp
fetch next from crsr into @strTmp


end
close crsr
deallocate crsr


--SQL statement build
set @SQL = 'select case when s.' + @Pk1 + ' is null then ''Del'' when b._' + @Pk1 + ' is null then ''Add'' else ''Gap'' end as TypErr, '''
set @SQL = @SQL + @Tbl1 +''' as Tbl1, s.*, ''' + @Tbl2 +''' as Tbl2 ,b.* from (Select ' + @Fld1 + ' from ' + @Tbl1
set @SQL = @SQL + ' EXCEPT SELECT ' + @Fld2 + ' from ' + @Tbl2 + ')s full join (Select ' + @Fld2 + ' from ' + @Tbl2
set @SQL = @SQL + ' EXCEPT SELECT ' + @Fld1 + ' from ' + @Tbl1 +')b on '+ @Pk


--Run SQL statement
Exec(@SQL)
END

从 onedaywhen 获取脚本,我对其进行了修改,以显示每个条目来自哪个表。

DECLARE @table1 NVARCHAR(80)= 'table 1 name'
DECLARE @table2 NVARCHAR(80)= 'table 2 name'
DECLARE @sql NVARCHAR (1000)


SET @sql =
'
SELECT ''' + @table1 + ''' AS table_name,* FROM
(
SELECT * FROM ' + @table1 + '
EXCEPT
SELECT * FROM ' + @table2 + '
) x


UNION


SELECT ''' + @table2 + ''' AS table_name,* FROM
(
SELECT * FROM ' + @table2 + '
EXCEPT
SELECT * FROM ' + @table1 + '
) y
'


EXEC sp_executesql @stmt = @sql

增强食佛的答案..。

select * from
(
select * from tableA
minus
select * from tableB
)
union all
select * from
(
select * from tableB
minus
select * from tableA
)
SELECT c.ID
FROM clients c
WHERE EXISTS(SELECT c2.ID
FROM clients2 c2
WHERE c2.ID = c.ID);

将返回两个表中相同的所有 ID。

    SELECT unnest(ARRAY[1,2,2,3,3])
EXCEPT
SELECT unnest(ARRAY[1,1,2,3,3])
UNION
SELECT unnest(ARRAY[1,1,2,3,3])
EXCEPT
SELECT unnest(ARRAY[1,2,2,3,3])

结果为空,但源是不同的!

但是:

(
SELECT unnest(ARRAY[1,2,2,3])
EXCEPT ALL
SELECT unnest(ARRAY[2,1,2,3])
)
UNION
(
SELECT unnest(ARRAY[2,1,2,3])
EXCEPT ALL
SELECT unnest(ARRAY[1,2,2,3])
)

工程。

我在 SQL Server 中也遇到过同样的问题,所以编写了这个 T-SQL 脚本来自动化这个过程(实际上这是一个打了折扣的版本,我的版本将所有的差异写到一个表中以便于报告)。

将“ MyTable”和“ MyOtherTable”更新为要比较的表的名称。

DECLARE @ColName varchar(100)
DECLARE @Table1 varchar(100) = 'MyTable'
DECLARE @Table2 varchar(100) = 'MyOtherTable'




IF (OBJECT_ID('tempdb..#col') IS NOT NULL) DROP TABLE #col
SELECT  IDENTITY(INT, 1, 1) RowNum , c.name
INTO    #col
FROM    SYS.Objects o
JOIN SYS.columns c on o.object_id = c.object_id
WHERE   o.name = @Table1 AND NOT c.Name IN ('List','Columns','YouWantToIgnore')


DECLARE @Counter INT = (SELECT MAX(RowNum) FROM #col)


WHILE @Counter > 0


BEGIN
SET @ColName = (SELECT name FROM #Col WHERE RowNum= @Counter)
EXEC ('SELECT  t1.Identifier
,t1.'+@ColName+' AS '+@Table1+@ColName+'
,t2.'+@ColName+' AS '+@Table2+@ColName+'
FROM    '+@Table1+' t1
LEFT JOIN '+@Table2+' t2 ON t1.Identifier = t2.Identifier
WHERE   t1.'+@ColName+' <> t2.'+@ColName)
SET @Counter = @Counter - 1
END

I wrote this to compare the results of a pretty nasty view I ported from Oracle to SQL Server. It creates a pair of temp tables, #DataVariances and #SchemaVariances, with differences in (you guessed it) the data in the tables and the schema of the tables themselves.

它要求两个表都有一个主键,但是如果源表没有标识列,那么可以将其放入 temdb 中。

declare @TableA_ThreePartName nvarchar(max) = ''
declare @TableB_ThreePartName nvarchar(max) = ''
declare @KeyName nvarchar(max) = ''


/***********************************************************************************************


Script to compare two tables and return differneces in schema and data.


Author: Devin Lamothe       2017-08-11


***********************************************************************************************/
set nocount on


-- Split three part name into database/schema/table
declare @Database_A nvarchar(max) = (
select  left(@TableA_ThreePartName,charindex('.',@TableA_ThreePartName) - 1))
declare @Table_A nvarchar(max) = (
select  right(@TableA_ThreePartName,len(@TableA_ThreePartName) - charindex('.',@TableA_ThreePartName,len(@Database_A) + 2)))
declare @Schema_A nvarchar(max) = (
select  replace(replace(@TableA_ThreePartName,@Database_A + '.',''),'.' + @Table_A,''))


declare @Database_B nvarchar(max) = (
select  left(@TableB_ThreePartName,charindex('.',@TableB_ThreePartName) - 1))
declare @Table_B nvarchar(max) = (
select  right(@TableB_ThreePartName,len(@TableB_ThreePartName) - charindex('.',@TableB_ThreePartName,len(@Database_B) + 2)))
declare @Schema_B nvarchar(max) = (
select  replace(replace(@TableB_ThreePartName,@Database_B + '.',''),'.' + @Table_B,''))


-- Get schema for both tables
declare @GetTableADetails nvarchar(max) = '
use [' + @Database_A +']
select  COLUMN_NAME
,  DATA_TYPE
from  INFORMATION_SCHEMA.COLUMNS
where  TABLE_NAME = ''' + @Table_A + '''
and  TABLE_SCHEMA = ''' + @Schema_A + '''
'
create table #Table_A_Details (
ColumnName nvarchar(max)
,   DataType nvarchar(max)
)
insert into #Table_A_Details
exec (@GetTableADetails)


declare @GetTableBDetails nvarchar(max) = '
use [' + @Database_B +']
select  COLUMN_NAME
,  DATA_TYPE
from  INFORMATION_SCHEMA.COLUMNS
where  TABLE_NAME = ''' + @Table_B + '''
and  TABLE_SCHEMA = ''' + @Schema_B + '''
'
create table #Table_B_Details (
ColumnName nvarchar(max)
,   DataType nvarchar(max)
)
insert into #Table_B_Details
exec (@GetTableBDetails)




-- Get differences in table schema
select  ROW_NUMBER() over (order by
a.ColumnName
,   b.ColumnName) as RowKey
,  a.ColumnName as A_ColumnName
,  a.DataType as A_DataType
,  b.ColumnName as B_ColumnName
,  b.DataType as B_DataType
into  #FieldList
from  #Table_A_Details a
full outer join  #Table_B_Details b
on  a.ColumnName = b.ColumnName
where  a.ColumnName is null
or  b.ColumnName is null
or  a.DataType <> b.DataType


drop table  #Table_A_Details
drop table  #Table_B_Details


select  coalesce(A_ColumnName,B_ColumnName) as ColumnName
,  A_DataType
,  B_DataType
into  #SchemaVariances
from  #FieldList


-- Get differences in table data
declare @LastColumn int = (select max(RowKey) from #FieldList)
declare @RowNumber int = 1
declare @ThisField nvarchar(max)
declare @TestSql nvarchar(max)






create table #DataVariances (
TableKey            nvarchar(max)
,   FieldName           nvarchar(max)
,   TableA_Value        nvarchar(max)
,   TableB_Value        nvarchar(max)
)


delete from #FieldList where A_DataType in ('varbinary','image') or B_DataType in ('varbinary','image')


while @RowNumber <= @LastColumn begin
set @TestSql = '
select  coalesce(a.[' + @KeyName + '],b.[' + @KeyName + ']) as TableKey
,  ''' + @ThisField + ''' as FieldName
,  a.[' + @ThisField + '] as [TableA_Value]
,  b.[' + @ThisField + '] as [TableB_Value]
from  [' + @Database_A + '].[' + @Schema_A + '].[' + @Table_A + '] a
inner join  [' + @Database_B + '].[' + @Schema_B + '].[' + @Table_B + '] b
on  a.[' + @KeyName + '] = b.[' + @KeyName + ']
where  ltrim(rtrim(a.[' + @ThisField + '])) <> ltrim(rtrim(b.[' + @ThisField + ']))
or (a.[' + @ThisField + '] is null and  b.[' + @ThisField + '] is not null)
or (a.[' + @ThisField + '] is not null and  b.[' + @ThisField + '] is null)
'


insert into #DataVariances
exec (@TestSql)


set @RowNumber = @RowNumber + 1
set @ThisField = (select coalesce(A_ColumnName,B_ColumnName) from #FieldList a where RowKey = @RowNumber)


end


drop table #FieldList


print 'Query complete.  Select from #DataVariances to verify data integrity or #SchemaVariances to verify schemas match.  Data types varbinary and image are not checked.'

基于 dietbuddha & IanMc 的答案的替代的、增强的查询。 查询包含描述,有助于显示哪些行存在,哪些行不存在

(
select 'InTableA_NoMatchInTableB' as Msg, * from tableA
except
select 'InTableA_NoMatchInTableB' , * from tableB
)
union all
(
select 'InTableB_NoMatchInTableA' as Msg, * from tableB
except
select 'InTableB_NNoMatchInTableA' ,* from tableA
)
SELECT *
FROM TABLE A
WHERE NOT EXISTS (SELECT 'X'
FROM  TABLE B
WHERE B.KEYFIELD1 = A.KEYFIELD1
AND   B.KEYFIELD2 = A.KEYFIELD2
AND   B.KEYFIELD3 = A.KEYFIELD3)
;

“ X”是任何值。

切换表格以查看不同的差异。

确保将表中的关键字段联接起来。

或者仅仅使用带有2个 select 语句的 MINUS 操作符,然而,MINUS 只能在 Oracle 中工作。

大多数回应似乎忽略了卡米尔提出的问题。(即表包含相同的行,但在每个表中重复不同的行。)不幸的是,我不能使用他的解决方案,因为我在甲骨文。我能想到的最好答案是:

SELECT * FROM
(
SELECT column1, column2, ..., COUNT(*) AS the_count
FROM tableA
GROUP BY column1, column2, ...
MINUS
SELECT column1, column2, ..., COUNT(*) AS the_count
FROM tableB
GROUP BY column1, column2, ...
)
UNION ALL
(
SELECT column1, column2, ..., COUNT(*) AS the_count
FROM tableB
GROUP BY column1, column2, ...
MINUS
SELECT column1, column2, ..., COUNT(*) AS the_count
FROM tableA
GROUP BY column1, column2, ...
)

在 Oracle 中,您可以通过组合使用插入全部和完全外部连接来找到两个表的差异。在 sql 中,您可以通过完全外部连接来提取差异,但是似乎在 sql 中插入 all/first 不存在!因此,你必须使用以下查询:

select * from A
full outer join B on
A.pk=B.pk
where A.field1!=B.field1
or A.field2!=B.field2 or A.field3!=B.field3 or A.field4!=B.field4
--and A.Date==Date1

尽管在 where 子句中不推荐使用‘ OR’,并且通常会导致性能下降,但是如果表不是大型的,您仍然可以使用上面的查询。 如果上面的查询有任何结果,那就是基于字段1、2、3、4的比较的两个表的差异。为了提高查询性能,还可以根据日期对其进行过滤(检查注释部分)

我们可以使用下面的简单查询比较来自 DB2表的两个表的数据,

步骤1:-从模式(S)的表(T1)中选择需要比较的所有列

     SELECT T1.col1,T1.col3,T1.col5 from S.T1

步骤2:-使用‘ Minus’关键字比较2个表。

步骤3:-从模式(S)的表(T2)中选择需要比较的所有列

     SELECT T2.col1,T2.col3,T2.col5 from S.T1

END result:

     SELECT T1.col1,T1.col3,T1.col5 from S.T1
MINUS
SELECT T2.col1,T2.col3,T2.col5 from S.T1;

如果查询不返回任何行,则数据完全相同。

比较 T1(PK,A,B)和 T2(PK,A,B)。

首先比较主键集,找出两边缺少的键值:

SELECT T1.*, T2.* FROM T1 FULL OUTER JOIN T2 ON T1.PK=T2.PK WHERE T1.PK IS NULL OR T2.PK IS NULL;

然后列出所有不匹配的值:

SELECT T1.PK, 'A' AS columnName, T1.A AS leftValue, T2.A AS rightValue FROM T1 JOIN T2 ON T1.PK=T2.PK WHERE COALESCE(T1.A,0) != COALESCE(T2.A,0)
UNION ALL
SELECT T1.PK, 'B' AS columnName, T1.B AS leftValue, T2.B AS rightValue FROM T1 JOIN T2 ON T1.PK=T2.PK WHERE COALESCE(T1.B,0) != COALESCE(T2.B,0)

A and B must be of same type. You can use INFORMATION SCHEMA to generate the SELECT. Don't forget the COALESCE to also include IS NULL results. You could also use FULL OUTER JOIN and COALESCE(T1.PK,0)=COALESCE(T2.PK,0).

例如,对于 varchar 类型的列:

SELECT concat('SELECT T1.PK, ''', COLUMN_NAME, ''' AS columnName, T1.', COLUMN_NAME, ' AS leftValue, T2.', COLUMN_NAME, ' AS rightValue FROM T1 JOIN T2 ON T1.PK=T2.PK WHERE COALESCE(T1.',COLUMN_NAME, ',0)!=COALESCE(T2.', COLUMN_NAME, ',0)')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME='T1' AND DATA_TYPE IN ('nvarchar','varchar');

资料来源: 使用 NATURALFULLJOIN 比较 LukasEder 的 SQL 中的两个表

使用 NATURAL FULL JOIN检测两个表之间相同/不同行的聪明方法。

例子一-状态标志:

SELECT t1.*, t2.*, CASE WHEN t1 IS NULL OR t2 IS NULL THEN 'Not equal' ELSE 'Equal' END
FROM t1
NATURAL FULL JOIN t2;

Example 2 - filtering rows

SELECT *
FROM (SELECT 't1' AS t1, t1.* FROM t1) t1
NATURAL FULL JOIN (SELECT 't2' AS t2, t2.* FROM t2) t2
WHERE t1 IS NULL OR t2 IS NULL -- show differences
--WHERE  t1 IS NOT NULL AND t2 IS NOT NULL    -- show the same

Db < > 小提琴演奏

在 SQLServer 中... 使用行计数,然后将其与 intersect 的行计数进行比较:

DECLARE @t1count int = (SELECT COUNT(*) FROM table1)


IF (@t1count = (SELECT COUNT(*) FROM table2))
IF (SELECT COUNT (*) FROM (SELECT * FROM table1 INTERSECT SELECT * FROM table2) AS dT) = @t1count
SELECT 'Equal'
ELSE SELECT 'Not equal'
ELSE
SELECT 'Not equal'

我这样写是为了当表的行数不相等时,可以完全跳过 intersect,这样可以提高这些情况下的性能。

试试这个

SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'table1'
intersect
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'table2';
select count(a.A) from A a
inner join B b on b.A = a.A
and b.B = a.B
and b.C = a.C
and b.D = a.D
and b.E = a.E
and b.F = a.F
if the answer equals with the count of table A that means table A and B are have exactly same data , but if the answer is 0 then the Table A is not equals B .