Ignore duplicate keys during "COPY FROM" in PostgreSQL


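Insert into a temporary table and group by the key, so that duplicates are eliminated.

Then insert only the rows that do not already exist in the target table.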
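
The two steps above could be sketched roughly as follows. This is only an illustration: the names main_table and tmp_table and the key column id are assumptions, and DISTINCT ON is used here as one way of collapsing rows per key.

```sql
-- Assumed names: main_table (target), tmp_table (staging), key column id.
-- Step 1: collapse the staged rows to one row per key.
-- Step 2: insert only the keys that are not already in the target.
INSERT INTO main_table
SELECT t.*
FROM (
    SELECT DISTINCT ON (id) *
    FROM tmp_table
    ORDER BY id
) AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM main_table m
    WHERE m.id = t.id
);
```

Which of several staged rows with the same id survives is arbitrary here; adding more columns to the ORDER BY would make that choice explicit.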


Best answer

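Use the same approach you described, but DELETE (or group, or modify ...) the rows with duplicate primary keys in the temp table before loading into the main table.

For example: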

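BEGIN;  -- ON COMMIT DROP is only useful inside a transaction block

CREATE TEMP TABLE tmp_table
ON COMMIT DROP
AS
SELECT *
FROM main_table
WITH NO DATA;


COPY tmp_table FROM 'full/file/name/here';


-- Keep one row per key; ORDER BY must start with the DISTINCT ON expression
INSERT INTO main_table
SELECT DISTINCT ON (PK_field) *
FROM tmp_table
ORDER BY PK_field, some_fields;

COMMIT;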

Details: CREATE TABLE AS, COPY, DISTINCT ON
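
The DELETE variant mentioned at the top of this answer could look something like the sketch below. The names tmp_table and pk_field are placeholders, and comparing ctid values (PostgreSQL's physical row identifier) is one common idiom for picking a survivor, not necessarily what the answer had in mind.

```sql
-- Placeholder names: tmp_table and its key column pk_field.
-- For every key, keep the row with the smallest ctid and delete the others.
DELETE FROM tmp_table a
USING tmp_table b
WHERE a.pk_field = b.pk_field
  AND a.ctid > b.ctid;
```

After this statement the temp table holds at most one row per key, so a plain INSERT INTO main_table SELECT * FROM tmp_table no longer trips over duplicates inside the file.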


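Igor's answer helped me a lot, but I also ran into the problem Nate mentioned in his comment. Then I hit another problem (possibly beyond the scope of the question here): the new data not only contained internal duplicates, it also duplicated rows already in the existing data. Here is what worked for me.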

CREATE TEMP TABLE tmp_table AS SELECT * FROM newsletter_subscribers;
COPY tmp_table (name, email) FROM stdin DELIMITER ' ' CSV;
SELECT count(*) FROM tmp_table;  -- Just to be sure
TRUNCATE newsletter_subscribers;
INSERT INTO newsletter_subscribers
SELECT DISTINCT ON (email) * FROM tmp_table
ORDER BY email, subscription_status;
SELECT count(*) FROM newsletter_subscribers;  -- Paranoid again

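Internal and external duplicates become the same thing in tmp_table, and the DISTINCT ON (email) part then removes them. The ORDER BY makes sure the desired row comes first in the result set for each email, and DISTINCT ON then discards all further rows for that email.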
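
To make the interplay of DISTINCT ON and ORDER BY concrete, here is a small standalone query. The sample rows are made up for illustration; only the column names email and subscription_status come from the code above.

```sql
-- Made-up sample data: two rows share the same email.
SELECT DISTINCT ON (email) email, subscription_status
FROM (VALUES
        ('a@example.com', 'unsubscribed'),
        ('a@example.com', 'active'),
        ('b@example.com', 'active')
     ) AS t(email, subscription_status)
ORDER BY email, subscription_status;
-- Returns one row per email:
--   a@example.com | active
--   b@example.com | active
```

For a@example.com the row with 'active' wins because it sorts before 'unsubscribed'; changing the second ORDER BY column (or its direction) changes which row is kept.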


PostgreSQL 9.5 now has upsert functionality. You can follow Igor's instructions, except that the final INSERT includes the clause ON CONFLICT DO NOTHING.

INSERT INTO main_table
SELECT *
FROM tmp_table
ON CONFLICT DO NOTHING;
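
A self-contained way to try this out is sketched below. The tables demo_main and demo_stage and their rows are hypothetical stand-ins for the main table and the COPY staging table, and the explicit conflict target (id) is an optional refinement rather than part of the answer.

```sql
BEGIN;

-- Hypothetical stand-in for the main table, with one existing row.
CREATE TEMP TABLE demo_main (id integer PRIMARY KEY, name text) ON COMMIT DROP;
INSERT INTO demo_main VALUES (1, 'existing');

-- Hypothetical stand-in for the staging table filled by COPY.
CREATE TEMP TABLE demo_stage (id integer, name text) ON COMMIT DROP;
INSERT INTO demo_stage VALUES
    (1, 'duplicate of existing row'),
    (2, 'new'),
    (2, 'new again');

-- Rows whose id already exists, or was already inserted by this same
-- statement, are skipped instead of raising a unique violation.
INSERT INTO demo_main
SELECT id, name
FROM demo_stage
ON CONFLICT (id) DO NOTHING;

SELECT * FROM demo_main ORDER BY id;  -- id 1 keeps 'existing'; id 2 appears once

COMMIT;
```

Which of the two staged rows with id 2 ends up in the table depends on the order in which they are read, so combine this with DISTINCT ON and ORDER BY when that choice matters.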


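This is for using COPY FROM with protection against duplicates both in the target table and in the source file (results verified on a local instance).

It should also work on Redshift, but I have not verified that.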

-- Target table
CREATE TABLE target_table
(id integer PRIMARY KEY, firstname varchar(100), lastname varchar(100));
INSERT INTO target_table (id, firstname, lastname) VALUES (14, 'albert', 'einstein');
INSERT INTO target_table (id, firstname, lastname) VALUES (4, 'isaac', 'newton');


-- COPY FROM with protection against duplicates in the target table as well as in the source file
BEGIN;
CREATE TEMP TABLE source_file_table ON COMMIT DROP AS (
SELECT * FROM target_table
)
WITH NO DATA;


-- Simulating COPY FROM
INSERT INTO source_file_table (id, firstname, lastname) VALUES (14, 'albert', 'einstein');
INSERT INTO source_file_table (id, firstname, lastname) VALUES (7, 'marie', 'curie');
INSERT INTO source_file_table (id, firstname, lastname) VALUES (7, 'marie', 'curie');
INSERT INTO source_file_table (id, firstname, lastname) VALUES (7, 'marie', 'curie');
INSERT INTO source_file_table (id, firstname, lastname) VALUES (5, 'Neil deGrasse', 'Tyson');


-- for protection against duplicates in target_table
UPDATE source_file_table SET id=NULL
FROM target_table WHERE source_file_table.id=target_table.id;


INSERT INTO target_table
SELECT * FROM source_file_table
-- for protection against duplicates in target_table
WHERE source_file_table.id IS NOT NULL
-- for protection against duplicates in the source file
UNION
(SELECT * FROM source_file_table
WHERE source_file_table.id IS NOT NULL
LIMIT 1);
COMMIT;