Can I automatically create a table in PostgreSQL from a CSV file with headers?

I'm running PostgreSQL 9.2.6 on OS X 10.6.8. I would like to import data from a CSV file with column headers into a database. I can do this with the COPY statement, but only if I first manually create a table with a column for each column in the CSV file. Is there any way to automatically create this table based on the headers in the CSV file?

Following this question, I have tried:

COPY test FROM '/path/to/test.csv' CSV HEADER;

But I got this error:

ERROR: relation "test" does not exist

If I first create a table with no columns:

CREATE TABLE test ();

I get:

ERROR: extra data after last expected column

I can't find anything about automatically creating a table in the PostgreSQL COPY documentation. Is there some other way to automatically create a table from a CSV file with headers?


You can't find anything in the COPY documentation, because COPY cannot create a table for you.
You need to do that before you can COPY to it.
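
A minimal sketch of the manual route (the column names and types here are made up and must match your CSV header):

CREATE TABLE test (
    id   integer,
    name text,
    city text
);

COPY test FROM '/path/to/test.csv' CSV HEADER;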

There is a very good tool that imports tables into Postgres from a CSV file. It is a command-line tool called pgfutter (with binaries for Windows, Linux, etc.). One of its big advantages is that it recognizes the attribute/column names as well.

The usage of the tool is simple. For example if you'd like to import myCSVfile.csv:

pgfutter --db "myDatabase" --port "5432" --user "postgres" --pw "mySecretPassword" csv myCSVfile.csv

This will create a table (called myCSVfile) with the column names taken from the CSV file's header. Additionally, the data types will be inferred from the existing data.

A few notes: the pgfutter command varies depending on the binary you use, e.g. it could be pgfutter_windows_amd64.exe (rename it if you intend to use the command frequently). The above command has to be executed in a command-line window (e.g. on Windows, run cmd and make sure pgfutter is accessible). If you'd like a different table name, add --table "myTable"; to select a particular database schema, use --schema "mySchema". In case you are accessing an external database, use --host "myHostDomain".

A more elaborate example of pgfutter to import myFile into myTable is this one:

pgfutter --host "localhost" --port "5432" --db "myDB" --schema "public" --table "myTable" --user "postgres" --pw "myPwd" csv myFile.csv

Most likely you will change a few data types (from text to numeric) after the import:

alter table myTable
    alter column myColumn type numeric
    using (trim(myColumn)::numeric);
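
Similarly, a text column holding dates can be converted with to_date (a sketch; myDate and the DD.MM.YYYY format are placeholder assumptions, adjust them to your data):

alter table myTable
    alter column myDate type date
    using to_date(trim(myDate), 'DD.MM.YYYY');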

There is a second approach, which I found here (from mmatt). Basically you call a function within Postgres (last argument specifies the number of columns).

select load_csv_file('myTable','C:/MyPath/MyFile.csv',24)

Here is mmatt's function code, which I had to modify slightly because I am working in the public schema. (Copy & paste it into the pgAdmin SQL editor and run it to create the function.)

CREATE OR REPLACE FUNCTION load_csv_file(
    target_table text,
    csv_path text,
    col_count integer)
  RETURNS void AS
$BODY$
declare
    iter integer;      -- dummy integer to iterate columns with
    col text;          -- variable to keep the column name at each iteration
    col_first text;    -- first column name, e.g., top left corner on a csv file or spreadsheet
begin
    set schema 'public';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION load_csv_file(text, text, integer)
    OWNER TO postgres;

Note: There is a common issue when importing text files related to encoding. The CSV file should be in UTF-8 format. However, the programs that produce the file do not always get the encoding quite right. I have overcome this issue by opening the file in Notepad++ and converting it to ANSI and back to UTF-8.
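
If converting the file itself is inconvenient, a plain COPY can instead be told which encoding the source file uses via the ENCODING option (a sketch; the table name, path and LATIN1 are placeholder assumptions):

copy myTable from 'C:/MyPath/MyFile.csv' with (format csv, header, encoding 'LATIN1');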

For a single table, I did it very simply, quickly and online, through one of the many good converters that can be found on the web. Just google "convert csv to sql online" and choose one.

I achieved it with these steps:

  1. Convert the csv file to UTF-8:
    iconv -f ISO-8859-1 -t UTF-8 file.txt -o file.csv
  2. Use this python script to create the sql that creates the table and copies the data:
#!/usr/bin/env python3
import os
# pip install python-slugify
from slugify import slugify

origem = 'file.csv'
destino = 'file.sql'
arquivo = os.path.abspath(origem)

d = open(destino, 'w')
with open(origem, 'r') as f:
    # read the header line and turn each cell into a valid column name
    header = f.readline().split(';')
    head_cells = []
    for cell in header:
        value = slugify(cell, separator="_")
        if value in head_cells:
            value = value + '_2'
        head_cells.append(value)

    # build the CREATE TABLE statement (every column as text) plus the COPY command
    fields = []
    for cell in head_cells:
        fields.append(" {} text".format(cell))
    table = origem.split('.')[0]
    sql = "create table {} ( \n {} \n);".format(table, ",\n".join(fields))
    sql += "\n COPY {} FROM '{}' DELIMITER ';' CSV HEADER;".format(table, arquivo)

    print(sql)
    d.write(sql)
d.close()


  3. Run the script with:

python3 importar.py

Optional: Edit the sql script to adjust the field types (all are text by default)

  4. Run the sql script. A shortcut for the console:
sudo -H -u postgres bash -c "psql mydatabase < file.sql"

I am using csvsql to generate the table layout (it will automatically guess the format):

head -n 20 table.csv | csvsql --no-constraints --tables table_name

And then I use \COPY in psql. For me, that's the fastest way to import a CSV file.
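
A sketch of that second step, assuming csvsql produced a table named table_name and the file is a comma-delimited CSV with a header row:

\copy table_name FROM 'table.csv' WITH (FORMAT csv, HEADER)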

You can also use sed with csvsql in order to get the desired datatype:

head -n 20 table.csv | csvsql --no-constraints --tables table_name | sed 's/DECIMAL/NUMERIC/' | sed 's/VARCHAR/TEXT/' | sed 's/DATETIME/TIMESTAMP/'

I haven't used it, but pgLoader (https://pgloader.io/) is recommended by the pgfutter developers (see answer above) for more complicated problems. It looks very capable.

Use SQLite as an intermediate step.

Steps:

  1. In the command prompt type: sqlite3
  2. In the sqlite3 CLI type: .mode csv
  3. .import my_csv.csv my_table
  4. .output my_table_sql.sql
  5. .dump my_table
  6. Finally, run that SQL in your PostgreSQL database (the dump looks roughly like the sketch below)
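
The .dump output is plain SQL, roughly like this sketch (column names and rows are hypothetical; the columns come out as TEXT, and SQLite-specific lines such as the PRAGMA and the transaction wrapper may need to be removed or adjusted before running the file in psql):

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE my_table(
  "id" TEXT,
  "name" TEXT
);
INSERT INTO my_table VALUES('1','first row');
INSERT INTO my_table VALUES('2','second row');
COMMIT;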

You can create a new table in DBeaver out of a CSV.

Automatic creation seems to be pretty easy with Python+Pandas

Install the SQLAlchemy library in your Python environment: pip install SQLAlchemy==1.4.31

import pandas as pd
from sqlalchemy import create_engine

# the postgresql:// dialect needs a driver such as psycopg2 installed
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')

# read_csv picks up the column names from the header row;
# to_sql creates table_name and infers the column types (pass index=False to skip the DataFrame index)
df = pd.read_csv('example.csv')
df.to_sql('table_name', engine)