Can I automatically create a table in PostgreSQL from a CSV file with headers?

I'm running PostgreSQL 9.2.6 on OS X 10.6.8. I would like to import data from a CSV file with column headers into a database. I can do this with the COPY statement, but only if I first manually create a table with a column for each column in the CSV file. Is there any way to automatically create this table based on the headers in the CSV file?

Following this question, I have tried:

COPY test FROM '/path/to/test.csv' CSV HEADER;

But I got this error:

ERROR: relation "test" does not exist

If I first create a table with no columns:

CREATE TABLE test ();

I get:

ERROR: extra data after last expected column

I can't find anything about automatically creating a table in the PostgreSQL COPY documentation. Is there some other way to automatically create a table from a CSV file with headers?


You can't find anything in the COPY documentation, because COPY cannot create a table for you.
You need to do that before you can COPY to it.
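
A minimal sketch of the manual route (the column names and types here are made up and must match your CSV header):

CREATE TABLE test (
    id   integer,
    name text,
    city text
);

COPY test FROM '/path/to/test.csv' CSV HEADER;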

There is a very good tool that imports tables into Postgres from a CSV file. It is a command-line tool called pgfutter (with binaries for Windows, Linux, etc.). One of its big advantages is that it recognizes the attribute/column names as well.

The usage of the tool is simple. For example if you'd like to import myCSVfile.csv:

pgfutter --db "myDatabase" --port "5432" --user "postgres" --pw "mySecretPassword" csv myCSVfile.csv

This will create a table (called myCSVfile) with the column names taken from the CSV file's header. Additionally, the data types will be inferred from the existing data.

A few notes: the pgfutter command varies depending on the binary you use, e.g. it could be pgfutter_windows_amd64.exe (rename it if you intend to use the command frequently). The above command has to be executed in a command-line window (e.g. on Windows, run cmd and make sure pgfutter is accessible). If you'd like a different table name, add --table "myTable"; to select a particular database schema, use --schema "mySchema". In case you are accessing an external database, use --host "myHostDomain".

A more elaborate example of pgfutter to import myFile into myTable is this one:

pgfutter --host "localhost" --port "5432" --db "myDB" --schema "public" --table "myTable" --user "postgres" --pw "myPwd" csv myFile.csv

Most likely you will change a few data types (from text to numeric) after the import:

alter table myTable
    alter column myColumn type numeric
    using (trim(myColumn)::numeric);
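
Similarly, a text column holding dates can be converted with to_date (a sketch; myDate and the DD.MM.YYYY format are placeholder assumptions, adjust them to your data):

alter table myTable
    alter column myDate type date
    using to_date(trim(myDate), 'DD.MM.YYYY');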

There is a second approach, which I found here (from mmatt). Basically you call a function within Postgres (last argument specifies the number of columns).

select load_csv_file('myTable','C:/MyPath/MyFile.csv',24)

Here is mmatt's function code, which I had to modify slightly because I am working in the public schema. (Copy & paste it into the pgAdmin SQL editor and run it to create the function.)

CREATE OR REPLACE FUNCTION load_csv_file(
    target_table text,
    csv_path text,
    col_count integer)
  RETURNS void AS
$BODY$
declare
    iter integer;      -- dummy integer to iterate columns with
    col text;          -- variable to keep the column name at each iteration
    col_first text;    -- first column name, e.g., top left corner on a csv file or spreadsheet
begin
    set schema 'public';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION load_csv_file(text, text, integer)
    OWNER TO postgres;

Note: There is a common issue when importing text files related to encoding. The CSV file should be in UTF-8 format. However, the programs that produce the file do not always get the encoding quite right. I have overcome this issue by opening the file in Notepad++ and converting it to ANSI and back to UTF-8.
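
If converting the file itself is inconvenient, a plain COPY can instead be told which encoding the source file uses via the ENCODING option (a sketch; the table name, path and LATIN1 are placeholder assumptions):

copy myTable from 'C:/MyPath/MyFile.csv' with (format csv, header, encoding 'LATIN1');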

For a single table, I did it very simply, quickly and online, through one of the many good converters that can be found on the web. Just google "convert csv to sql online" and choose one.

I achieved it with these steps:

  1. Convert the csv file to UTF-8:
    iconv -f ISO-8859-1 -t UTF-8 file.txt -o file.csv
  2. Use this python script to create the sql that creates the table and copies the data:
#!/usr/bin/env python3
import os
# pip install python-slugify
from slugify import slugify

origem = 'file.csv'
destino = 'file.sql'
arquivo = os.path.abspath(origem)

d = open(destino, 'w')
with open(origem, 'r') as f:
    # read the header line and turn each cell into a valid column name
    header = f.readline().split(';')
    head_cells = []
    for cell in header:
        value = slugify(cell, separator="_")
        if value in head_cells:
            value = value + '_2'
        head_cells.append(value)

    # build the CREATE TABLE statement (every column as text) plus the COPY command
    fields = []
    for cell in head_cells:
        fields.append(" {} text".format(cell))
    table = origem.split('.')[0]
    sql = "create table {} ( \n {} \n);".format(table, ",\n".join(fields))
    sql += "\n COPY {} FROM '{}' DELIMITER ';' CSV HEADER;".format(table, arquivo)

    print(sql)
    d.write(sql)
d.close()


  3. Run the script with:

python3 importar.py

Optional: Edit the sql script to adjust the field types (all are text by default)

  4. Run the sql script. A shortcut for the console:
sudo -H -u postgres bash -c "psql mydatabase < file.sql"

I am using csvsql to generate the table layout (it will automatically guess the format):

head -n 20 table.csv | csvsql --no-constraints --tables table_name

And then I use \COPY in psql. For me, that's the fastest way to import a CSV file.
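
A sketch of that second step, assuming csvsql produced a table named table_name and the file is a comma-delimited CSV with a header row:

\copy table_name FROM 'table.csv' WITH (FORMAT csv, HEADER)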

You can also use sed with csvsql in order to get the desired datatype:

head -n 20 table.csv | csvsql --no-constraints --tables table_name | sed 's/DECIMAL/NUMERIC/' | sed 's/VARCHAR/TEXT/' | sed 's/DATETIME/TIMESTAMP/'

I haven't used it, but pgLoader (https://pgloader.io/) is recommended by the pgfutter developers (see answer above) for more complicated problems. It looks very capable.

Use SQLite as an intermediate step.

Steps:

  1. In the command prompt type: sqlite3
  2. In the sqlite3 CLI type: .mode csv
  3. .import my_csv.csv my_table
  4. .output my_table_sql.sql
  5. .dump my_table
  6. Finally, run that SQL in your PostgreSQL database (the dump looks roughly like the sketch below)
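
The .dump output is plain SQL, roughly like this sketch (column names and rows are hypothetical; the columns come out as TEXT, and SQLite-specific lines such as the PRAGMA and the transaction wrapper may need to be removed or adjusted before running the file in psql):

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE my_table(
  "id" TEXT,
  "name" TEXT
);
INSERT INTO my_table VALUES('1','first row');
INSERT INTO my_table VALUES('2','second row');
COMMIT;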

You can create a new table in DBeaver out of a CSV.

Automatic creation seems to be pretty easy with Python+Pandas

Install the SQLAlchemy library in your Python environment: pip install SQLAlchemy==1.4.31

import pandas as pd
from sqlalchemy import create_engine

# the postgresql:// dialect needs a driver such as psycopg2 installed
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')

# read_csv picks up the column names from the header row;
# to_sql creates table_name and infers the column types (pass index=False to skip the DataFrame index)
df = pd.read_csv('example.csv')
df.to_sql('table_name', engine)