How to write UTF-8 in a CSV file

I am trying to create a text file in csv format from a PyQt4 QTableWidget. I want to write the text with UTF-8 encoding because it contains special characters. I use the following code:

    import codecs
    ...
    myfile = codecs.open(filename, 'w', 'utf-8')
    ...
    f = result.table.item(i, c).text()
    myfile.write(f + ";")

It works until a cell contains a special character. I also tried with:

    myfile = open(filename, 'w')
    ...
    f = unicode(result.table.item(i, c).text(), "utf-8")

But it also stops when a special character shows up. I don't know what I am doing wrong.


The examples in the Python documentation show how to write Unicode CSV files: http://docs.python.org/2/library/csv.html#examples

(I can't copy the code here because it's protected by copyright.)
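The core idea in those examples is to feed the Python 2 csv module UTF-8-encoded byte strings instead of unicode objects. A minimal sketch of that idea (my own code, not the documentation's; the helper name and sample row are made up):

    import csv

    def write_unicode_row(writer, row):
        # The Python 2 csv module operates on byte strings, so encode
        # each unicode field to UTF-8 bytes before handing it over.
        writer.writerow([field.encode('utf-8') for field in row])

    with open('out.csv', 'wb') as f:
        writer = csv.writer(f, delimiter=';')
        write_unicode_row(writer, [u'caf\xe9', u'\u00fcber'])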

Use this package, it just works: https://github.com/jdunck/python-unicodecsv.

From your shell run:

pip2 install unicodecsv

And (unlike in the original question) presuming you're using Python's built-in csv module, turn import csv into import unicodecsv as csv in your code.
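For instance, a minimal usage might look like this (the file name and sample values are mine; unicodecsv accepts an encoding keyword and expects a binary-mode file on Python 2):

    import unicodecsv as csv

    # unicodecsv encodes each unicode field for you on the way out.
    with open('output.csv', 'wb') as f:
        writer = csv.writer(f, encoding='utf-8', delimiter=';')
        writer.writerow([u'caf\xe9', u'na\xefve'])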

It's very simple for Python 3.x (docs).

    import csv

    with open('output_file_name', 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file, delimiter=';')
        # writerow expects a sequence of fields; wrap the string in a
        # list, otherwise each character becomes its own column
        writer.writerow(['my_utf8_string'])

For Python 2.x, look here.

A very simple hack is to use the json module instead of csv. For example, instead of csv.writer just do the following:

    import codecs
    import json

    fd = codecs.open(tempfilename, 'wb', 'utf-8')
    for c in whatever:
        # json.dumps renders the row as ["a", ...]; strip the brackets
        # for a csv-like line; ensure_ascii=False keeps real UTF-8
        # characters instead of \uXXXX escapes.
        fd.write(json.dumps(c, ensure_ascii=False)[1:-1])
        fd.write('\n')
    fd.close()

Basically, given the list of fields in the correct order, the json-formatted string is identical to a csv line, except for the [ and ] at the start and end (and json's default item separator, which is a comma plus a space). And json handles utf-8 reliably in Python 2.x.
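A quick illustration of that equivalence (the sample row is mine; passing separators=(',', ':') drops json's default space after the item comma, so the result is a plain comma-separated record):

    import json

    row = [u'caf\xe9', u'42']
    line = json.dumps(row, ensure_ascii=False, separators=(',', ':'))[1:-1]
    # line is now u'"caf\xe9","42"' -- quoted, comma-separated fields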

For me, the UnicodeWriter class from the Python 2 csv module documentation didn't really work, as it breaks the csv.writer.writerow() interface.

For example:

    csv_writer = csv.writer(csv_file)
    row = ['The meaning', 42]
    csv_writer.writerow(row)

works, while:

    csv_writer = UnicodeWriter(csv_file)
    row = ['The meaning', 42]
    csv_writer.writerow(row)

will throw AttributeError: 'int' object has no attribute 'encode'.

As UnicodeWriter obviously expects all column values to be strings, we can convert the values ourselves and just use the default CSV module:

    def to_utf8(lst):
        # Coerce each value to unicode first (so ints work too), then
        # encode to UTF-8 bytes for the stock csv writer.
        return [unicode(elem).encode('utf-8') for elem in lst]

    ...
    csv_writer.writerow(to_utf8(row))

Or we can even monkey-patch csv_writer to add a write_utf8_row function; one possible sketch follows.
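The writer object returned by csv.writer is implemented in C and won't accept new attributes, so a thin wrapper class is the practical route. A sketch only (the class name is hypothetical):

    class UTF8Writer(object):
        """Wraps a plain csv writer and adds a write_utf8_row method."""

        def __init__(self, writer):
            self._writer = writer

        def __getattr__(self, name):
            # Delegate writerow, writerows, etc. to the wrapped writer.
            return getattr(self._writer, name)

        def write_utf8_row(self, row):
            self._writer.writerow(
                [unicode(elem).encode('utf-8') for elem in row])

    # Usage: UTF8Writer(csv.writer(csv_file)).write_utf8_row(row)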

For Python 2 you can use this code before calling csv_writer.writerows(rows). Note that it will NOT convert integers to utf-8 strings:

    def encode_rows_to_utf8(rows):
        # Encode only string values; ints, floats, None, etc. are left
        # as-is for the csv writer to format itself.
        encoded_rows = []
        for row in rows:
            encoded_row = []
            for value in row:
                if isinstance(value, basestring):
                    value = unicode(value).encode("utf-8")
                encoded_row.append(value)
            encoded_rows.append(encoded_row)
        return encoded_rows
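Usage would then look like this (csv_writer and rows are assumed to exist, as in the snippets above):

    csv_writer.writerows(encode_rows_to_utf8(rows))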

I tried using Bojan's suggestion, but it turned all the None cells into the word None rather than leaving them blank, rendered floats as 1.231111111111111e+11, and had maybe other annoyances. Plus, I want my program to run under both Python 3 and Python 2. So I ended up putting this at the top of the program:

    import csv
    import logging
    import os

    try:
        # Probe whether the stock csv module can write unicode.
        csv.writer(open(os.devnull, 'w')).writerow([u'\u03bc'])
        PREPROCESS = lambda array: array
    except UnicodeEncodeError:
        logging.warning('csv module cannot handle unicode, patching...')
        PREPROCESS = lambda array: [
            item.encode('utf8')
            if hasattr(item, 'encode') else item
            for item in array
        ]

Then I changed all csvout.writerow(row) statements to csvout.writerow(PREPROCESS(row)).

I could have used the test if sys.version_info < (3,): instead of the try statement, but that violates "duck typing". I may revisit it and write that first one-liner properly with with statements, to get rid of the dangling open file and writer, but then I'd have to use ALL_CAPS variable names or pylint would complain. It should get garbage collected anyway, and in any case it only lasts while the script is running.