You can use Notepad++ to evaluate a file's encoding without needing to write code. The evaluated encoding of the open file will display on the bottom bar, far right side. The encodings supported can be seen by going to Settings -> Preferences -> New Document/Default Directory and looking in the drop down.
# install the chardet library
!pip install chardet
# import the chardet library
import chardet
# use the detect method to find the encoding
# 'rb' means read in the file as binary
with open("test.csv", 'rb') as file:
print(chardet.detect(file.read()))
CSV files have no headers indicating the encoding.
You can only guess by looking at:
the platform / application the file was created on
the bytes in the file
In 2021, emoticons are widely used, but many import tools fail to import them. The chardet library is often recommended in the answers above, but the lib does not handle emoticons well.
icecream = '🍦'
import csv
with open('test.csv', 'w') as f:
wf = csv.writer(f)
wf.writerow(['ice cream', icecream])
import chardet
with open('test.csv', 'rb') as f:
print(chardet.detect(f.read()))
{'encoding': 'Windows-1254', 'confidence': 0.3864823918622268, 'language': 'Turkish'}
This gives UnicodeDecodeError while trying to read the file with this encoding.
The default encoding on Mac is UTF-8. It's included explicitly here but that wasn't even necessary... but on Windows it might be.
with open('test.csv', 'r', encoding='utf-8') as f:
print(f.read())
ice cream,🍦
The file command also picked this up
file test.csv
test.csv: UTF-8 Unicode text, with CRLF line terminators
My advice in 2021, if the automatic detection goes wrong: try UTF-8 before resorting to chardet.
As it is mentioned by @3724913 (Jitender Kumar) to use file command (it also works in WSL on Windows), I was able to get encoding information of a csv file by executing file --exclude encoding blah.csv using info available on man file as file blah.csv won't show the encoding info on my system.