I am attempting to work with a very large dataset that has some non-standard characters in it. I need to use unicode, as per the job specs, but I am baffled. (And quite possibly doing it all wrong.)
I open the CSV using:
15 ncesReader = csv.reader(open('geocoded_output.csv', 'rb'), delimiter='\t', quotechar='"')
Then, I attempt to encode it with:
name=school_name.encode('utf-8'), street=row[9].encode('utf-8'), city=row[10].encode('utf-8'), state=row[11].encode('utf-8'), zip5=row[12], zip4=row[13],county=row[25].encode('utf-8'), lat=row[22], lng=row[23])
I'm encoding everything except the lat and lng because those need to be sent out to an API. When I run the program to parse the dataset into what I can use, I get the following Traceback.
Traceback (most recent call last):
File "push_into_db.py", line 80, in <module>
main()
File "push_into_db.py", line 74, in main
district_map = buildDistrictSchoolMap()
File "push_into_db.py", line 32, in buildDistrictSchoolMap
county=row[25].encode('utf-8'), lat=row[22], lng=row[23])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)
I think I should tell you that I'm using python 2.7.2, and this is part of an app build on django 1.4. I've read several posts on this topic, but none of them seem to directly apply. Any help will be greatly appreciated.
You might also want to know that some of the non-standard characters causing the issue are Ñ and possibly É.