Convert commas decimal separators to dots within a Dataframe

I am importing a CSV file like the one below, using pandas.read_csv:

df = pd.read_csv(Input, delimiter=";")

Example of CSV file:

10;01.02.2015 16:58;01.02.2015 16:58;-0.59;0.1;-4.39;NotApplicable;0.79;0.2
11;01.02.2015 16:58;01.02.2015 16:58;-0.57;0.2;-2.87;NotApplicable;0.79;0.21

The problem is that when I later on in my code try to use these values I get this error: TypeError: can't multiply sequence by non-int of type 'float'

The error is because the number I'm trying to use is not written with a dot (.) as a decimal separator but a comma(,). After manually changing the commas to a dots my program works.

I can't change the format of my input, and thus have to replace the commas in my DataFrame in order for my code to work, and I want python to do this without the need of doing it manually. Do you have any suggestions?

143849 次浏览

pandas.read_csv has a decimal parameter for this: doc

I.e. try with:

df = pd.read_csv(Input, delimiter=";", decimal=",")

I think the earlier mentioned answer of including decimal="," in pandas read_csv is the preferred option.

However, I found it is incompatible with the Python parsing engine. e.g. when using skiprow=, read_csv will fall back to this engine and thus you can't use skiprow= and decimal= in the same read_csv statement as far as I know. Also, I haven't been able to actually get the decimal= statement to work (probably due to me though)

The long way round I used to achieving the same result is with list comprehensions, .replace and .astype. The major downside to this method is that it needs to be done one column at a time:

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
'b': ['51,23', '18,45', '28,90', '133,00']})


df['a'] = [x.replace(',', '.') for x in df['a']]


df['a'] = df['a'].astype(float)

Now, column a will have float type cells. Column b still contains strings.

Note that the .replace used here is not pandas' but rather Python's built-in version. Pandas' version requires the string to be an exact match or a regex.

I answer to the question about how to change the decimal comma to the decimal dot with Python Pandas.

$ cat test.py
import pandas as pd
df = pd.read_csv("test.csv", quotechar='"', decimal=",")
df.to_csv("test2.csv", sep=',', encoding='utf-8', quotechar='"', decimal='.')

where we specify the reading in decimal separator as comma while the output separator is specified as dot. So

$ cat test.csv
header,header2
1,"2,1"
3,"4,0"
$ cat test2.csv
,header,header2
0,1,2.1
1,3,4.0

where you see that the separator has changed to dot.

stallasia's answer looks like the best one.

However, if you want to change the separator when you already have a dataframe, you could do :

df['a'] = df['a'].str.replace(',', '.').astype(float)

Thanks for the great answers. I just want to add that in my case just using decimal=',' did not work because I had numbers like 1.450,00 (with thousands separator), therefore pandas did not recognize it, but passing thousands='.' helped to read the file correctly:

df = pd.read_csv(
Input,
delimiter=";",
decimal=","
thousands="."
)