Convert comma decimal separators to dots within a DataFrame

小开

Best answer

pandas.read_csv has a decimal parameter for this (see the read_csv documentation).

For example, try:

df = pd.read_csv(Input, delimiter=";", decimal=",")
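To try this without a file on disk, here is a minimal sketch that reads from an in-memory string. The sample data and the column names (name, price) are made up for illustration; only delimiter= and decimal= come from the answer above.

```python
import io
import pandas as pd

# Hypothetical sample: semicolon-delimited, comma as the decimal separator.
raw = "name;price\nwidget;120,50\ngadget;42,00\n"

# decimal="," tells the parser to read "120,50" as the float 120.5.
df = pd.read_csv(io.StringIO(raw), delimiter=";", decimal=",")

print(df.dtypes)
print(df["price"].tolist())
```

If the price column still comes back as object, the values probably contain something else the parser does not treat as numeric, such as a thousands separator or a currency symbol.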

小开

I think the earlier answer, passing decimal="," to pandas' read_csv, is the preferred option.

However, I found it to be incompatible with the Python parsing engine. For example, when I used skiprows=, read_csv fell back to this engine, so as far as I know you can't combine skiprows= and decimal= in the same read_csv call. I also haven't been able to get decimal= to work at all, though that is probably my own mistake.

The long way round I used to achieve the same result is a list comprehension, .replace and .astype. The major downside of this method is that it has to be done one column at a time:

import pandas as pd

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
                   'b': ['51,23', '18,45', '28,90', '133,00']})


df['a'] = [x.replace(',', '.') for x in df['a']]


df['a'] = df['a'].astype(float)

Now, column a will have float type cells. Column b still contains strings.

Note that the .replace used here is not pandas' but Python's built-in str.replace. Pandas' Series.replace and DataFrame.replace match whole cell values by default, so they only substitute inside a string when you pass regex=True.
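The difference between the pandas replace methods can be seen in a small sketch. The Series below is made up for illustration. Series.replace compares whole values unless regex=True is given, while Series.str.replace works on substrings.

```python
import pandas as pd

s = pd.Series(["120,00", "42,00"])

# Series.replace matches whole cell values by default: no cell equals ",",
# so nothing changes.
whole = s.replace(",", ".")

# With regex=True the pattern is applied inside each string.
with_regex = s.replace(",", ".", regex=True)

# Series.str.replace always substitutes within each string;
# regex=False treats "," as a literal character.
sub = s.str.replace(",", ".", regex=False)

print(whole.tolist())
print(with_regex.tolist())
print(sub.astype(float).tolist())
```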

小开

This answers the question of how to change a decimal comma to a decimal dot with pandas.

$ cat test.py
import pandas as pd
df = pd.read_csv("test.csv", quotechar='"', decimal=",")
df.to_csv("test2.csv", sep=',', encoding='utf-8', quotechar='"', decimal='.')

Here the decimal separator is specified as a comma when reading and as a dot when writing. So:

$ cat test.csv
header,header2
1,"2,1"
3,"4,0"
$ cat test2.csv
,header,header2
0,1,2.1
1,3,4.0

Here you can see that the decimal separator has changed to a dot. The extra first column is the DataFrame index, which to_csv writes by default.
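If the extra index column in test2.csv is not wanted, one option is to pass index=False to to_csv. The sketch below repeats the round trip in memory with the same data as test.csv. Using io.StringIO instead of real files is an assumption made to keep the example self-contained.

```python
import io
import pandas as pd

# Same content as test.csv above: comma-delimited, quoted decimal commas.
raw = 'header,header2\n1,"2,1"\n3,"4,0"\n'

df = pd.read_csv(io.StringIO(raw), quotechar='"', decimal=",")

# With no path given, to_csv returns the CSV text; index=False drops the
# unnamed leading index column.
out = df.to_csv(index=False, decimal=".")

print(out)
```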

小开

stallasia's answer looks like the best one.

However, if you want to change the separator when you already have a DataFrame, you could do:

df['a'] = df['a'].str.replace(',', '.').astype(float)
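The same idea can be extended to several columns at once with DataFrame.apply, which avoids repeating the line per column. This is a sketch that assumes every column holds strings with decimal commas. The frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": ["120,00", "42,00"],
                   "b": ["51,23", "18,45"]})

# apply calls the function once per column; each column is a Series of
# strings, so .str.replace followed by .astype(float) works on all of them.
converted = df.apply(
    lambda col: col.str.replace(",", ".", regex=False).astype(float)
)

print(converted.dtypes)
```

For a frame with mixed columns, select only the string columns first, for example df[["a", "b"]], and assign the result back to those columns.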

小开

Thanks for the great answers. I just want to add that in my case decimal=',' alone did not work, because I had numbers like 1.450,00 (with a thousands separator) that pandas did not recognize. Passing thousands='.' as well let the file be read correctly:

df = pd.read_csv(
    Input,
    delimiter=";",
    decimal=",",
    thousands="."
)
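A self-contained version of the call above, as a sketch with made-up data (the item and amount columns are hypothetical), shows both separators being handled together:

```python
import io
import pandas as pd

# Hypothetical sample: "." groups thousands, "," marks the decimals.
raw = "item;amount\nrent;1.450,00\nfood;320,75\n"

df = pd.read_csv(
    io.StringIO(raw),
    delimiter=";",
    decimal=",",
    thousands=".",
)

print(df["amount"].tolist())
```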