Pandas read_csv dtype: read all columns but few as string

I'm using pandas to read a bunch of CSVs. Passing a dict to the dtype parameter tells pandas which columns to read as strings instead of the default:

dtype_dic = {'service_id': str, 'end_date': str, ...}
feedArray = pd.read_csv(feedfile, dtype=dtype_dic)

In my scenario, all the columns except a few specific ones are to be read as strings. So instead of defining several columns as str in dtype_dic, I'd like to set just my chosen few as int or float. Is there a way to do that?

This is a loop over various CSVs with different columns, so a direct column conversion after having read the whole csv as string (dtype=str) wouldn't be easy, since I don't immediately know which columns each csv has. (I'd rather spend that effort defining all the columns in the dtype dict!)

Edit: But if there's a way to convert column names to numeric without throwing an error when a column isn't present in the csv, then yes, that would be a valid solution, if there's no other way of doing this at the csv reading stage itself.
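Such a tolerant conversion can be sketched with pandas.to_numeric, guarding each column with a membership check so absent columns are simply skipped (the column names here are hypothetical, for illustration only):

```python
import pandas as pd

# example frame standing in for one csv read with dtype=str
df = pd.DataFrame({'service_id': ['a1', 'a2'], 'end_date': ['2020', '2021']})

# columns we would like numeric; 'fare' does not exist in this csv
numeric_cols = ['end_date', 'fare']

for col in numeric_cols:
    if col in df.columns:  # skip columns absent from this particular csv
        df[col] = pd.to_numeric(df[col], errors='coerce')
```

With errors='coerce', unparsable values become NaN instead of raising, which also helps when a column's contents are messier than expected.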

Note: this sounds like a previously asked question, but the answers there went down a very different path (bool-related) which doesn't apply here. Please don't mark as duplicate!


EDIT - sorry, I misread your question. Updated my answer.

You can read the entire csv as strings then convert your desired columns to other types afterwards like this:

df = pd.read_csv('/path/to/file.csv', dtype=str)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)

Another approach, if you really want to specify the proper types for all columns when reading the file in and not change them afterwards: read in just the column names (no rows), then use those to fill in which columns should be strings.

col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})
pd.read_csv('file.csv', dtype=types_dict)

I recently encountered the same issue, though I only have one csv file so I don't need to loop over files. I think this solution can be adapted into a loop as well.

Here I present a solution I used. Pandas' read_csv has a parameter called converters which overrides dtype, so you may take advantage of this feature.

An example code is as follows: Assume that our data.csv file contains all float64 columns except A and B which are string columns. You may read this file using:

df = pd.read_csv('data.csv', dtype = 'float64', converters = {'A': str, 'B': str})

The code gives warnings that converters override dtypes for these two columns A and B, and the result is as desired.

Regarding looping over several csv files, all one needs to do is figure out which columns will be exceptions to put in converters. This is easy if the files have a similar pattern of column names; otherwise, it would get tedious.
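One way to adapt this into a loop is to peek at each file's header first and build the converters dict per file, so only the non-numeric columns get a str converter (the column set and helper name below are hypothetical):

```python
import io
import pandas as pd

# hypothetical: columns that should stay numeric; everything else reads as str
numeric_cols = {'fare'}

def read_feed(buf):
    cols = pd.read_csv(buf, nrows=0).columns  # header row only
    buf.seek(0)  # rewind so the full read starts from the top
    conv = {c: str for c in cols if c not in numeric_cols}
    return pd.read_csv(buf, converters=conv)

# io.StringIO stands in for a file path in this self-contained sketch
csv = io.StringIO('service_id,fare\n001,2.5\n002,3.0\n')
df = read_feed(csv)
# service_id stays a string (the leading zeros survive); fare is inferred as float
```

The seek(0) is only needed for the in-memory buffer; with a real file path you would just call pd.read_csv twice on the same path.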

You can do the following:

pd.read_csv(self._LOCAL_FILE_PATH,
            index_col=0,
            encoding="utf-8",
            dtype={
                'customer_id': 'int32',
                'product_id': 'int32',
                'subcategory_id': 'int16',
                'category_id': 'int16',
                'gender': 'int8',
                'views': 'int8',
                'purchased': 'int8',
                'added': 'int8',
                'time_on_page': 'float16',
            })

Extending @MECoskun's answer using converters while simultaneously stripping leading and trailing white space, making converters more versatile:

df = pd.read_csv('data.csv', dtype = 'float64', converters = {'A': str.strip, 'B': str.strip})

There are also lstrip and rstrip, which could be used instead of strip if needed. Note: pass str.strip, not str.strip() — the converter must be the function itself, not the result of calling it. And of course, don't strip non-string columns.
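A small self-contained sketch of the idea (using an in-memory buffer in place of data.csv): each converter receives the raw cell text from the parser, so str.strip removes the surrounding whitespace before the value is stored.

```python
import io
import pandas as pd

# stand-in for data.csv: A and B have stray whitespace, C is numeric
csv = io.StringIO('A,B,C\n  x ,  7 ,1.5\n y  , 8  ,2.5\n')

# converters override the blanket float64 dtype for A and B
df = pd.read_csv(csv, dtype='float64',
                 converters={'A': str.strip, 'B': str.strip})
```

As with the earlier example, pandas emits a warning that the converters override the dtype for A and B, but the result is as desired: A and B come back as trimmed strings, C as float64.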