Is there a way to copy only the structure (not the data) of a pandas DataFrame?

小开

In version 0.18 of pandas, the DataFrame constructor has no options for creating a dataframe like another dataframe with NaN instead of the values.

The code you use, df2 = pd.DataFrame(columns=df1.columns, index=df1.index), is the most logical way. The only improvement is to spell out what you are doing even more by adding data=None, so that other coders see directly that you intentionally leave the data out of the new DataFrame you are creating.

TLDR: So my suggestion is:

Explicit is better than implicit

df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

Very much like yours, but more spelled out.
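To make the effect concrete, here is a minimal sketch of that constructor call. The sample frame df1 is an assumption made up for illustration; it is not from the question.

```python
import pandas as pd

# Hypothetical sample frame, used only for illustration
df1 = pd.DataFrame([[11, 12], [21, 22]],
                   columns=['c1', 'c2'], index=['i1', 'i2'])

# Same row and column labels, no data: every cell starts out as NaN
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

# Check that the labels match and that no values were carried over
print(df2.index.equals(df1.index), df2.columns.equals(df1.columns))
print(df2.isna().all().all())
```

Note that this copies the labels only; the column dtypes of df1 are not carried over.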

小开

Let's start with some sample data

In [1]: import pandas as pd


In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
...:                   columns=['num', 'char'])


In [3]: df
Out[3]:
   num char
0    1    a
1    2    b
2    3    c


In [4]: df.dtypes
Out[4]:
num      int64
char    object
dtype: object

Now let's use a simple `DataFrame` initialization using the columns of the original `DataFrame` but providing no data:

In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)


In [6]: empty_copy_1
Out[6]:
Empty DataFrame
Columns: [num, char]
Index: []


In [7]: empty_copy_1.dtypes
Out[7]:
num     object
char    object
dtype: object

As you can see, the column data types are not the same as in our original DataFrame.

So, if you want to preserve the column `dtype`...

If you want to preserve the column data types, you need to construct the DataFrame one Series at a time:

In [8]: empty_copy_2 = pd.DataFrame.from_items([
...:     (name, pd.Series(data=None, dtype=series.dtype))
...:     for name, series in df.iteritems()])


In [9]: empty_copy_2
Out[9]:
Empty DataFrame
Columns: [num, char]
Index: []


In [10]: empty_copy_2.dtypes
Out[10]:
num      int64
char    object
dtype: object
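On newer pandas versions the session above will not run as written: DataFrame.from_items was removed in pandas 1.0 and DataFrame.iteritems in pandas 2.0. A sketch of the same idea, building one empty Series per column, using a dict comprehension and DataFrame.items() instead (the variable name empty_copy_3 is just an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
                  columns=['num', 'char'])

# One empty Series per column, each created with the original column's dtype
empty_copy_3 = pd.DataFrame({
    name: pd.Series(dtype=series.dtype)
    for name, series in df.items()
})

print(empty_copy_3.dtypes)
```

The dict keeps insertion order, so the columns come out in the same order as in df.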

小开

A simple alternative: first copy the basic structure (the index and the columns, with their data types) from the original DataFrame (df1) into df2:

df2 = df1.iloc[0:0]

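Then fill your DataFrame with empty rows. This is a sketch that will need to be adapted to match your actual structure. Note that DataFrame.append was removed in pandas 2.0, so it only runs on older versions:

import numpy as np

s = pd.Series([np.nan, np.nan, np.nan], index=['Col1', 'Col2', 'Col3'])

# Loop through the rows in df1, appending one empty row for each
for _ in range(len(df1)):
    df2 = df2.append(s, ignore_index=True)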
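As an alternative to appending rows one at a time, the empty slice can be reindexed against the original index, which adds all the NaN rows in one step and does not rely on append. This is a sketch under assumptions: the three-column df1 below is made up to match the Col1/Col2/Col3 names used above.

```python
import pandas as pd

# Hypothetical frame with the column names used in the pseudocode
df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                   columns=['Col1', 'Col2', 'Col3'])

# Empty slice keeps the columns; reindex then adds one NaN row per label in df1.index
df2 = df1.iloc[0:0].reindex(df1.index)

print(df2.shape)
print(df2.isna().all().all())
```

Because NaN cannot be stored in an int64 column, the integer columns do not keep their original dtype in df2.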

小开

Best answer

That's a job for reindex_like. Start with the original:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

Construct an empty DataFrame and reindex it like df1:

pd.DataFrame().reindex_like(df1)
Out:
    c1  c2
i1 NaN NaN
i2 NaN NaN
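A quick way to convince yourself that only the labels were copied is to compare the two frames directly. A small sketch, reusing the same df1 (the names same_labels and all_missing are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]],
                   columns=['c1', 'c2'], index=['i1', 'i2'])

df2 = pd.DataFrame().reindex_like(df1)

# df2 shares the row and column labels of df1 but none of its values
same_labels = df2.index.equals(df1.index) and df2.columns.equals(df1.columns)
all_missing = bool(df2.isna().all().all())
print(same_labels, all_missing)
```

The int64 dtypes of df1 are not carried over, since an int64 column cannot hold NaN.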

小开

You can simply mask with notna(), i.e.:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])


df2 = df1.mask(df1.notna())


    c1  c2
i1 NaN NaN
i2 NaN NaN

小开

This has worked for me in pandas 0.22:

df2 = pd.DataFrame(index=df.index.delete(slice(None)), columns=df.columns)

Then convert the column types:

df2 = df2.astype(df.dtypes)

The delete(slice(None)) call is there in case you do not want to keep the index values.

小开

I know this is an old question, but I thought I would add my two cents.

def df_cols_like(df):
    """
    Returns an empty data frame with the same column names and types as df
    """
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    df2 = pd.DataFrame({name: pd.Series(dtype=dtype)
                        for name, dtype in df.dtypes.items()},
                       columns=df.dtypes.index)
    return df2

This approach centers around the df.dtypes attribute of the input data frame, df, which is a pd.Series. A pd.DataFrame is constructed from a dictionary of empty pd.Series objects named using the input column names with the column order being taken from the input df.

小开

This does not exactly answer this question, but a similar one, for people coming here via a search engine.

My case was creating a copy of the data frame without data and without index. One can achieve this by doing the following. This will maintain the dtypes of the columns.

empty_copy = df.drop(df.index)
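A short sketch to check the dtype claim, using a made-up two-column frame (an assumption for illustration only):

```python
import pandas as pd

# Hypothetical sample frame with one integer and one string column
df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
                  columns=['num', 'char'])

# Dropping every row label leaves the columns and their dtypes in place
empty_copy = df.drop(df.index)

print(len(empty_copy))
print((empty_copy.dtypes == df.dtypes).all())
```

drop returns a new DataFrame here, so df itself still holds its three rows.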

小开

A simple way to copy df structure into df2 is:

df2 = pd.DataFrame(columns=df.columns)

小开

To preserve column type you can use the astype method, like pd.DataFrame(columns=df1.columns).astype(df1.dtypes)

import pandas as pd


df1 = pd.DataFrame(
    [
        [11, 12, 'Alice'],
        [21, 22, 'Bob']
    ],
    columns=['c1', 'c2', 'c3'],
    index=['i1', 'i2']
)


df2 = pd.DataFrame(columns=df1.columns).astype(df1.dtypes)
print(df2.shape)
print(df2.dtypes)

output:

(0, 3)
c1     int64
c2     int64
c3    object
dtype: object
