Pandas DataFrame: start the index at 1

When writing a pandas DataFrame to CSV, I need the index to start at 1 instead of 0.

Here is an example:

In [1]: import pandas as pd


In [2]: result = pd.DataFrame({'Count': [83, 19, 20]})


In [3]: result.to_csv('result.csv', index_label='Event_id')

This produces the following output:

In [4]: !cat result.csv
Event_id,Count
0,83
1,19
2,20

But my desired output is this:

In [5]: !cat result2.csv
Event_id,Count
1,83
2,19
3,20

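I realize this could be done by adding a sequence of integers shifted by 1 as a column in my DataFrame, but I am new to pandas and I am wondering whether a cleaner way exists.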


Just set the index before writing to CSV.

df.index = np.arange(1, len(df) + 1)

And then write it normally.
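To make that concrete, here is a minimal sketch of the full round trip using the question's data. It writes to an in-memory io.StringIO buffer rather than a file; that is an assumption made only so the snippet is self-contained, and passing a path such as 'result.csv' should behave the same way.

```python
import io

import numpy as np
import pandas as pd

result = pd.DataFrame({'Count': [83, 19, 20]})

# Replace the default 0-based index with the labels 1..len(result).
result.index = np.arange(1, len(result) + 1)

# Write to an in-memory buffer; a file path would work the same way.
buffer = io.StringIO()
result.to_csv(buffer, index_label='Event_id')
csv_text = buffer.getvalue()
print(csv_text)
```

The printed CSV has the Event_id column running from 1 to 3, as requested in the question.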

The index is an object, and the default index starts from 0:

>>> result.index
Int64Index([0, 1, 2], dtype=int64)

You can shift this index by 1 with

>>> result.index += 1
>>> result.index
Int64Index([1, 2, 3], dtype=int64)

source: In Python pandas, start row index from 1 instead of zero without creating additional column

Working example:

import pandas as pd
# input_file is the path to your CSV file
dframe = pd.read_csv(input_file)
dframe.index = dframe.index + 1  # relabel rows 0..n-1 as 1..n

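Another way in one line, with an important caveat: shift() moves the values down one row and [1:] then discards the first (now NaN) row, so the result is labelled 1 to len(df) - 1, loses the last row of data, and has default integer columns upcast to float: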

df.shift()[1:]
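A quick check of what shift()[1:] actually returns, compared with relabelling the index, may be useful before relying on it. This is only an illustrative sketch on the question's three-row frame; the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Count': [83, 19, 20]})

# shift() moves every value down one row (row 0 becomes NaN),
# and [1:] then drops that first row.
shifted = df.shift()[1:]
print(shifted)
print(len(df), len(shifted))  # the shifted frame has one row fewer

# Relabelling the index keeps every row and the original dtype.
renumbered = df.copy()
renumbered.index = renumbered.index + 1
print(renumbered)
```

The labels of the shifted frame do start at 1, but the last value (20) is gone, so relabelling the index is the safer choice when every row must be kept.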

This worked for me:

df.index = np.arange(1, len(df)+1)

You can use this one:

import pandas as pd


result = pd.DataFrame({'Count': [83, 19, 20]})
result.index += 1
print(result)

Or this one, which uses the NumPy library:

import pandas as pd
import numpy as np


result = pd.DataFrame({'Count': [83, 19, 20]})
result.index = np.arange(1, len(result)+1)
print(result)

np.arange(1, len(result)+1) creates a NumPy array of the integers from 1 up to and including len(result) (the stop value itself is excluded), and that array is then assigned to result.index.

Building on the original answer, here are my two cents:

  • If I'm not mistaken, in current versions of pandas (since 0.18, when RangeIndex was introduced) the default index object is a RangeIndex.

From the official doc:

RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed.

For a huge index range this makes sense: the index is stored as its start, stop, and step rather than as a fully materialized array of labels, which saves memory.

Therefore, an example (using Series, but it applies to DataFrame also):

>>> import pandas as pd
>>>
>>> countries = ['China', 'India', 'USA']
>>> ds = pd.Series(countries)
>>>
>>>
>>> type(ds.index)
<class 'pandas.core.indexes.range.RangeIndex'>
>>> ds.index
RangeIndex(start=0, stop=3, step=1)
>>>
>>> ds.index += 1
>>>
>>> ds.index
RangeIndex(start=1, stop=4, step=1)
>>>
>>> ds
1    China
2    India
3      USA
dtype: object
>>>

As you can see, incrementing the index object changes its start and stop parameters.
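To tie this back to the other answers, the sketch below compares the two common ways of relabelling: incrementing the default index, and assigning an np.arange array. It assumes a reasonably recent pandas in which the default index is a RangeIndex; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Count': [83, 19, 20]})

# Incrementing a RangeIndex yields another RangeIndex (start/stop are shifted).
shifted = df.copy()
shifted.index += 1
print(type(shifted.index).__name__, list(shifted.index))

# Assigning an array stores the labels explicitly, so the result is an
# ordinary integer index rather than a RangeIndex.
materialized = df.copy()
materialized.index = np.arange(1, len(df) + 1)
print(type(materialized.index).__name__, list(materialized.index))
```

Both frames carry the same labels; the difference is only in how the index is stored, which matters mainly for very large frames.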

Use this:

df.index = np.arange(1, len(df)+1)

In my opinion, the best practice is to set the index with a RangeIndex:

>>> import pandas as pd
>>> result = pd.DataFrame(
...     {'Count': [83, 19, 20]},
...     index=pd.RangeIndex(start=1, stop=4, name='index')
... )
>>> result
       Count
index
1         83
2         19
3         20

I prefer this, because you can define the range and a possible step and a name for the index in one line.
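One consequence for the original CSV question: when the index has a name, to_csv uses that name as the header of the index column if index_label is not given. The sketch below assumes the name 'Event_id' from the question and writes to an in-memory buffer purely to stay self-contained.

```python
import io

import pandas as pd

# Name the index after the column header wanted in the CSV.
result = pd.DataFrame(
    {'Count': [83, 19, 20]},
    index=pd.RangeIndex(start=1, stop=4, name='Event_id'),
)

buffer = io.StringIO()
result.to_csv(buffer)  # no index_label needed; the index name is used
csv_text = buffer.getvalue()
print(csv_text)
```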

This adds a column that accomplishes what you want:

df.insert(0,"Column Name", np.arange(1,len(df)+1))
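If you take the column route, the default 0-based index would still be written unless it is switched off. A minimal sketch, assuming the column should be called 'Event_id' as in the question and again using an in-memory buffer in place of a file:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({'Count': [83, 19, 20]})

# Insert a 1-based counter as the first column.
df.insert(0, 'Event_id', np.arange(1, len(df) + 1))

# index=False suppresses the default 0-based index in the output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The DataFrame itself keeps its 0-based index here; only the CSV output shows the 1-based column.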

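You can append ".shift()[1:]" when creating the data frame, but note that this shifts the values down one row and drops the last row of data, so only use it if losing that row is acceptable: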

data = pd.read_csv(r"C:\Users\user\path\data.csv").shift()[1:]