Indexing Pandas data frames: integer rows, named columns

Say df is a pandas DataFrame.

  • df.loc[] only accepts names
  • df.iloc[] only accepts integers (actual positions)
  • df.ix[] accepts both names and integers:

When referencing rows, df.ix[row_idx, ] only wants to be given names. For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'],
                   '1' : np.arange(6)})
df = df.ix[2:6]
print(df)


   1      a
2  2  three
3  3   four
4  4   five
5  5    six


df.ix[0, 'a']

throws an error; it doesn't return 'two'.

When referencing columns, df.ix prefers integers, not names. For example:

df.ix[2, 1]

returns 'three', not 2 (although df.ix[2, '1'] does return 2).
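For anyone reading this on a current pandas: .ix was deprecated in 0.20 and removed in 1.0, so the snippets above only run on old versions. Here is a minimal sketch of the same frame using .loc and .iloc. The explicit columns=['1', 'a'] argument is my assumption, added only to reproduce the column order shown in the printout.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question without .ix.
# columns=[...] is an assumption made here to match the printed column order.
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)},
                  columns=['1', 'a'])
df = df.loc[2:5]  # label-based slice, inclusive on both ends

by_label = df.loc[2, 'a']      # row label 2, column label 'a'
by_position = df.iloc[0, 1]    # first row, second column

# Label 0 no longer exists after slicing, so a label lookup fails.
try:
    df.loc[0, 'a']
    label_zero_found = True
except KeyError:
    label_zero_found = False
```

Both by_label and by_position point at the same cell; the difference is only in whether the row and column are given as labels or as positions.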

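Oddly, I'd like the exact opposite functionality. My column names are usually very meaningful, so I reference them directly in my code. But after a lot of observation cleaning, the row names in my pandas data frames usually don't correspond to range(len(df)).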

I realize I can use:

df.iloc[0].loc['a'] # returns three

But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?

df.foo[0, 'a'] # returns three

In fact, is it possible to add my own new method to pandas.core.frame.DataFrame, so that e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?


It's a late answer, but @unutbu's comment is still valid and a great solution to this problem.

To index a DataFrame with integer rows and named columns (labeled columns):

df.loc[df.index[#], 'NAME'] where # is a valid integer index and NAME is the name of the column.
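A small sketch of that pattern, using a frame shaped like the one in the question (the index values 2..5 are written out by hand here, which is my own shortcut):

```python
import pandas as pd

# Frame whose row labels do not start at 0, as in the question.
df = pd.DataFrame({'a': ['three', 'four', 'five', 'six'],
                   '1': [2, 3, 4, 5]},
                  index=[2, 3, 4, 5])

n = 0                              # integer position of the row
row_label = df.index[n]            # translate position -> label
value = df.loc[row_label, 'a']     # then index by labels only

# The same expression works on the left-hand side of an assignment.
df.loc[df.index[n], 'a'] = 'THREE'
```

Because everything goes through a single .loc call, the assignment writes to df itself rather than to a temporary copy.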

We can reset the index and then use 0-based indexing, like this:

df.reset_index(drop=True).loc[0,'a']

edit: removed [] from col name index 'a' so it just outputs the value
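One thing to keep in mind with this approach, sketched below under the assumption that you only need to read values: reset_index returns a new frame, so anything you assign through it does not reach the original df.

```python
import pandas as pd

df = pd.DataFrame({'a': ['three', 'four'], '1': [2, 3]}, index=[2, 3])

flat = df.reset_index(drop=True)   # new frame with labels 0, 1, ...
value = flat.loc[0, 'a']           # reading works as expected

flat.loc[0, 'a'] = 'changed'       # modifies flat only
original_value = df.loc[2, 'a']    # df still holds the old value
```

So this is fine for lookups, but for setting values you would need to index df directly.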

Something like df["a"][0] is working fine for me. You may try it out!
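A caveat worth checking before relying on this: when the index holds integers, df["a"][0] is a label lookup, not a positional one. On a frame like the one in the question (labels 2..5) there is no label 0, so it fails. The sketch below shows that, plus the positional form df["a"].iloc[0]; the frame is rebuilt by hand for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['three', 'four', 'five', 'six']},
                  index=[2, 3, 4, 5])

# With an integer index, [0] on the column means "label 0".
try:
    df['a'][0]
    outcome = 'returned a value'
except KeyError:
    outcome = 'KeyError'

# Positional access on the selected column.
first = df['a'].iloc[0]
```

df["a"][0] does work when the row labels happen to be 0, 1, 2, ..., which is probably why it looks fine on a freshly built frame.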

For getting or setting a single value in a DataFrame by row/column labels, you'd better use DataFrame.at instead of DataFrame.loc, as it is ...

  1. faster
  2. you are more explicit about wanting to access only a single value.

As others have already shown, if you start out with an integer position for the row, you still have to find the row label first with DataFrame.index, as DataFrame.at only accepts labels:

df.at[df.index[0], 'a']
# Out: 'three'

Benchmark:

%timeit df.at[df.index[0], 'a']
# 7.57 µs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.loc[df.index[0], 'a']
# 10.9 µs ± 53.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.iloc[0, df.columns.get_loc("a")]
# 13.3 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For completeness:

DataFrame.iat for accessing a single value for a row/column pair by integer position.
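A short sketch of the purely positional variant, where the column name is translated to a position with df.columns.get_loc and the row is already a position (the frame is rebuilt by hand here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'1': [2, 3, 4, 5],
                   'a': ['three', 'four', 'five', 'six']},
                  index=[2, 3, 4, 5])

col_pos = df.columns.get_loc('a')   # integer position of column 'a'
value = df.iat[0, col_pos]          # single value by (row pos, col pos)

df.iat[0, col_pos] = 'THREE'        # iat can also set a single value
```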

The existing answers seem short-sighted to me.

Problematic Solutions


  1. The df.loc[df.index[0], 'a'] strategy gets the row label of the 0th row and then uses .loc as normal. I see two issues.

    1. If df has repeated row labels, df.loc[df.index[0], 'a'] could return multiple rows.
    2. .loc is slower than .iloc so you're sacrificing speed here.

  2. The reset_index strategy resets the index so the row labels become 0, 1, 2, ..., so that .loc[0] gives the same result as .iloc[0]. The problem here is still runtime: .loc is slower than .iloc, and you also incur the cost of resetting the index.
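To make the repeated-label issue concrete, here is a small sketch with a made-up frame whose first two rows share the label 7:

```python
import pandas as pd

# Hypothetical frame with a duplicated row label.
df = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[7, 7, 8])

# Label route: df.index[0] is 7, and two rows carry that label,
# so the result is a Series rather than a single value.
by_label = df.loc[df.index[0], 'a']

# Positional route: always exactly the first row.
by_position = df.iloc[0, df.columns.get_loc('a')]
```

Here by_label contains both 'x' and 'y', while by_position is just 'x'.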

Better Solution

I suggest following @Landmaster's comment:

df.iloc[0, df.columns.get_loc("a")]

Essentially, this is the same as df.iloc[0, 0] except we get the column index dynamically using df.columns.get_loc("a").

To index multiple columns such as ['a', 'b', 'c'], use:

df.iloc[0, [df.columns.get_loc(c) for c in ['a', 'b', 'c']]]
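As a possible alternative to the list comprehension, Index.get_indexer translates several labels in one call. A minimal sketch on a made-up four-column frame:

```python
import pandas as pd

# Hypothetical frame, only for illustration.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]},
                  index=[10, 20])

wanted = ['a', 'c']
col_positions = df.columns.get_indexer(wanted)  # array of integer positions

# get_indexer marks labels it cannot find with -1 instead of raising,
# so check for that before indexing (-1 would select the last column).
if (col_positions == -1).any():
    raise KeyError("some requested columns are missing")

row = df.iloc[0, col_positions]  # first row, columns 'a' and 'c'
```

The explicit -1 check is the trade-off: get_loc raises KeyError on a missing column by itself, whereas get_indexer leaves that check to you.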


A very late answer, but it amazes me that pandas still doesn't have such a function after all these years. If it irks you a lot, you can monkey-patch a custom indexer into the DataFrame:

class XLocIndexer:
    def __init__(self, frame):
        self.frame = frame

    def __getitem__(self, key):
        row, col = key
        return self.frame.iloc[row][col]


pd.core.indexing.IndexingMixin.xloc = property(lambda frame: XLocIndexer(frame))


# Usage
df.xloc[0, 'a'] # one
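If patching pandas internals feels too invasive, a similar effect can probably be had through the public extension hook pd.api.extensions.register_dataframe_accessor. The sketch below is only one way to do it; the accessor name ilabel and the class name are made up for this example.

```python
import pandas as pd


# "ilabel" is a hypothetical accessor name chosen for this sketch.
@pd.api.extensions.register_dataframe_accessor("ilabel")
class PositionLabelAccessor:
    """Index rows by integer position and columns by label."""

    def __init__(self, frame):
        self._frame = frame

    def __getitem__(self, key):
        row, col = key
        # Select the column(s) by label first, then the row(s) by position.
        return self._frame[col].iloc[row]


df = pd.DataFrame({'a': ['three', 'four', 'five', 'six'],
                   '1': [2, 3, 4, 5]},
                  index=[2, 3, 4, 5])

single = df.ilabel[0, 'a']          # first row, column 'a'
several = df.ilabel[0:2, 'a']       # first two rows, column 'a'
```

Selecting the column before the row keeps the column's dtype intact, whereas pulling out a whole row first turns mixed-type rows into object dtype. This version only reads values; supporting assignment would need a __setitem__ as well.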