以整数形式获取熊猫数据框架行的索引

例如,假设一个简单的数据框架

    A         B
0   1  0.810743
1   2  0.595866
2   3  0.154888
3   4  0.472721
4   5  0.894525
5   6  0.978174
6   7  0.859449
7   8  0.541247
8   9  0.232302
9  10  0.276566

在给定条件的情况下,如何检索行的索引值? 例如: dfb = df[df['A']==5].index.values.astype(int) 返回 [4],但我想得到的只是 4。这在后面的代码中给我带来了麻烦。

基于某些条件,我希望记录满足该条件的索引,然后在。

我尽力了

dfb = df[df['A']==5].index.values.astype(int)
dfbb = df[df['A']==8].index.values.astype(int)
df.loc[dfb:dfbb,'B']

以获得所需的输出

    A         B
4   5  0.894525
5   6  0.978174
6   7  0.859449

但我得到了 TypeError: '[4]' is an invalid key

326566 次浏览

The easier is add [0] - select first value of list with one element:

dfb = df[df['A']==5].index.values.astype(int)[0]
dfbb = df[df['A']==8].index.values.astype(int)[0]

dfb = int(df[df['A']==5].index[0])
dfbb = int(df[df['A']==8].index[0])

But if possible some values not match, error is raised, because first value not exist.

Solution is use next with iter for get default parameetr if values not matched:

dfb = next(iter(df[df['A']==5].index), 'no match')
print (dfb)
4


dfb = next(iter(df[df['A']==50].index), 'no match')
print (dfb)
no match

Then it seems need substract 1:

print (df.loc[dfb:dfbb-1,'B'])
4    0.894525
5    0.978174
6    0.859449
Name: B, dtype: float64

Another solution with boolean indexing or query:

print (df[(df['A'] >= 5) & (df['A'] < 8)])
A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449


print (df.loc[(df['A'] >= 5) & (df['A'] < 8), 'B'])
4    0.894525
5    0.978174
6    0.859449
Name: B, dtype: float64

print (df.query('A >= 5 and A < 8'))
A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449

The nature of wanting to include the row where A == 5 and all rows upto but not including the row where A == 8 means we will end up using iloc (loc includes both ends of slice).

In order to get the index labels we use idxmax. This will return the first position of the maximum value. I run this on a boolean series where A == 5 (then when A == 8) which returns the index value of when A == 5 first happens (same thing for A == 8).

Then I use searchsorted to find the ordinal position of where the index label (that I found above) occurs. This is what I use in iloc.

i5, i8 = df.index.searchsorted([df.A.eq(5).idxmax(), df.A.eq(8).idxmax()])
df.iloc[i5:i8]

enter image description here


numpy

you can further enhance this by using the underlying numpy objects the analogous numpy functions. I wrapped it up into a handy function.

def find_between(df, col, v1, v2):
vals = df[col].values
mx1, mx2 = (vals == v1).argmax(), (vals == v2).argmax()
idx = df.index.values
i1, i2 = idx.searchsorted([mx1, mx2])
return df.iloc[i1:i2]


find_between(df, 'A', 5, 8)

enter image description here


timing
enter image description here

To answer the original question on how to get the index as an integer for the desired selection, the following will work :

df[df['A']==5].index.item()

Little sum up for searching by row:

This can be useful if you don't know the column values ​​or if columns have non-numeric values

if u want get index number as integer u can also do:

item = df[4:5].index.item()
print(item)
4

it also works in numpy / list:

numpy = df[4:7].index.to_numpy()[0]
lista = df[4:7].index.to_list()[0]

in [x] u pick number in range [4:7], for example if u want 6:

numpy = df[4:7].index.to_numpy()[2]
print(numpy)
6

for DataFrame:

df[4:7]


A          B
4   5   0.894525
5   6   0.978174
6   7   0.859449

or:

df[(df.index>=4) & (df.index<7)]


A          B
4   5   0.894525
5   6   0.978174
6   7   0.859449

Or you can add a for loop

for i in dfb:
dfb = i


for j in dfbb:
dgbb = j

This way the element '4' is out of the list