pandas: select from a DataFrame using startswith

This works (using pandas 0.12 dev):

table2=table[table['SUBDIVISION'] =='INVERNESS']

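Then I realized I needed to select the field using "starts with", since I was missing a lot of rows. So, following the pandas docs as closely as I could, I tried: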

criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]

and got AttributeError: 'float' object has no attribute 'startswith'

So I tried an alternative syntax, with the same result:

table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]

Reference: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing Section 4: "List comprehensions and the map method of Series can also be used to produce more complex criteria."

What am I missing?


You can use the str.startswith Series method to give more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])


In [12]: s
Out[12]:
0      a
1     ab
2      c
3     11
4    NaN
dtype: object


In [13]: s.str.startswith('a', na=False)
Out[13]:
0     True
1     True
2    False
3    False
4    False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0     a
1    ab
dtype: object


It looks like at least one of the elements in your Series/column is a float, which doesn't have a startswith method, hence the AttributeError. The list comprehension should raise the same error.
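As a rough illustration of the point above, here is a minimal sketch on made-up data (the `table` contents are hypothetical, not the asker's real data). It assumes the stray float is a missing value (NaN), which is the usual cause, and shows both how to spot the offending elements and how `na=False` sidesteps them:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's table; np.nan marks a missing value
table = pd.DataFrame(
    {"SUBDIVISION": ["INVERNESS", "INVERNESS POINT", "OTHER", np.nan]}
)

# Collect every element that is not a string; NaN shows up here as a float
non_strings = [x for x in table["SUBDIVISION"] if not isinstance(x, str)]

# The vectorized string method treats missing values as "no match" with na=False
mask = table["SUBDIVISION"].str.startswith("INVERNESS", na=False)
table2 = table[mask]
```

Calling `x.startswith(...)` on any element of `non_strings` is what raises the AttributeError in the question.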

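To retrieve all the rows that start with the required string (note that str.match treats the string as a regular expression matched from the start of each value):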

dataFrameOut = dataFrame[dataFrame['column name'].str.match('string')]

To retrieve all the rows that contain the required string:

dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string')]
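One caveat worth sketching: `str.match` and `str.contains` interpret the pattern as a regular expression by default, so characters like `.` are not literal. The example below uses a hypothetical `name` column and prefix to compare the options; `re.escape` and `regex=False` are shown as two ways to get literal matching.

```python
import re
import pandas as pd

# Hypothetical data; None stands in for a missing value
df = pd.DataFrame({"name": ["a.b", "axb", "xa.b", None]})
prefix = "a.b"

# Regex match anchored at the start: "." matches any character, so "axb" is kept too
by_match = df[df["name"].str.match(prefix, na=False)]

# Escaping the pattern makes the match literal
by_escaped = df[df["name"].str.match(re.escape(prefix), na=False)]

# startswith is always literal
by_startswith = df[df["name"].str.startswith(prefix, na=False)]

# Literal substring search anywhere in the value
anywhere = df[df["name"].str.contains(prefix, regex=False, na=False)]
```

If only a literal prefix is needed, `str.startswith` is probably the simplest choice, since it avoids regex semantics entirely.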

You can use apply to easily apply any string matching function to your column elementwise.

table2=table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS'))]

This assumes that your "SUBDIVISION" column is of the correct type (string).
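If the column is not guaranteed to hold only strings, a guarded variant of the same apply idea might look like the sketch below. The data is invented for illustration, and the `isinstance` check is an added assumption about how non-string values should be handled (here they are simply treated as non-matching):

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type column: strings, a missing value, and a number
table = pd.DataFrame(
    {"SUBDIVISION": ["INVERNESS", "INVERNESS HILLS", np.nan, 42.0, "OTHER"]}
)

# Only call startswith on real strings; anything else counts as "no match"
mask = table["SUBDIVISION"].apply(
    lambda x: isinstance(x, str) and x.startswith("INVERNESS")
)
table2 = table[mask]
```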

Edit: fixed missing parenthesis

Using startswith for a particular column value:

df  = df.loc[df["SUBDIVISION"].str.startswith('INVERNESS', na=False)]

This can also be achieved using query:

table.query('SUBDIVISION.str.startswith("INVERNESS").values')
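A variant of the query approach that may be more robust, sketched on hypothetical data: passing `engine="python"` lets the expression use the `.str` accessor directly, and `na=False` keeps missing values from producing a non-boolean mask. Both choices are assumptions about the asker's setup rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's table
table = pd.DataFrame(
    {"SUBDIVISION": ["INVERNESS", "INVERNESS POINT", "OTHER", np.nan]}
)

# engine="python" evaluates the expression with plain pandas instead of numexpr
result = table.query(
    'SUBDIVISION.str.startswith("INVERNESS", na=False)', engine="python"
)
```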