根据条件获取数据帧行计数

小开

最佳答案

You are asking for the condition where all the conditions are true, so len of the frame is the answer, unless I misunderstand what you are asking

In [17]: df = DataFrame(randn(20,4),columns=list('ABCD'))


In [18]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
Out[18]:
A         B         C         D
12  0.491683  0.137766  0.859753 -1.041487
13  0.376200  0.575667  1.534179  1.247358
14  0.428739  1.539973  1.057848 -1.254489


In [19]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)].count()
Out[19]:
A    3
B    3
C    3
D    3
dtype: int64


In [20]: len(df[(df['A']>0) & (df['B']>0) & (df['C']>0)])
Out[20]: 3

小开

For increased performance you should not evaluate the dataframe using your predicate. You can just use the outcome of your predicate directly as illustrated below:

In [1]: import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(20,4),columns=list('ABCD'))




In [2]: df.head()
Out[2]:
A         B         C         D
0 -2.019868  1.227246 -0.489257  0.149053
1  0.223285 -0.087784 -0.053048 -0.108584
2 -0.140556 -0.299735 -1.765956  0.517803
3 -0.589489  0.400487  0.107856  0.194890
4  1.309088 -0.596996 -0.623519  0.020400


In [3]: %time sum((df['A']>0) & (df['B']>0))
CPU times: user 1.11 ms, sys: 53 µs, total: 1.16 ms
Wall time: 1.12 ms
Out[3]: 4


In [4]: %time len(df[(df['A']>0) & (df['B']>0)])
CPU times: user 1.38 ms, sys: 78 µs, total: 1.46 ms
Wall time: 1.42 ms
Out[4]: 4

Keep in mind that this technique only works for counting the number of rows that comply with your predicate.

小开

In Pandas, I like to use the shape attribute to get number of rows.

df[df.A > 0].shape[0]

gives the number of rows matching the condition A > 0, as desired.

小开

You can use the method query and get the shape of the resulting dataframe. For example:

   A  B  C
0  1  1  x
1  2  2  y
2  3  3  z


df.query("A == 2 & B > 1 & C != 'z'").shape[0]

Output:

小开

import pandas as pd
data = {'title': ['Manager', 'Technical Analyst', 'Software Engineer', 'Sales Manager'], 'Description': [
'''a man or woman who controls an organization or part of an organization,a person who looks after the business affairs of a singer, actor, etc''',
'''Technical analysts, also known as chartists or technicians, employ technical analysis in their trading and research. Technical analysis looks for price patterns and trends based on historical performance to identify signals based on market sentiment and psychology.''',
'''A software engineer is a person who applies the principles of software engineering to design, develop, maintain, test, and evaluate computer software. The term programmer is sometimes used as a synonym, but may also lack connotations of engineering education or skills.''',
'''A sales manager is someone who leads and supervises sales agents and runs the day-to-day sales operations of a business. They oversee the sales strategy, set sales goals, and track sales performance'''
]}
df = pd.DataFrame(data)
data2 = {'title': ['Manager', 'Technical Analyst', 'Software Engineer', 'Sales Manager'], 'Keywords': [
['organization','business','people','arrange']
,['technicians','analysis','research','business']
,['engineering', 'design', 'develop', 'maintain']
,['supervises', 'agents','business','performance','target']
]}
df2 = pd.DataFrame(data2)
print(df2)
df2=df2.explode('Keywords')


print(df2)
print("checking df3")
df3=df.merge(df2,how='left',on='title')
print(df3)
df3['match'] = df3.apply(lambda x: x.Keywords in x.Description, axis=1)
print(df3)
df4=df3.loc[df3['match']==True].groupby(['Description']).count()
print(df4)