How to filter objects for count annotation in Django?

小开

Update

The subquery approach I mentioned is now supported in Django 1.11 via subquery expressions.

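from django.db import models
from django.db.models import Count, OuterRef, Subquery

Event.objects.annotate(
    num_paid_participants=Subquery(
        # Count paid participants per event, correlated on the outer Event pk
        Participant.objects.filter(
            is_paid=True,
            event=OuterRef('pk')
        ).values('event')
        .annotate(cnt=Count('pk'))
        .values('cnt'),
        output_field=models.IntegerField()
    )
)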

I prefer this over aggregation (Sum + Case), because it should be faster and easier to optimize (with proper indexing).

For older versions, the same can be achieved with .extra:

Event.objects.extra(select={'num_paid_participants': "\
SELECT COUNT(*) \
FROM `myapp_participant` \
WHERE `myapp_participant`.`is_paid` = 1 AND \
`myapp_participant`.`event_id` = `myapp_event`.`id`"
})
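
To make the shape of these two queries concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django. The table and column names (event, participant, event_id, is_paid) and the extra third event are assumptions for illustration, not the SQL Django actually generates. It compares a grouped scalar subquery (similar in shape to the Subquery version) with a plain COUNT(*) subquery (as in the .extra version).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE participant (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES event(id),
        is_paid INTEGER
    );
    INSERT INTO event (id, title) VALUES (1, 'event1'), (2, 'event2'), (3, 'empty');
""")
# Hypothetical data: 10 participants for event 1, 50 for event 2, every other one paid.
rows = [(1, int(i % 2 == 0)) for i in range(10)]
rows += [(2, int(i % 2 == 0)) for i in range(50)]
conn.executemany("INSERT INTO participant (event_id, is_paid) VALUES (?, ?)", rows)

# Grouped scalar subquery: yields no row (NULL) when nothing matches.
grouped = conn.execute("""
    SELECT e.id,
           (SELECT COUNT(*) FROM participant p
             WHERE p.is_paid = 1 AND p.event_id = e.id
             GROUP BY p.event_id) AS num_paid_participants
      FROM event e
     ORDER BY e.id
""").fetchall()

# Plain COUNT(*) subquery: always yields one row, so empty events get 0.
plain = conn.execute("""
    SELECT e.id,
           (SELECT COUNT(*) FROM participant p
             WHERE p.is_paid = 1 AND p.event_id = e.id) AS num_paid_participants
      FROM event e
     ORDER BY e.id
""").fetchall()

print(grouped)  # [(1, 5), (2, 25), (3, None)]
print(plain)    # [(1, 5), (2, 25), (3, 0)]
```

If the None for events without paid participants is a problem in the Django version, one option is to wrap the Subquery in Coalesce from django.db.models.functions so it falls back to 0.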

小开

Just discovered that Django 1.8 has a new conditional expressions feature, so now we can do this:

from django.db import models

# Sum 1 for every paid participant and 0 otherwise
events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0, output_field=models.IntegerField()
    )))
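
For anyone wondering what Sum + Case roughly corresponds to in SQL, here is a small sketch with the standard-library sqlite3 module. The schema, the table names and the third event without participants are assumptions for illustration; the real query Django emits will look different in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE participant (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES event(id),
        is_paid INTEGER
    );
    INSERT INTO event (id, title) VALUES (1, 'event1'), (2, 'event2'), (3, 'empty');
""")
# Hypothetical data: 10 participants for event 1, 50 for event 2, every other one paid.
rows = [(1, int(i % 2 == 0)) for i in range(10)]
rows += [(2, int(i % 2 == 0)) for i in range(50)]
conn.executemany("INSERT INTO participant (event_id, is_paid) VALUES (?, ?)", rows)

# Each joined row contributes 1 if the participant is paid, else 0.
# The LEFT JOIN keeps events that have no participants at all.
sum_case = conn.execute("""
    SELECT e.id,
           SUM(CASE WHEN p.is_paid = 1 THEN 1 ELSE 0 END) AS paid_participants
      FROM event e
      LEFT JOIN participant p ON p.event_id = e.id
     GROUP BY e.id
     ORDER BY e.id
""").fetchall()

print(sum_case)  # [(1, 5), (2, 25), (3, 0)]
```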

小开

I would suggest using the .values method of your Participant queryset instead.

In short, what you want to do is:

Participant.objects\
.filter(is_paid=True)\
.values('event')\
.distinct()\
.annotate(models.Count('id'))

A complete example follows:

Create 2 Events:

event1 = Event.objects.create(title='event1')
event2 = Event.objects.create(title='event2')

Add Participants to them:

part1l = [Participant.objects.create(event=event1, is_paid=(i % 2 == 0))
          for i in range(10)]
part2l = [Participant.objects.create(event=event2, is_paid=(i % 2 == 0))
          for i in range(50)]

Group all Participants by their event field:

Participant.objects.values('event')
> <QuerySet [{'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, '...(remaining elements truncated)...']>

Here .distinct() is needed:

Participant.objects.values('event').distinct()
> <QuerySet [{'event': 1}, {'event': 2}]>

.values and .distinct here simply create two buckets of Participant objects grouped by their event field. Note that these buckets contain Participant objects.

You can then annotate those buckets as they contain the set of original Participant. Here we want to count the number of Participant, this is simply done by counting the ids of the elements in those buckets (since those are Participant):
```
Participant.objects\
.values('event')\
.distinct()\
.annotate(models.Count('id'))
> <QuerySet [{'event': 1, 'id__count': 10}, {'event': 2, 'id__count': 50}]>
```

Finally, since you only want Participant objects with is_paid set to True, just add a filter in front of the previous expression, which yields the expression shown above:

Participant.objects\
.filter(is_paid=True)\
.values('event')\
.distinct()\
.annotate(models.Count('id'))
> <QuerySet [{'event': 1, 'id__count': 5}, {'event': 2, 'id__count': 25}]>
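
If the bucket idea is hard to picture, the same grouping can be sketched in plain Python without a database. The list of (event_id, is_paid) tuples below is a stand-in for the Participant rows created in the example, not real ORM objects.

```python
from collections import Counter

# Stand-in rows mirroring the example: 10 participants for event 1,
# 50 for event 2, with every other participant marked as paid.
participants = [(1, i % 2 == 0) for i in range(10)]
participants += [(2, i % 2 == 0) for i in range(50)]

# Like .values('event').annotate(Count('id')): one bucket per event.
all_counts = Counter(event_id for event_id, _ in participants)

# Like adding .filter(is_paid=True) in front: only paid rows reach the buckets.
paid_counts = Counter(event_id for event_id, is_paid in participants if is_paid)

print(dict(all_counts))   # {1: 10, 2: 50}
print(dict(paid_counts))  # {1: 5, 2: 25}
```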

The only drawback is that you have to retrieve the Event afterwards as you only have the id from the method above.

小开

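Best answer

Conditional aggregation in Django 2.0 lets you further reduce the amount of faff this used to involve. It also uses Postgres' filter logic, which is somewhat faster than a Sum-Case (I've seen figures like 20-30% quoted).

Anyway, in your case, we're looking at something as simple as: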

from django.db.models import Q, Count
events = Event.objects.annotate(
    paid_participants=Count('participants', filter=Q(participants__is_paid=True))
)

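The docs have a separate section on filtering on annotations. It's the same thing as conditional aggregation, but closer to my example above. Either way, this is a lot healthier than the gnarly subqueries I was doing before.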
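
As a rough illustration of the filter logic mentioned above, here is a sketch using the standard-library sqlite3 module. The schema, table names and data are assumptions for illustration, and the SQL is hand-written rather than taken from Django. SQLite only accepts FILTER on aggregates from version 3.30 on, so the sketch falls back to the Sum-Case form on older builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE participant (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES event(id),
        is_paid INTEGER
    );
    INSERT INTO event (id, title) VALUES (1, 'event1'), (2, 'event2'), (3, 'empty');
""")
# Hypothetical data: 10 participants for event 1, 50 for event 2, every other one paid.
rows = [(1, int(i % 2 == 0)) for i in range(10)]
rows += [(2, int(i % 2 == 0)) for i in range(50)]
conn.executemany("INSERT INTO participant (event_id, is_paid) VALUES (?, ?)", rows)

# Use the aggregate FILTER clause where the bundled SQLite supports it,
# otherwise fall back to the equivalent SUM(CASE ...) expression.
if sqlite3.sqlite_version_info >= (3, 30, 0):
    paid_expr = "COUNT(p.id) FILTER (WHERE p.is_paid = 1)"
else:
    paid_expr = "SUM(CASE WHEN p.is_paid = 1 THEN 1 ELSE 0 END)"

filtered = conn.execute(f"""
    SELECT e.id, {paid_expr} AS paid_participants
      FROM event e
      LEFT JOIN participant p ON p.event_id = e.id
     GROUP BY e.id
     ORDER BY e.id
""").fetchall()

print(filtered)  # [(1, 5), (2, 25), (3, 0)]
```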

小开

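The result I want:

The number of people (assignees) who have tasks added to a report.
The number of people who have tasks added to a report, counting only tasks whose billable efforts are greater than 0.

In general, I would have to use two different queries: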

Task.objects.filter(billable_efforts__gt=0)
Task.objects.all()

But I want both in a single query, hence:

from django.db.models import Count, Q

Task.objects.values('report__title').annotate(
    withMoreThanZero=Count('assignee', distinct=True, filter=Q(billable_efforts__gt=0))
).annotate(
    totalUniqueAssignee=Count('assignee', distinct=True)
)

Result:

<QuerySet [{'report__title': 'TestReport', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}, {'report__title': 'Utilization_Report_April_2019', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}]>
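
To check the idea on a small data set, here is a sketch with the standard-library sqlite3 module. The task table, its columns and the sample rows are made up for illustration. The filtered distinct count is written as a CASE inside COUNT(DISTINCT ...), which is one portable way to express it, not necessarily what Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        report_title TEXT,
        assignee INTEGER,
        billable_efforts REAL
    );
    -- Hypothetical rows: report A has assignees 1, 2, 3; report B has 1 and 4.
    INSERT INTO task (report_title, assignee, billable_efforts) VALUES
        ('A', 1, 2.0), ('A', 1, 0.0), ('A', 2, 0.0), ('A', 3, 1.5),
        ('B', 1, 0.0), ('B', 4, 3.0);
""")

# The CASE yields NULL for non-billable tasks, and COUNT ignores NULLs,
# so only assignees with at least one billable task are counted.
per_report = conn.execute("""
    SELECT report_title,
           COUNT(DISTINCT CASE WHEN billable_efforts > 0 THEN assignee END)
               AS with_more_than_zero,
           COUNT(DISTINCT assignee) AS total_unique_assignee
      FROM task
     GROUP BY report_title
     ORDER BY report_title
""").fetchall()

print(per_report)  # [('A', 2, 3), ('B', 1, 2)]
```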

小开

For Django 3.x, just write the filter after the annotation:

users = (
    User.objects.values('user_id')
    .annotate(sudo_field=models.Count('likes'))
    .filter(sudo_field__gt=100)
)

In the above, sudo_field is not a model field on the User model; here we filter for users with more than 100 likes (or xyz).
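
Filtering on an aggregate annotation ends up as a HAVING clause in SQL. A minimal sketch with the standard-library sqlite3 module, assuming a made-up likes table with one row per like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER)")

# Hypothetical data: user 1 has 150 likes, user 2 exactly 100, user 3 only 3.
rows = [(1,)] * 150 + [(2,)] * 100 + [(3,)] * 3
conn.executemany("INSERT INTO likes (user_id) VALUES (?)", rows)

# HAVING filters on the aggregated value after grouping,
# whereas WHERE filters individual rows before grouping.
popular = conn.execute("""
    SELECT user_id, COUNT(id) AS sudo_field
      FROM likes
     GROUP BY user_id
    HAVING COUNT(id) > 100
     ORDER BY user_id
""").fetchall()

print(popular)  # [(1, 150)]
```

User 2 is excluded because the comparison is strictly greater than 100, matching sudo_field__gt=100.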