计算 Postgresql 的累计总数

我使用 countgroup by来获得每天注册用户的数量:

  SELECT created_at, COUNT(email)
FROM subscriptions
GROUP BY created at;

结果:

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300

我想要得到每天累计的订阅者总数。我如何得到这个?

created_at  count
-----------------
04-04-2011  100
05-04-2011  150
06-04-2011  200
07-04-2011  500
78534 次浏览

Use:

SELECT a.created_at,
(SELECT COUNT(b.email)
FROM SUBSCRIPTIONS b
WHERE b.created_at <= a.created_at) AS count
FROM SUBSCRIPTIONS a
SELECT
s1.created_at,
COUNT(s2.email) AS cumul_count
FROM subscriptions s1
INNER JOIN subscriptions s2 ON s1.created_at >= s2.created_at
GROUP BY s1.created_at

I assume you want only one row per day and you want to still show days without any subscriptions (suppose nobody subscribes for a certain date, do you want to show that date with the balance of the previous day?). If this is the case, you can use the 'with' feature:

with recursive serialdates(adate) as (
select cast('2011-04-04' as date)
union all
select adate + 1 from serialdates where adate < cast('2011-04-07' as date)
)
select D.adate,
(
select count(distinct email)
from subscriptions
where created_at between date_trunc('month', D.adate) and D.adate
)
from serialdates D

With larger datasets, window functions are the most efficient way to perform these kinds of queries -- the table will be scanned only once, instead of once for each date, like a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up have support for window functions.

This is what it looks like:

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;

Here OVER creates the window; ORDER BY created_at means that it has to sum up the counts in created_at order.


Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately this won't remove duplicates that cross different dates.

If you want to remove all duplicates, I think the easiest is to use a subquery and DISTINCT ON. This will attribute emails to their earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
SELECT DISTINCT ON (email) created_at, email
FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;

If you create an index on (email, created_at), this query shouldn't be too slow either.


(If you want to test, this is how I created the sample dataset)

create table subscriptions as
select date '2000-04-04' + (i/10000)::int as created_at,
'foofoobar@foobar.com' || (i%700000)::text as email
from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);

The best way is to have a calendar table: calendar ( date date, month int, quarter int, half int, week int, year int )

Then, you can join this table to make summary for the field you need.

To anyone seeing this answer today (2021) You can use rollup

SELECT created_at, COUNT(email)
FROM subscriptions
GROUP BY rollup(created_at);

this will give you a new row with the total

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300
NULL        500

You can also use rollup for partial results if you have more than one parameter to show in your group by. If you have a created_by for instance:

SELECT created_at, created_by COUNT(email)
FROM subscriptions
GROUP BY rollup(created_at, created_by);

this will give you a new row with the total

created_at  created_by  count
-----------------------------
04-04-2011     1        80
04-04-2011     2        20
04-04-2021    NULL      100
05-04-2011     1        20
05-04-2011     2        30
05-04-2011    NULL      50
NULL          NULL      150

I only took the numbers of the first two days, but that's the idea. it will show grouped by date, then total of that day, then the total of totals.

Order matters in the rollup() here, as to how the partial totals will be displayed