I have the following log table for user messages (simplified form) in Postgres 9.2:
CREATE TABLE log (
log_date DATE,
user_id INTEGER,
payload INTEGER
);
It contains up to one record per user and per day. There will be approximately 500K records per day for 300 days. payload is ever increasing for each user (if that matters).
I want to efficiently retrieve the latest record for each user before a specific date. My query is:
SELECT user_id, max(log_date), max(payload)
FROM log
WHERE log_date <= :mydate
GROUP BY user_id
which is extremely slow. I have also tried:
SELECT DISTINCT ON(user_id), log_date, payload
FROM log
WHERE log_date <= :mydate
ORDER BY user_id, log_date DESC;
which has the same plan and is equally slow.
So far I have a single index on log(log_date)
, but doesn't help much.
And I have a users
table with all users included. I also want to retrieve the result for some some users (those with payload > :value
).
Is there any other index I should use to speed this up, or any other way to achieve what I want?