分组限制在PostgreSQL:显示每组的前N行?

我需要为每个组取前N行,按自定义列排序。

已知下表:

db=# SELECT * FROM xxx;
id | section_id | name
----+------------+------
1 |          1 | A
2 |          1 | B
3 |          1 | C
4 |          1 | D
5 |          2 | E
6 |          2 | F
7 |          3 | G
8 |          2 | H
(8 rows)

对于每个section_id,我需要前2行(按的名字排序),即类似于:

 id | section_id | name
----+------------+------
1 |          1 | A
2 |          1 | B
5 |          2 | E
6 |          2 | F
7 |          3 | G
(5 rows)

我使用的是PostgreSQL 8.3.5。

121565 次浏览
SELECT  x.*
FROM    (
SELECT  section_id,
COALESCE
(
(
SELECT  xi
FROM    xxx xi
WHERE   xi.section_id = xo.section_id
ORDER BY
name, id
OFFSET 1 LIMIT 1
),
(
SELECT  xi
FROM    xxx xi
WHERE   xi.section_id = xo.section_id
ORDER BY
name DESC, id DESC
LIMIT 1
)
) AS mlast
FROM    (
SELECT  DISTINCT section_id
FROM    xxx
) xo
) xoo
JOIN    xxx x
ON      x.section_id = xoo.section_id
AND (x.name, x.id) <= ((mlast).name, (mlast).id)

这里是另一个解决方案(PostgreSQL <= 8.3)。

SELECT
*
FROM
xxx a
WHERE (
SELECT
COUNT(*)
FROM
xxx
WHERE
section_id = a.section_id
AND
name <= a.name
) <= 2

新解决方案(PostgreSQL 8.4)

SELECT
*
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS r,
t.*
FROM
xxx t) x
WHERE
x.r <= 2;
        -- ranking without WINDOW functions
-- EXPLAIN ANALYZE
WITH rnk AS (
SELECT x1.id
, COUNT(x2.id) AS rnk
FROM xxx x1
LEFT JOIN xxx x2 ON x1.section_id = x2.section_id AND x2.name <= x1.name
GROUP BY x1.id
)
SELECT this.*
FROM xxx this
JOIN rnk ON rnk.id = this.id
WHERE rnk.rnk <=2
ORDER BY this.section_id, rnk.rnk
;


-- The same without using a CTE
-- EXPLAIN ANALYZE
SELECT this.*
FROM xxx this
JOIN ( SELECT x1.id
, COUNT(x2.id) AS rnk
FROM xxx x1
LEFT JOIN xxx x2 ON x1.section_id = x2.section_id AND x2.name <= x1.name
GROUP BY x1.id
) rnk
ON rnk.id = this.id
WHERE rnk.rnk <=2
ORDER BY this.section_id, rnk.rnk
;

从v9.3开始,您可以执行横向连接

select distinct t_outer.section_id, t_top.id, t_top.name from t t_outer
join lateral (
select * from t t_inner
where t_inner.section_id = t_outer.section_id
order by t_inner.name
limit 2
) t_top on true
order by t_outer.section_id;

可能会更快,但是,当然,你应该在你的数据和用例上测试性能。

横向连接是可行的方法,但您应该首先执行嵌套查询,以提高大型表的性能。

SELECT t_limited.*
FROM (
SELECT DISTINCT section_id
FROM t
) t_groups
JOIN LATERAL (
SELECT *
FROM t t_all
WHERE t_all.section_id = t_groups.section_id
ORDER BY t_all.name
LIMIT 2
) t_limited ON true

如果没有嵌套的选择distinct,则连接横向对表中的每一行运行,即使section_id经常是重复的。由于嵌套的选择是不同的,联接横向操作将为每个不同的section_id运行一次且仅运行一次。