Using LIMIT within GROUP BY to get N results per group?

The following query:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

yields:

year    id  rate
2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2009    p01 4.4
2002    p01 3.9
2004    p01 3.5
2005    p01 2.1
2000    p01 0.8
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7
2006    p02 4.6
2007    p02 3.3

What I'd like is only the top 5 results for each id:

2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7

Is there a way to do this using some kind of LIMIT-like modifier that works within the GROUP BY?


No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQL versions, but not for 5 results per group).

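This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle it, some of which are more efficient in certain cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.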

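As with most solutions to this problem, it can return more than five rows if several rows have the same rate value, so you may still need some post-processing to check for that.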

The following post, "SQL: selecting top N record per group", describes a complicated way of achieving this without subqueries.

It improves on the other solutions offered here by:

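  • doing everything in a single query
  • being able to properly utilize indexes
  • avoiding subqueries, which are notorious for producing bad execution plans in MySQL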
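However, it is not pretty. A nice solution would be possible if window functions (aka analytic functions) were available in MySQL, but at the time of writing they were not (MySQL 8.0 later added them). The trick used in the post above relies on GROUP_CONCAT, which is sometimes described as "poor man's window functions for MySQL".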

This takes a series of subqueries to rank the values, limit them, and then perform the sum while grouping.

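SET @Rnk := 0;
SET @N := 2;
SET @last_id := NULL;
select
c.id,
sum(c.val)
from (
select
b.id,
b.val
from (
select
-- restart the rank at 1 whenever the id changes, otherwise add 1
@Rnk := if(@last_id = a.id, @Rnk + 1, 1) as Rnk,
a.id,
a.val,
@last_id := a.id as last_id
from (
select
id,
val
from list
order by id, val desc) as a) as b
-- keep the top @N values of each id
where b.Rnk <= @N) as c
group by c.id;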
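For comparison, here is a minimal sketch of the same rank, limit, then sum pipeline written with a window function instead of user variables. It runs on SQLite through Python's built-in `sqlite3` module as a stand-in for MySQL 8 (an assumption: SQLite 3.25 or newer is needed for window functions), and the sample rows are made up for illustration; the table `list` and the columns `id`/`val` are the names used in the query above.

```python
import sqlite3

# SQLite (3.25+ for window functions) stands in for MySQL 8 here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id TEXT, val INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO list VALUES (?, ?)",
    [("a", 5), ("a", 3), ("a", 1), ("b", 10), ("b", 7), ("b", 2)],
)

N = 2  # how many of the largest values per id to add up

rows = conn.execute(
    """
    SELECT id, SUM(val)
    FROM (
        -- number the values within each id, largest first
        SELECT id, val,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY val DESC) AS rnk
        FROM list
    ) AS ranked
    WHERE rnk <= ?   -- keep only the top N values per id
    GROUP BY id
    ORDER BY id
    """,
    (N,),
).fetchall()

print(rows)  # [('a', 8), ('b', 17)]
```

The window function replaces the `@Rnk`/`@last_id` bookkeeping, so the result no longer depends on the order in which MySQL happens to evaluate user-variable assignments.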
SELECT year, id, rate
FROM (SELECT
year, id, rate, row_number() over (partition by id order by rate DESC) AS rn
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year) as subquery
WHERE rn <= 5
ORDER BY id, rate DESC

The subquery is almost identical to your query. The only change is the addition of

row_number() over (partition by id order by rate DESC)
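A small runnable check of this approach on the question's sample data. It is a sketch that uses SQLite via Python's `sqlite3` as a stand-in for a database with window functions (SQLite 3.25+ assumed); the `table2` filter and the `GROUP BY` are left out because the sample has no `table2` and no duplicate (id, year) pairs.

```python
import sqlite3

# The question's sample rows: (year, id, rate)
DATA = [
    (2006, "p01", 8), (2003, "p01", 7.4), (2008, "p01", 6.8), (2001, "p01", 5.9),
    (2007, "p01", 5.3), (2009, "p01", 4.4), (2002, "p01", 3.9), (2004, "p01", 3.5),
    (2005, "p01", 2.1), (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7), (2006, "p02", 4.6),
    (2007, "p02", 3.3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE h (year INTEGER, id TEXT, rate REAL)")
conn.executemany("INSERT INTO h VALUES (?, ?, ?)", DATA)

rows = conn.execute(
    """
    SELECT year, id, rate
    FROM (
        SELECT year, id, rate,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY rate DESC) AS rn
        FROM h
        WHERE year BETWEEN 2000 AND 2009
    ) AS subquery
    WHERE rn <= 5          -- first five rows of each id
    ORDER BY id, rate DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The output matches the ten rows listed in the question: five per id, highest rate first.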

For those like me whose queries were timing out: I wrote the following to apply a limit, and anything else, per specific group.

DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
DECLARE a INT DEFAULT 0;
DECLARE stop_loop INT DEFAULT 0;
DECLARE domain_val VARCHAR(250);
DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;

OPEN domain_list;

-- number of groups to visit, used to stop the loop
SELECT COUNT(DISTINCT(domain)) INTO stop_loop
FROM db.one;
-- BEGIN LOOP
loop_thru_domains: LOOP
FETCH domain_list INTO domain_val;
SET a=a+1;

-- insert (or refresh) only the 200 most frequent titles of this group
INSERT INTO db.two(book,artist,title,title_count,last_updated)
SELECT * FROM
(
SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW()
FROM db.one
WHERE book = domain_val
GROUP BY artist,title
ORDER BY book,titleCount DESC
LIMIT 200
) a ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

IF a = stop_loop THEN
LEAVE loop_thru_domains;
END IF;
END LOOP loop_thru_domains;
CLOSE domain_list;
END $$
DELIMITER ;

It loops through a list of domains and inserts at most 200 rows for each one.
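The same one-query-per-group idea can also be driven from application code instead of a stored procedure. A minimal sketch, assuming Python's `sqlite3` and the question's `h` table in place of the answer's `db.one`/`db.two`; the hypothetical helper `top_per_group` and its `per_group` parameter play the role of the loop and the `LIMIT 200`.

```python
import sqlite3

# The question's sample rows: (year, id, rate)
DATA = [
    (2006, "p01", 8), (2003, "p01", 7.4), (2008, "p01", 6.8), (2001, "p01", 5.9),
    (2007, "p01", 5.3), (2009, "p01", 4.4), (2002, "p01", 3.9), (2004, "p01", 3.5),
    (2005, "p01", 2.1), (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7), (2006, "p02", 4.6),
    (2007, "p02", 3.3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE h (year INTEGER, id TEXT, rate REAL)")
conn.executemany("INSERT INTO h VALUES (?, ?, ?)", DATA)


def top_per_group(conn, per_group):
    """Run one LIMITed query per distinct id and collect the results by id."""
    result = {}
    ids = [r[0] for r in conn.execute("SELECT DISTINCT id FROM h ORDER BY id")]
    for group_id in ids:
        # An ordinary LIMIT works here because each query covers a single group.
        result[group_id] = conn.execute(
            "SELECT year, rate FROM h WHERE id = ? ORDER BY rate DESC LIMIT ?",
            (group_id, per_group),
        ).fetchall()
    return result


top = top_per_group(conn, 5)
print(top)
```

This trades one round trip per group for simplicity; it tends to pay off when groups are large and an index on (id, rate) can serve each LIMIT query.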

Try this:

SELECT h.year, h.id, h.rate
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx
FROM (SELECT h.year, h.id, h.rate
FROM h
WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
GROUP BY id, h.year
ORDER BY id, rate DESC
) h, (SELECT @lastid:='', @index:=0) AS a
) h
-- indx starts at 0 for each id, so 0..4 are the first five rows
WHERE h.indx < 5;
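The user-variable trick is easier to follow once you see the counter on its own. Here is a plain-Python sketch of the same logic (an illustration, not MySQL itself; `top_n_by_counter` is a hypothetical name): rows are visited in `id, rate DESC` order, the counter resets to 0 whenever the id changes, so the first five rows of each id get indexes 0 through 4.

```python
# The question's sample rows: (year, id, rate)
DATA = [
    (2006, "p01", 8), (2003, "p01", 7.4), (2008, "p01", 6.8), (2001, "p01", 5.9),
    (2007, "p01", 5.3), (2009, "p01", 4.4), (2002, "p01", 3.9), (2004, "p01", 3.5),
    (2005, "p01", 2.1), (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7), (2006, "p02", 4.6),
    (2007, "p02", 3.3),
]


def top_n_by_counter(rows, n):
    """Mimic the @lastid/@index counter over rows sorted by id, then rate DESC."""
    result = []
    last_id = None
    index = 0
    for year, id_, rate in sorted(rows, key=lambda r: (r[1], -r[2])):
        # same idea as IF(@lastid = id, @index + 1, 0) followed by @lastid := id
        index = index + 1 if id_ == last_id else 0
        last_id = id_
        if index < n:  # indexes 0 .. n-1 are the first n rows of the group
            result.append((year, id_, rate))
    return result


top5 = top_n_by_counter(DATA, 5)
print(top5)
```

The SQL version only works if the rows really reach the outer query in that sorted order, which is why the inner query carries the `ORDER BY id, rate DESC`.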

You can use the GROUP_CONCAT aggregate function to collect all years into a single column, grouped by id and ordered by rate:

SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM     yourtable
GROUP BY id

Result:

-----------------------------------------------------------
|  ID | GROUPED_YEAR                                      |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007                |
-----------------------------------------------------------

You can then use FIND_IN_SET, which returns the position of the first argument inside the second one, e.g.

SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1


SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6

Combining GROUP_CONCAT and FIND_IN_SET, and filtering on the position returned by FIND_IN_SET, you can use this query, which returns only the first 5 years for every id:

SELECT
yourtable.*
FROM
yourtable INNER JOIN (
SELECT
id,
GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM
yourtable
GROUP BY id) group_max
ON yourtable.id = group_max.id
AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
yourtable.id, yourtable.year DESC;

Please see the fiddle here.

Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.

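The maximum length of the string returned by GROUP_CONCAT is limited (by the group_concat_max_len system variable, 1024 by default), so this works well if you only need to select a few records for every group.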
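To make the FIND_IN_SET semantics concrete, here is a small Python sketch of the function's documented behaviour (1-based position in a comma-separated list, 0 when absent, NULL when an argument is NULL), registered as a user-defined function in SQLite through `sqlite3`'s `create_function`. SQLite has no FIND_IN_SET of its own, so this is an emulation under that assumption, not MySQL's implementation.

```python
import sqlite3


def find_in_set(needle, haystack):
    """Emulate MySQL FIND_IN_SET(str, strlist)."""
    if needle is None or haystack is None:
        return None  # NULL in, NULL out
    items = haystack.split(",") if haystack else []
    needle = str(needle)  # treat numbers as their string form, e.g. 2009 -> '2009'
    return items.index(needle) + 1 if needle in items else 0


conn = sqlite3.connect(":memory:")
conn.create_function("find_in_set", 2, find_in_set)

grouped_year = "2006,2003,2008,2001,2007,2009,2002,2004,2005,2000"
first = conn.execute("SELECT find_in_set('2006', ?)", (grouped_year,)).fetchone()[0]
sixth = conn.execute("SELECT find_in_set(2009, ?)", (grouped_year,)).fetchone()[0]

# Keep the years whose position is BETWEEN 1 AND 5, as in the query above.
top5 = [y for y in grouped_year.split(",") if 1 <= find_in_set(y, grouped_year) <= 5]
print(first, sixth, top5)
```

Because a year missing from the list yields 0, the `BETWEEN 1 AND 5` filter also drops rows that are not in the concatenated list at all.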

For me, something like

SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N)

works perfectly. No complex query.


For example, to get the top 1 for each group:

SELECT
*
FROM
yourtable
WHERE
id IN (SELECT
SUBSTRING_INDEX(GROUP_CONCAT(id
ORDER BY rate DESC),
',',
1) id
FROM
yourtable
GROUP BY year)
ORDER BY rate DESC;
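SUBSTRING_INDEX(str, delim, count) returns everything before the count-th occurrence of delim, or, for a negative count, everything after the count-th occurrence counted from the right. A Python sketch of that rule (a hypothetical helper, assuming a non-empty delimiter) shows why taking the first N items of an ordered GROUP_CONCAT list gives the top N:

```python
def substring_index(s, delim, count):
    """Emulate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    # positive count: keep the first `count` pieces; negative: the last `-count` pieces
    return delim.join(parts[:count] if count > 0 else parts[count:])


grouped = "2006,2003,2008,2001,2007,2009,2002,2004,2005,2000"
print(substring_index(grouped, ",", 1))           # 2006
print(substring_index(grouped, ",", 5))           # 2006,2003,2008,2001,2007
print(substring_index("www.mysql.com", ".", -2))  # mysql.com
```

With `count = 1` this is the "top 1 per group" used in the query above; larger counts give the top N, as long as the concatenated string fits within `group_concat_max_len`.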

Try this:

SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
@num := IF(@type = `id`, @num + 1, 1) AS `row_number`,
@type := `id` AS `dummy`
FROM (
SELECT *
FROM `h`
WHERE `year` BETWEEN 2000 AND 2009
AND `id` IN (SELECT `rid` FROM `table2`)
-- the counter only works if rows arrive sorted by id, then by rate
ORDER BY `id`, `rate` DESC
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number` <= 5
ORDER BY `id`, `rate` DESC;

You want to find the top n rows per group. This answer provides a generic solution using example data that differs from the OP's.

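In MySQL 8 or later you can use the ROW_NUMBER, RANK or DENSE_RANK function, depending on your exact definition of "top 5". Below are the numbers these functions generate when rows are sorted by value in descending order. Notice how ties are handled: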

---------------------------------------------------------
| pkid | catid | value | row_number | rank | dense_rank |
---------------------------------------------------------
|    1 | p01   |   100 | *1         | *1   | *1         |
|    2 | p01   |    90 | *2         | *2   | *2         |
|    3 | p01   |    90 | *3         | *2   | *2         |
|    4 | p01   |    80 | *4         | *4   | *3         |
|    5 | p01   |    80 | *5         | *4   | *3         |
|    6 | p01   |    80 | 6          | *4   | *3         |
|    7 | p01   |    70 | 7          | 7    | *4         |
|    8 | p01   |    60 | 8          | 8    | *5         |
|    9 | p01   |    50 | 9          | 9    | 6          |
|   10 | p01   |    40 | 10         | 10   | 7          |
---------------------------------------------------------
(* marks the rows each function keeps in the top 5)

Once you have chosen the function, use it like this:

SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY catid ORDER BY value DESC) AS n
FROM t
) AS x
WHERE n <= 5

DB<>Fiddle
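The table above can be reproduced with a short script. This sketch uses SQLite (3.25+ assumed) through Python's `sqlite3` as a stand-in for MySQL 8. ROW_NUMBER gets `pkid` as a tie-breaker so its numbering is deterministic, while RANK and DENSE_RANK order by `value` alone; adding the tie-breaker to them would give every row a distinct rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pkid INTEGER PRIMARY KEY, catid TEXT, value INTEGER)")
values = [100, 90, 90, 80, 80, 80, 70, 60, 50, 40]
conn.executemany(
    "INSERT INTO t (pkid, catid, value) VALUES (?, 'p01', ?)",
    list(enumerate(values, start=1)),
)

rows = conn.execute(
    """
    SELECT pkid, value,
           ROW_NUMBER() OVER (PARTITION BY catid ORDER BY value DESC, pkid) AS rn,
           RANK()       OVER (PARTITION BY catid ORDER BY value DESC)       AS rnk,
           DENSE_RANK() OVER (PARTITION BY catid ORDER BY value DESC)       AS drnk
    FROM t
    ORDER BY pkid
    """
).fetchall()

# How many rows each definition of "top 5" keeps
kept = {
    "row_number": sum(1 for r in rows if r[2] <= 5),
    "rank": sum(1 for r in rows if r[3] <= 5),
    "dense_rank": sum(1 for r in rows if r[4] <= 5),
}
print(kept)  # {'row_number': 5, 'rank': 6, 'dense_rank': 8}
```

Only ROW_NUMBER guarantees exactly five rows per group; RANK and DENSE_RANK keep every tied row, so they can return more.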


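In MySQL 5.x you can use a poor man's rank over partition to achieve the desired result: outer join the table with itself and, for each row, count the number of rows before it (e.g. the preceding rows could be the ones with a higher value).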

The following will produce results similar to the RANK function:

SELECT t.pkid, t.catid, t.value, COUNT(b.value) + 1 AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND b.value > t.value
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid

Make the following change to produce results similar to the DENSE_RANK function:

COUNT(DISTINCT b.value)

Or make the following change to produce results similar to the ROW_NUMBER function:

ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)

DB<>Fiddle
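A sketch that runs the three self-join variants above against the same ten sample rows, using SQLite through Python's `sqlite3` as a stand-in for MySQL 5.x (no window functions are needed here). The helper `poor_mans_rank` is hypothetical, and the expected numbers are the ROW_NUMBER/RANK/DENSE_RANK columns from the table earlier in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pkid INTEGER PRIMARY KEY, catid TEXT, value INTEGER)")
values = [100, 90, 90, 80, 80, 80, 70, 60, 50, 40]
conn.executemany("INSERT INTO t VALUES (?, 'p01', ?)", list(enumerate(values, start=1)))


def poor_mans_rank(count_expr, join_cond):
    """For each row, count the joined 'better' rows in its category and add one.

    The f-string only splices in the fixed SQL fragments below, never user input.
    """
    sql = f"""
        SELECT t.pkid, {count_expr} + 1
        FROM t
        LEFT JOIN t AS b ON b.catid = t.catid AND ({join_cond})
        GROUP BY t.pkid, t.catid, t.value
        ORDER BY t.pkid
    """
    return [r[1] for r in conn.execute(sql)]


rank = poor_mans_rank("COUNT(b.value)", "b.value > t.value")
dense_rank = poor_mans_rank("COUNT(DISTINCT b.value)", "b.value > t.value")
row_number = poor_mans_rank(
    "COUNT(b.value)",
    "b.value > t.value OR b.value = t.value AND b.pkid < t.pkid",
)
print(rank, dense_rank, row_number, sep="\n")
```

Each row is joined to every better row in its category, so the amount of work can grow roughly with the square of the group size; that is the usual price of this approach compared with real window functions.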

Build a virtual row-number column (similar to ROWNUM in Oracle).

Table:

CREATE TABLE `stack`
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Data:

insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);

The SQL looks like this:

select t3.year,t3.id,t3.rate
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3
where rownum <=3 order by id,rate DESC;

If you remove the WHERE clause on t3, the result looks like this:

[Screenshot: the query result without the WHERE clause, showing the rownum value of every row]

To get the "top N records": add rownum <= 3 to the WHERE clause (the WHERE clause of t3);

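To select the years: add year BETWEEN 2000 AND 2009 to the WHERE clause (the WHERE clause of t3; put it inside t1 and t2 instead if the numbering itself should only count those years);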
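A runnable sketch of this correlated-subquery version against the `stack` data above, using SQLite through Python's `sqlite3` as a stand-in for MySQL (rates are stored as REAL rather than FLOAT). Because the subquery counts every row whose rate is greater than or equal to the current one, tied rows share the larger count, so a tie at the cutoff can drop both tied rows.

```python
import sqlite3

# The rows from the INSERT statements above: (year, id, rate)
STACK = [
    (2006, "p01", 8), (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
    (2001, "p02", 12.5), (2004, "p02", 12.4), (2005, "p01", 2.1), (2000, "p01", 0.8),
    (2002, "p02", 12.2), (2002, "p01", 3.9), (2004, "p01", 3.5), (2003, "p02", 10.3),
    (2000, "p02", 8.7), (2006, "p02", 4.6), (2007, "p02", 3.3), (2003, "p01", 7.4),
    (2008, "p01", 6.8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stack (year INTEGER, id TEXT, rate REAL)")
conn.executemany("INSERT INTO stack VALUES (?, ?, ?)", STACK)

rows = conn.execute(
    """
    SELECT t3.year, t3.id, t3.rate
    FROM (
        SELECT t1.*,
               -- number of rows with the same id and a rate >= this row's rate
               (SELECT COUNT(*) FROM stack t2
                WHERE t1.rate <= t2.rate AND t1.id = t2.id) AS rownum
        FROM stack t1
    ) t3
    WHERE rownum <= 3 AND year BETWEEN 2000 AND 2009
    ORDER BY id, rate DESC
    """
).fetchall()

for row in rows:
    print(row)
```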

Please try the stored procedure below. I have verified it: it gives me the correct result without using GROUP BY.

CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
BEGIN
DECLARE query_string text;
DECLARE datasource1 varchar(24);
DECLARE done INT DEFAULT 0;
DECLARE tenants varchar(50);
DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;


SET @query_string='';


OPEN cur1;
read_loop: LOOP


FETCH cur1 INTO tenants ;


IF done THEN
LEAVE read_loop;
END IF;


SET @datasource1 = tenants;
SET @query_string = concat(@query_string,'(select * from demo  where `id` = ''',@datasource1,''' order by rate desc LIMIT 5) UNION ALL ');


END LOOP;
close cur1;


SET @query_string  = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));
select @query_string;
PREPARE stmt FROM @query_string;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;


END
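The same UNION ALL of per-id LIMIT queries can be assembled on the client with bound parameters instead of string concatenation, which avoids quoting problems if an id contains a quote. A minimal sketch using Python's `sqlite3`, with `demo` and `demo1` filled with the question's sample data (their real contents and schema are assumptions). SQLite does not accept parenthesized compound members, so each LIMITed query is wrapped as `SELECT * FROM (...)`; SQLite also caps the number of terms in a compound SELECT (500 by default), so a very long id list would need batching.

```python
import sqlite3

# The question's sample rows: (year, id, rate)
DATA = [
    (2006, "p01", 8), (2003, "p01", 7.4), (2008, "p01", 6.8), (2001, "p01", 5.9),
    (2007, "p01", 5.3), (2009, "p01", 4.4), (2002, "p01", 3.9), (2004, "p01", 3.5),
    (2005, "p01", 2.1), (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7), (2006, "p02", 4.6),
    (2007, "p02", 3.3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (year INTEGER, id TEXT, rate REAL)")
conn.executemany("INSERT INTO demo VALUES (?, ?, ?)", DATA)
conn.execute("CREATE TABLE demo1 (rid TEXT)")
conn.executemany("INSERT INTO demo1 VALUES (?)", [("p01",), ("p02",)])

ids = [r[0] for r in conn.execute("SELECT DISTINCT rid FROM demo1 ORDER BY rid")]

# One "top 5 for this id" query per id, glued together with UNION ALL;
# each id is passed as a bound parameter rather than pasted into the SQL.
part = "SELECT * FROM (SELECT * FROM demo WHERE id = ? ORDER BY rate DESC LIMIT 5)"
sql = " UNION ALL ".join([part] * len(ids))
rows = conn.execute(sql, ids).fetchall() if ids else []

for row in rows:
    print(row)
```

The `if ids` guard plays the same role as checking for an empty `@query_string` before `PREPARE` in the procedure.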

It took some work, but I thought my solution was worth sharing, as it seems elegant as well as quite fast.

SELECT h.year, h.id, h.rate
FROM (
SELECT id,
SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
FROM h
WHERE year BETWEEN 2000 AND 2009
GROUP BY id
ORDER BY id
) AS h_temp
LEFT JOIN h ON h.id = h_temp.id
AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l

Note that this example is written specifically for this question, but it can easily be adapted for other, similar purposes.

I just wrote a top-N operation for MySQL. The code is simple.

drop table if exists h;
create table h(id varchar(5), year int, rate numeric(8,2), primary key(id,year));
insert into h(year, id, rate) values
(2006,'p01',8),
(2003,'p01',7.4),
(2008,'p01',6.8),
(2001,'p01',5.9),
(2007,'p01',5.3),
(2009,'p01',4.4),
(2002,'p01',3.9),
(2004,'p01',3.5),
(2005,'p01',2.1),
(2000,'p01',0.8),
(2001,'p02',12.5),
(2004,'p02',12.4),
(2002,'p02',12.2),
(2003,'p02',10.3),
(2000,'p02',8.7),
(2006,'p02',4.6),
(2007,'p02',3.3);


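select id, year, rate
from
(
select id, year, rate,
-- the counter restarts at 0 whenever id changes
if(@last=id, @top:=@top+1, @top:=0) as ztop,
@last:=id as update_last
from h
-- initialize the variables so values left over from an earlier run can't leak in
cross join (select @last := null, @top := 0) as init
order by id, rate desc, year desc
) t2
where ztop<5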