MySQL 中的秩函数

我需要知道顾客的等级。在这里,我添加了相应的 ANSI 标准 SQL 查询我的要求。请帮我把它转换成 MySQL。

SELECT RANK() OVER (PARTITION BY Gender ORDER BY Age) AS [Partition by Gender],
FirstName,
Age,
Gender
FROM Person

在 MySQL 中有没有查找排名的函数?

322239 次浏览

一种选择是使用排名变量,如下所示:

SELECT    first_name,
age,
gender,
@curRank := @curRank + 1 AS rank
FROM      person p, (SELECT @curRank := 0) r
ORDER BY  age;

(SELECT @curRank := 0)部分允许变量初始化,而不需要单独的 SET命令。

测试案例:

CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));


INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');

结果:

+------------+------+--------+------+
| first_name | age  | gender | rank |
+------------+------+--------+------+
| Kathy      |   18 | F      |    1 |
| Jane       |   20 | F      |    2 |
| Nick       |   22 | M      |    3 |
| Bob        |   25 | M      |    4 |
| Anne       |   25 | F      |    5 |
| Jack       |   30 | M      |    6 |
| Bill       |   32 | M      |    7 |
| Steve      |   36 | M      |    8 |
+------------+------+--------+------+
8 rows in set (0.02 sec)

丹尼尔版本的一个调整来计算百分比和等级。两个得分相同的人也会得到相同的排名。

set @totalStudents = 0;
select count(*) into @totalStudents from marksheets;
SELECT id, score, @curRank := IF(@prevVal=score, @curRank, @studentNumber) AS rank,
@percentile := IF(@prevVal=score, @percentile, (@totalStudents - @studentNumber + 1)/(@totalStudents)*100),
@studentNumber := @studentNumber + 1 as studentNumber,
@prevVal:=score
FROM marksheets, (
SELECT @curRank :=0, @prevVal:=null, @studentNumber:=1, @percentile:=100
) r
ORDER BY score DESC

示例数据的查询结果-

+----+-------+------+---------------+---------------+-----------------+
| id | score | rank | percentile    | studentNumber | @prevVal:=score |
+----+-------+------+---------------+---------------+-----------------+
| 10 |    98 |    1 | 100.000000000 |             2 |              98 |
|  5 |    95 |    2 |  90.000000000 |             3 |              95 |
|  6 |    91 |    3 |  80.000000000 |             4 |              91 |
|  2 |    91 |    3 |  80.000000000 |             5 |              91 |
|  8 |    90 |    5 |  60.000000000 |             6 |              90 |
|  1 |    90 |    5 |  60.000000000 |             7 |              90 |
|  9 |    84 |    7 |  40.000000000 |             8 |              84 |
|  3 |    83 |    8 |  30.000000000 |             9 |              83 |
|  4 |    72 |    9 |  20.000000000 |            10 |              72 |
|  7 |    60 |   10 |  10.000000000 |            11 |              60 |
+----+-------+------+---------------+---------------+-----------------+

@ Sam,你的观点在概念上非常出色,但是我认为你误解了 MySQL 文档在参考页面上所说的内容——或者我误解了: ——我只是想补充一下,这样如果有人对@Daniel 的回答感到不舒服,他们会更加放心,或者至少挖掘得更深一些。

您可以看到 SELECT中的 "@curRank := @curRank + 1 AS rank"不是“一个语句”,而是语句的一个“原子”部分,所以它应该是安全的。

您引用的文档接下来展示了一些示例,其中在语句的2个(原子)部分中显示了相同的用户定义变量,例如 "SELECT @curRank, @curRank := @curRank + 1 AS rank"

有人可能会说,在@Daniel 的回答中,@curRank被使用了两次: (1) "@curRank := @curRank + 1 AS rank"和(2) "(SELECT @curRank := 0) r",但是因为第二种用法是 FROM子句的一部分,所以我非常确定它肯定会被首先求值; 本质上就是使它成为第二个,也是第一个语句。

事实上,在你引用的同一个 MySQL 文档页面上,你会在评论中看到同样的解决方案——它可能是@Daniel 得到它的地方; 是的,我知道它是评论,但它是官方文档页面上的评论,这确实有一定的分量。

下面是一个通用的解决方案,它将分区上的稠密等级赋给行,它使用用户变量:

CREATE TABLE person (
id INT NOT NULL PRIMARY KEY,
firstname VARCHAR(10),
gender VARCHAR(1),
age INT
);


INSERT INTO person (id, firstname, gender, age) VALUES
(1,  'Adams',  'M', 33),
(2,  'Matt',   'M', 31),
(3,  'Grace',  'F', 25),
(4,  'Harry',  'M', 20),
(5,  'Scott',  'M', 30),
(6,  'Sarah',  'F', 30),
(7,  'Tony',   'M', 30),
(8,  'Lucy',   'F', 27),
(9,  'Zoe',    'F', 30),
(10, 'Megan',  'F', 26),
(11, 'Emily',  'F', 20),
(12, 'Peter',  'M', 20),
(13, 'John',   'M', 21),
(14, 'Kate',   'F', 35),
(15, 'James',  'M', 32),
(16, 'Cole',   'M', 25),
(17, 'Dennis', 'M', 27),
(18, 'Smith',  'M', 35),
(19, 'Zack',   'M', 35),
(20, 'Jill',   'F', 25);


SELECT person.*, @rank := CASE
WHEN @partval = gender AND @rankval = age THEN @rank
WHEN @partval = gender AND (@rankval := age) IS NOT NULL THEN @rank + 1
WHEN (@partval := gender) IS NOT NULL AND (@rankval := age) IS NOT NULL THEN 1
END AS rnk
FROM person, (SELECT @rank := NULL, @partval := NULL, @rankval := NULL) AS x
ORDER BY gender, age;

注意,变量赋值放置在 CASE表达式中。这(在理论上)照顾到了评估问题的顺序。增加 IS NOT NULL是为了处理数据类型转换和短路问题。

PS: 通过删除检查关联的所有条件,它可以很容易地转换为分区上的行号。

| id | firstname | gender | age | rank |
|----|-----------|--------|-----|------|
| 11 | Emily     | F      | 20  | 1    |
| 20 | Jill      | F      | 25  | 2    |
| 3  | Grace     | F      | 25  | 2    |
| 10 | Megan     | F      | 26  | 3    |
| 8  | Lucy      | F      | 27  | 4    |
| 6  | Sarah     | F      | 30  | 5    |
| 9  | Zoe       | F      | 30  | 5    |
| 14 | Kate      | F      | 35  | 6    |
| 4  | Harry     | M      | 20  | 1    |
| 12 | Peter     | M      | 20  | 1    |
| 13 | John      | M      | 21  | 2    |
| 16 | Cole      | M      | 25  | 3    |
| 17 | Dennis    | M      | 27  | 4    |
| 7  | Tony      | M      | 30  | 5    |
| 5  | Scott     | M      | 30  | 5    |
| 2  | Matt      | M      | 31  | 6    |
| 15 | James     | M      | 32  | 7    |
| 1  | Adams     | M      | 33  | 8    |
| 18 | Smith     | M      | 35  | 9    |
| 19 | Zack      | M      | 35  | 9    |

小提琴演示

如果你只想给一个人排名,你可以这样做:

SELECT COUNT(Age) + 1
FROM PERSON
WHERE(Age < age_to_rank)

这个排名对应于甲骨文排名函数(如果你有相同年龄的人,他们得到相同的排名,并且之后的排名是非连续的)。

这比在子查询中使用上述解决方案之一并从中选择以获得一个人的排名要快一些。

这可以用来排名每个人,但它比上述解决方案慢。

SELECT
Age AS age_var,
(
SELECT COUNT(Age) + 1
FROM Person
WHERE (Age < age_var)
) AS rank
FROM Person

丹尼尔和萨尔曼答案的结合。然而,排名不会给出,因为连续序列与关系的存在。相反,它跳过排名到下一个。所以最大值总是达到行数。

    SELECT    first_name,
age,
gender,
IF(age=@_last_age,@curRank:=@curRank,@curRank:=@_sequence) AS rank,
@_sequence:=@_sequence+1,@_last_age:=age
FROM      person p, (SELECT @curRank := 1, @_sequence:=1, @_last_age:=0) r
ORDER BY  age;

模式和测试用例:

CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));


INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
INSERT INTO person VALUES (9, 'Kamal', 25, 'M');
INSERT INTO person VALUES (10, 'Saman', 32, 'M');

产出:

+------------+------+--------+------+--------------------------+-----------------+
| first_name | age  | gender | rank | @_sequence:=@_sequence+1 | @_last_age:=age |
+------------+------+--------+------+--------------------------+-----------------+
| Kathy      |   18 | F      |    1 |                        2 |              18 |
| Jane       |   20 | F      |    2 |                        3 |              20 |
| Nick       |   22 | M      |    3 |                        4 |              22 |
| Kamal      |   25 | M      |    4 |                        5 |              25 |
| Anne       |   25 | F      |    4 |                        6 |              25 |
| Bob        |   25 | M      |    4 |                        7 |              25 |
| Jack       |   30 | M      |    7 |                        8 |              30 |
| Bill       |   32 | M      |    8 |                        9 |              32 |
| Saman      |   32 | M      |    8 |                       10 |              32 |
| Steve      |   36 | M      |   10 |                       11 |              36 |
+------------+------+--------+------+--------------------------+-----------------+

虽然最受欢迎的答案排名,它不分区,你可以做一个自我加入得到整个事情也分区:

SELECT    a.first_name,
a.age,
a.gender,
count(b.age)+1 as rank
FROM  person a left join person b on a.age>b.age and a.gender=b.gender
group by  a.first_name,
a.age,
a.gender

用例

CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));


INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');

回答 :

Bill    32  M   4
Bob     25  M   2
Jack    30  M   3
Nick    22  M   1
Steve   36  M   5
Anne    25  F   3
Jane    20  F   2
Kathy   18  F   1

确定给定值的秩的最直接的解决方案是计算值 之前的个数。假设我们有以下值:

10 20 30 30 30 40
  • 所有的 30值都被认为是 第三名
  • 所有的 40值都被认为是 六楼(等级)或 四年级(密集等级)

现在回到最初的问题。下面是一些样本数据,按照 OP 中的描述进行了排序(预期等级在右侧添加) :

+------+-----------+------+--------+    +------+------------+
| id   | firstname | age  | gender |    | rank | dense_rank |
+------+-----------+------+--------+    +------+------------+
|   11 | Emily     |   20 | F      |    |    1 |          1 |
|    3 | Grace     |   25 | F      |    |    2 |          2 |
|   20 | Jill      |   25 | F      |    |    2 |          2 |
|   10 | Megan     |   26 | F      |    |    4 |          3 |
|    8 | Lucy      |   27 | F      |    |    5 |          4 |
|    6 | Sarah     |   30 | F      |    |    6 |          5 |
|    9 | Zoe       |   30 | F      |    |    6 |          5 |
|   14 | Kate      |   35 | F      |    |    8 |          6 |
|    4 | Harry     |   20 | M      |    |    1 |          1 |
|   12 | Peter     |   20 | M      |    |    1 |          1 |
|   13 | John      |   21 | M      |    |    3 |          2 |
|   16 | Cole      |   25 | M      |    |    4 |          3 |
|   17 | Dennis    |   27 | M      |    |    5 |          4 |
|    5 | Scott     |   30 | M      |    |    6 |          5 |
|    7 | Tony      |   30 | M      |    |    6 |          5 |
|    2 | Matt      |   31 | M      |    |    8 |          6 |
|   15 | James     |   32 | M      |    |    9 |          7 |
|    1 | Adams     |   33 | M      |    |   10 |          8 |
|   18 | Smith     |   35 | M      |    |   11 |          9 |
|   19 | Zack      |   35 | M      |    |   11 |          9 |
+------+-----------+------+--------+    +------+------------+

要计算 莎拉RANK() OVER (PARTITION BY Gender ORDER BY Age),可以使用以下查询:

SELECT COUNT(id) + 1 AS rank, COUNT(DISTINCT age) + 1 AS dense_rank
FROM testdata
WHERE gender = (SELECT gender FROM testdata WHERE id = 6)
AND age < (SELECT age FROM testdata WHERE id = 6)


+------+------------+
| rank | dense_rank |
+------+------------+
|    6 |          5 |
+------+------------+

要计算 所有人行的 RANK() OVER (PARTITION BY Gender ORDER BY Age),可以使用以下查询:

SELECT testdata.id, COUNT(lesser.id) + 1 AS rank, COUNT(DISTINCT lesser.age) + 1 AS dense_rank
FROM testdata
LEFT JOIN testdata AS lesser ON lesser.age < testdata.age AND lesser.gender = testdata.gender
GROUP BY testdata.id

结果是这样的(连接的值在右边添加) :

+------+------+------------+    +-----------+-----+--------+
| id   | rank | dense_rank |    | firstname | age | gender |
+------+------+------------+    +-----------+-----+--------+
|   11 |    1 |          1 |    | Emily     |  20 | F      |
|    3 |    2 |          2 |    | Grace     |  25 | F      |
|   20 |    2 |          2 |    | Jill      |  25 | F      |
|   10 |    4 |          3 |    | Megan     |  26 | F      |
|    8 |    5 |          4 |    | Lucy      |  27 | F      |
|    6 |    6 |          5 |    | Sarah     |  30 | F      |
|    9 |    6 |          5 |    | Zoe       |  30 | F      |
|   14 |    8 |          6 |    | Kate      |  35 | F      |
|    4 |    1 |          1 |    | Harry     |  20 | M      |
|   12 |    1 |          1 |    | Peter     |  20 | M      |
|   13 |    3 |          2 |    | John      |  21 | M      |
|   16 |    4 |          3 |    | Cole      |  25 | M      |
|   17 |    5 |          4 |    | Dennis    |  27 | M      |
|    5 |    6 |          5 |    | Scott     |  30 | M      |
|    7 |    6 |          5 |    | Tony      |  30 | M      |
|    2 |    8 |          6 |    | Matt      |  31 | M      |
|   15 |    9 |          7 |    | James     |  32 | M      |
|    1 |   10 |          8 |    | Adams     |  33 | M      |
|   18 |   11 |          9 |    | Smith     |  35 | M      |
|   19 |   11 |          9 |    | Zack      |  35 | M      |
+------+------+------------+    +-----------+-----+--------+

从 MySQL 8开始,您最终可以在 MySQL 中使用窗口函数: Https://dev.mysql.com/doc/refman/8.0/en/window-functions.html

您的查询可以用完全相同的方式编写:

SELECT RANK() OVER (PARTITION BY Gender ORDER BY Age) AS `Partition by Gender`,
FirstName,
Age,
Gender
FROM Person

为了避免 Erandac 的答案中的“ 然而”与 Daniel 和 Salman 的答案相结合,可以使用以下“分区解决方案”之一

SELECT customerID, myDate


-- partition ranking works only with CTE / from MySQL 8.0 on
, RANK() OVER (PARTITION BY customerID ORDER BY dateFrom) AS rank,


-- Erandac's method in combination of Daniel's and Salman's
-- count all items in sequence, maximum reaches row count.
, IF(customerID=@_lastRank, @_curRank:=@_curRank, @_curRank:=@_sequence+1) AS sequenceRank
, @_sequence:=@_sequence+1 as sequenceOverAll


-- Dense partition ranking, works also with MySQL 5.7
-- remember to set offset values in from clause
, IF(customerID=@_lastRank, @_nxtRank:=@_nxtRank, @_nxtRank:=@_nxtRank+1 ) AS partitionRank
, IF(customerID=@_lastRank, @_overPart:=@_overPart+1, @_overPart:=1 ) AS partitionSequence


, @_lastRank:=customerID
FROM myCustomers,
(SELECT @_curRank:=0, @_sequence:=0, @_lastRank:=0, @_nxtRank:=0, @_overPart:=0 ) r
ORDER BY customerID, myDate

此代码片段中第3个变体中的分区排名将返回连续的排名数字。这将导致一个类似于 rank() over partition by结果的数据结构。作为一个例子,请看下面。特别是 对于每个新的 PartitionRank 分区序列总是以1开始,使用这种方法:

customerID    myDate   sequenceRank (Erandac)
|    sequenceOverAll
|     |   partitionRank
|     |     | partitionSequence
|     |     |    | lastRank
... lines ommitted for clarity
40    09.11.2016 11:19    1     44    1   44    40
40    09.12.2016 12:08    1     45    1   45    40
40    09.12.2016 12:08    1     46    1   46    40
40    09.12.2016 12:11    1     47    1   47    40
40    09.12.2016 12:12    1     48    1   48    40
40    13.10.2017 16:31    1     49    1   49    40
40    15.10.2017 11:00    1     50    1   50    40
76    01.07.2015 00:24    51    51    2    1    76
77    04.08.2014 13:35    52    52    3    1    77
79    15.04.2015 20:25    53    53    4    1    79
79    24.04.2018 11:44    53    54    4    2    79
79    08.10.2018 17:37    53    55    4    3    79
117   09.07.2014 18:21    56    56    5    1   117
119   26.06.2014 13:55    57    57    6    1   119
119   02.03.2015 10:23    57    58    6    2   119
119   12.10.2015 10:16    57    59    6    3   119
119   08.04.2016 09:32    57    60    6    4   119
119   05.10.2016 12:41    57    61    6    5   119
119   05.10.2016 12:42    57    62    6    6   119
...
select id,first_name,gender,age,
rank() over(partition by gender order by age) rank_g
from person


CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));


INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
INSERT INTO person VALUES (9,'AKSH',32,'M');
SELECT FirstName,Age,Gender, RANK() OVER (partition by Gender order by Age desc) AS 'Partition by Gender' FROM Person
  • you can use asc/desc depending on which order you want if whether ascending or descending

对丹尼尔的回答稍作调整, 一些 MySQL 版本在 IF 语句中抛出错误

SELECT    first_name,
age,
gender,
(IF(age=@_last_age,@curRank:=@curRank,@curRank:=@_sequence)) AS `rank`,
@_sequence:=@_sequence+1,@_last_age:=age
FROM      person p, (SELECT @curRank := 1, @_sequence:=1, @_last_age:=0) r
ORDER BY  age;

所以我必须用大括号包装排名中的 IF 条件测试

此 anly 计算排名,但不计算稠密排名