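Simple way to calculate median with MySQL

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.

Example data: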

id | val
--------
1    4
2    7
3    2
4    2
5    9
6    8
7    3

Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, whereas SELECT AVG(val) returns 5.

You can use the user-defined function found here.

A comment on this page of the MySQL documentation has the following suggestion:

-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:


DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;




INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);






-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,


-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;


-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1


-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4




-- from there we can select the n-th element on the position: count div 2 + 1
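To make the helper-table logic above easier to follow, here is a minimal Python sketch that mirrors it outside the database: number the sorted rows of each id from 1, keep the row at position c DIV 2 + 1 for odd counts (or the two rows at c DIV 2 and c DIV 2 + 1 for even counts), and average what is kept. The function name grouped_median is hypothetical, and the sketch illustrates only the row selection, not the SQL execution.

```python
from collections import defaultdict


def grouped_median(rows):
    """rows: iterable of (id, val) pairs; returns {id: median}."""
    groups = defaultdict(list)
    for group_id, val in rows:
        groups[group_id].append(val)

    medians = {}
    for group_id, vals in groups.items():
        vals.sort()
        c = len(vals)                      # plays the role of o.c in the query
        if c % 2 == 0:
            wanted = {c // 2, c // 2 + 1}  # lower and upper median (1-based n)
        else:
            wanted = {c // 2 + 1}          # the single middle row
        picked = [v for n, v in enumerate(vals, start=1) if n in wanted]
        medians[group_id] = sum(picked) / len(picked)
    return medians


# Same rows as the INSERT INTO table_median statement above.
rows = [(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
        (2, 4),
        (3, 5), (3, 2),
        (4, 5), (4, 12), (4, 1), (4, 7)]
print(grouped_median(rows))
```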

If MySQL had ROW_NUMBER, the MEDIAN would be (inspired by this SQL Server query):

WITH Numbered AS
(
SELECT *, COUNT(*) OVER () AS Cnt,
ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN (FLOOR((Cnt+1)/2), FLOOR((Cnt+2)/2))
;

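The IN is used in case you have an even number of entries.

If you want to find the median per group, just PARTITION BY group in your OVER clauses.

Rob

I just found another answer online in the comments:

For medians in almost any SQL: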

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2

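Make sure your columns are well indexed and that the index is used for filtering and sorting. Verify with EXPLAIN.

select count(*) from data -- find the number of rows

Calculate the "median" row number. You could use: median_row = floor(count / 2)

Then pick it out of the list:

select val from data order by val asc limit median_row,1

This should return one row containing just the value you want.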
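The two-step approach above (count the rows, then skip to the middle one) can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, so it is an assumption that the same idea carries over unchanged; the table name data and the helper median_by_offset are illustrative only. Note that with floor(count / 2) an even row count yields the upper of the two middle values, not their average.

```python
import sqlite3


def median_by_offset(values):
    """Count the rows, then fetch the row at offset floor(count / 2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (val INTEGER)")
    conn.executemany("INSERT INTO data (val) VALUES (?)",
                     [(v,) for v in values])

    # Step 1: find the number of rows.
    (count,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()

    # Step 2: zero-based offset of the "median" row.
    median_row = count // 2

    # Step 3: pick that single row out of the sorted list.
    row = conn.execute(
        "SELECT val FROM data ORDER BY val ASC LIMIT 1 OFFSET ?",
        (median_row,),
    ).fetchone()
    conn.close()
    return row[0] if row else None


print(median_by_offset([4, 7, 2, 2, 9, 8, 3]))
```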

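I used a two-query approach:

  • the first one gets count, min, max and avg
  • the second one (a prepared statement) uses "LIMIT @count/2, 1" and "ORDER BY .." clauses to get the median value

These are wrapped in a function definition, so all values can be returned from one call.

If your ranges are static and your data does not change often, it may be more efficient to precompute and store these values and use the stored values instead of querying from scratch every time.

This takes care of an even value count by giving the average of the two middle values.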

SELECT AVG(val) FROM
( SELECT x.id, x.val from data x, data y
GROUP BY x.id, x.val
HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
) sq

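I would suggest a faster way.

Get the row count:

SELECT CEIL(COUNT(*)/2) FROM data;

Then take the middle value in a sorted subquery:

SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;

I tested this with a 5x10e6 dataset of random numbers, and it finds the median in under 10 seconds.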

MariaDB / MySQL:

SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM data d, (SELECT @rownum:=0) r
WHERE d.val is NOT NULL
-- put some where clause here
ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

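Steve Cohen points out that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.

Also, AVG(dd.val) and dd.row_number IN (...) are used to produce the correct median when there is an even number of records. Reasoning: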

SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3

Finally, MariaDB 10.3.3+ contains a MEDIAN function.
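The FLOOR arithmetic in the reasoning above can be checked outside the database. The following is a small sketch, with hypothetical helper names, of which 1-based row numbers the IN (...) list selects and how averaging them yields the median for both odd and even row counts.

```python
def middle_rows(total_rows):
    """1-based row numbers picked by FLOOR((n+1)/2) and FLOOR((n+2)/2)."""
    return ((total_rows + 1) // 2, (total_rows + 2) // 2)


def median_from_sorted(sorted_vals):
    """Average the one or two middle rows of an already sorted list."""
    low, high = middle_rows(len(sorted_vals))
    return (sorted_vals[low - 1] + sorted_vals[high - 1]) / 2


print(middle_rows(3))  # both entries point at row 2
print(middle_rows(4))  # rows 2 and 3
print(median_from_sorted([2, 2, 3, 4, 7, 8, 9]))
```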

Building on velcrow's answer, for those who need the median of something that is grouped by another parameter:

SELECT grp_field, t1.val FROM (
SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 0) AS `row_number`,
@s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
FROM data d, (SELECT @rownum:=0, @s:=0) r
ORDER BY grp_field, d.val
) AS t1 JOIN (
SELECT grp_field, count(*) AS total_rows
FROM data d
GROUP BY grp_field
) AS t2
ON t1.grp_field = t2.grp_field
WHERE t1.row_number = floor(total_rows / 2) + 1;

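I found that the accepted solution didn't work on my MySQL installation, returning an empty set, but this query worked for me in all the situations I tested: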

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1

My code, efficient, without tables or additional variables:

SELECT
((SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', floor(1+((count(val)-1) / 2))), ',', -1))
+
(SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', ceiling(1+((count(val)-1) / 2))), ',', -1)))/2
as median
FROM data;

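Unfortunately, neither TheJacobTaylor's nor velcrow's answer returns accurate results for current versions of MySQL.

Velcrow's answer above is close, but it does not calculate correctly for result sets with an even number of rows. The median is defined as either 1) the middle number of an odd-numbered set, or 2) the average of the two middle numbers of an even-numbered set.

So, here is velcrow's solution patched to handle both odd- and even-numbered sets: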

SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.median_column AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.median_column
FROM median_table AS x, (SELECT @row:=0) AS r
WHERE 1
-- put some where clause here
ORDER BY x.median_column
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM median_table x
WHERE 1
-- put same where clause here
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;

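To use this, follow these 3 easy steps:

  1. Replace "median_table" (2 occurrences) in the above code with the name of your table
  2. Replace "median_column" (3 occurrences) with the name of the column you want the median of
  3. If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your WHERE condition

You can optionally also do this in a stored procedure: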

DROP PROCEDURE IF EXISTS median;
DELIMITER //
CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
BEGIN
-- Set default parameters
IF where_clause IS NULL OR where_clause = '' THEN
SET where_clause = 1;
END IF;


-- Prepare statement
SET @sql = CONCAT(
"SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.", column_name, " AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.", column_name, "
FROM ", table_name," AS x, (SELECT @row:=0) AS r
WHERE ", where_clause, " ORDER BY x.", column_name, "
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM ", table_name, " x
WHERE ", where_clause, "
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2
AND t1.row <= ((t2.count/2)+1)) AS t3
");


-- Execute statement
PREPARE stmt FROM @sql;
EXECUTE stmt;
END//
DELIMITER ;




-- Sample usage:
-- median(table_name, column_name, where_condition);
CALL median('products', 'price', NULL);

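Since I needed both a median and a percentile solution, I made a simple and quite flexible function based on the findings in this thread. I know I am happy myself when I find "ready-made" functions that are easy to include in my projects, so I decided to share it quickly: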

function mysql_percentile($table, $column, $where, $percentile = 0.5) {


$sql = "
SELECT `t1`.`".$column."` as `percentile` FROM (
SELECT @rownum:=@rownum+1 as `row_number`, `d`.`".$column."`
FROM `".$table."` `d`,  (SELECT @rownum:=0) `r`
".$where."
ORDER BY `d`.`".$column."`
) as `t1`,
(
SELECT count(*) as `total_rows`
FROM `".$table."` `d`
".$where."
) as `t2`
WHERE 1
AND `t1`.`row_number`=floor(`total_rows` * ".$percentile.")+1;
";


$result = sql($sql, 1);


if (!empty($result)) {
return $result['percentile'];
} else {
return 0;
}


}

Usage is very easy; here is an example from my current project:

...
$table = DBPRE."zip_".$slug;
$column = 'seconds';
$where = "WHERE `reached` = '1' AND `time` >= '".$start_time."'";


$reaching['median'] = mysql_percentile($table, $column, $where, 0.5);
$reaching['percentile25'] = mysql_percentile($table, $column, $where, 0.25);
$reaching['percentile75'] = mysql_percentile($table, $column, $where, 0.75);
...
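As a rough illustration of the row arithmetic inside mysql_percentile, the sketch below reproduces the expression floor(total_rows * percentile) + 1 in Python and applies it to a sorted list. The helper names are hypothetical, and returning 0 when no row matches simply mirrors the PHP fallback above; it is not a statistical convention.

```python
import math


def percentile_row(total_rows, percentile=0.5):
    """1-based row number selected by floor(total_rows * percentile) + 1."""
    return math.floor(total_rows * percentile) + 1


def percentile_value(values, percentile=0.5):
    """Value at the selected row of the sorted list, or 0 if out of range."""
    ordered = sorted(values)
    row = percentile_row(len(ordered), percentile)
    if row > len(ordered):
        return 0  # mirrors the PHP function's fallback for an empty result
    return ordered[row - 1]


sample = [4, 7, 2, 2, 9, 8, 3]
for p in (0.25, 0.5, 0.75):
    print(p, percentile_row(len(sample), p), percentile_value(sample, p))
```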

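Most of the solutions above work for only one field of the table; you might need the median (50th percentile) of several fields in the same query.

I use this: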

SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
FROM table_name;

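You can replace the "50" in the example above with any percentile; it is very efficient.

Just make sure you have enough memory for the GROUP_CONCAT; you can change the limit with: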

SET group_concat_max_len = 10485760; #10MB max length

More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/

Here is my way. Of course, you could put it into a procedure :-)

SET @median_counter = (SELECT CEIL(COUNT(*)/2) - 1 AS `median_counter` FROM `data`);


SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');


PREPARE median FROM @median;


EXECUTE median;

You can avoid the variable @median_counter if you substitute it:

SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
(SELECT CEIL(COUNT(*)/2) - 1 AS `median_counter` FROM `data`),
', 1'
);


PREPARE median FROM @median;


EXECUTE median;
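
The solution I present below works in a single query, without creating a table, a variable or even a subquery. It also lets you get the median of each group in GROUP BY queries (which is what I needed!):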

SELECT `columnA`,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(`columnB` ORDER BY `columnB`), ',', CEILING((COUNT(`columnB`)/2))), ',', -1) medianOfColumnB
FROM `tableC`
-- some where clause if you want
GROUP BY `columnA`;

It works because of a smart use of group_concat and substring_index.

However, to allow a large group_concat you have to set group_concat_max_len to a higher value (the default is 1024 characters). You can set it like this (for the current SQL session):

SET SESSION group_concat_max_len = 10000;
-- up to 4294967295 in 32-bits platform.

More information on group_concat_max_len: https://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_group_concat_max_len
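To see why the nested SUBSTRING_INDEX calls land on the middle value, here is a small Python emulation. The substring_index helper is a sketch of the documented MySQL behaviour for positive and negative counts, written for illustration; it is not MySQL's implementation. The inner call keeps everything up to the CEILING(COUNT/2)-th value, and the outer call with -1 keeps only the last value of that prefix.

```python
import math


def substring_index(text, delim, count):
    """Sketch of MySQL SUBSTRING_INDEX: left part for count > 0, right part for count < 0."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def median_via_group_concat(values):
    """Emulates GROUP_CONCAT(... ORDER BY ...) followed by the two SUBSTRING_INDEX calls."""
    ordered = sorted(values)
    concatenated = ",".join(str(v) for v in ordered)
    n = math.ceil(len(ordered) / 2)
    prefix = substring_index(concatenated, ",", n)  # e.g. "2,2,3,4"
    return substring_index(prefix, ",", -1)         # last item of the prefix


print(median_via_group_concat([4, 7, 2, 2, 9, 8, 3]))
```

As in MySQL, the result is a string, and for an even number of values this variant returns the lower of the two middle values rather than their average.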

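After reading all the previous answers, none of them matched my actual requirement, so I implemented my own, which needs no procedure or complicated statements: I GROUP_CONCAT all the values of the column I want the median of and, using COUNT divided by 2, extract the value from the middle of the list, as the following query does:

(POS is the name of the column I want the median of)

SELECT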
SUBSTRING_INDEX (
SUBSTRING_INDEX (
GROUP_CONCAT(pos ORDER BY CAST(pos AS SIGNED INTEGER) desc SEPARATOR ';')
, ';', COUNT(*)/2 )
, ';', -1 ) AS `pos_med`
FROM table_name
GROUP BY any_criterial

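I hope this is useful to someone, the way many other comments on this website have been for me.

Another riff on velcrow's answer, but this one uses a single intermediate table and takes advantage of the variable used for row numbering to get the count, rather than running an extra query to calculate it. It also starts the count so that the first row is row 0, which makes it possible to simply use Floor and Ceil to select the median row(s).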

SELECT Avg(tmp.val) as median_val
FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
FROM data as inTab,  (SELECT @rows := -1) as init
-- Replace with better where clause or delete
WHERE 2 > 1
ORDER BY inTab.val) as tmp
WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));
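The zero-based variant above relies on @rows ending at the index of the last row, n - 1. A short sketch (helper names are hypothetical) of which zero-based row numbers Floor(@rows / 2) and Ceil(@rows / 2) select:

```python
import math


def middle_indexes(n_rows):
    """Zero-based rows picked when the counter starts at -1 and ends at n_rows - 1."""
    last = n_rows - 1  # final value of @rows after the inner query has run
    return (math.floor(last / 2), math.ceil(last / 2))


def median_zero_based(values):
    """Average the rows at the two selected zero-based positions."""
    ordered = sorted(values)
    low, high = middle_indexes(len(ordered))
    return (ordered[low] + ordered[high]) / 2


print(middle_indexes(7))  # a single middle row, listed twice
print(middle_indexes(4))  # two neighbouring middle rows
```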

If you know the exact row count, you can use this query:

SELECT <value> AS VAL FROM <table> ORDER BY VAL LIMIT 1 OFFSET <half>

<half> = ceiling(<size> / 2.0) - 1

Install and use these MySQL statistical functions: http://www.xarg.org/2012/07/statistical-functions-in-mysql/

After that, calculating the median is easy:

SELECT median(val) FROM data;

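I have a database containing about 1 billion rows, for which we need to determine the median age. Sorting a billion rows is hard, but if you aggregate the distinct values that occur (ages range from 0 to 100), you can sort that list instead and use some arithmetic to find any percentile you want, as follows: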

with rawData(count_value) as
(
select p.YEAR_OF_BIRTH
from dbo.PERSON p
),
overallStats (avg_value, stdev_value, min_value, max_value, total) as
(
select avg(1.0 * count_value) as avg_value,
stdev(count_value) as stdev_value,
min(count_value) as min_value,
max(count_value) as max_value,
count(*) as total
from rawData
),
aggData (count_value, total, accumulated) as
(
select count_value,
count(*) as total,
SUM(count(*)) OVER (ORDER BY count_value ROWS UNBOUNDED PRECEDING) as accumulated
FROM rawData
group by count_value
)
select o.total as count_value,
o.min_value,
o.max_value,
o.avg_value,
o.stdev_value,
MIN(case when d.accumulated >= .50 * o.total then count_value else o.max_value end) as median_value,
MIN(case when d.accumulated >= .10 * o.total then count_value else o.max_value end) as p10_value,
MIN(case when d.accumulated >= .25 * o.total then count_value else o.max_value end) as p25_value,
MIN(case when d.accumulated >= .75 * o.total then count_value else o.max_value end) as p75_value,
MIN(case when d.accumulated >= .90 * o.total then count_value else o.max_value end) as p90_value
from aggData d
cross apply overallStats o
GROUP BY o.total, o.min_value, o.max_value, o.avg_value, o.stdev_value
;

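This query depends on your database supporting window functions (including ROWS UNBOUNDED PRECEDING). If it does not, it is a simple matter to join the aggData CTE with itself and aggregate all prior totals into the 'accumulated' column, which is used to determine which value contains the specified percentile. The sample above calculates p10, p25, p50 (median), p75 and p90.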
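The idea behind the aggData CTE, counting each distinct value once and walking the running total until it reaches the requested fraction of all rows, can be sketched in a few lines of Python. The function name percentile_from_counts and the sample ages are made up for illustration.

```python
from collections import Counter
from itertools import accumulate


def percentile_from_counts(values, fraction):
    """First distinct value whose running count reaches fraction * total."""
    counts = sorted(Counter(values).items())  # (value, occurrences), sorted by value
    if not counts:
        return None
    total = sum(c for _, c in counts)
    running = accumulate(c for _, c in counts)  # the 'accumulated' column
    for (value, _), acc in zip(counts, running):
        if acc >= fraction * total:
            return value
    return counts[-1][0]


ages = [30] * 5 + [40] * 3 + [50] * 2  # only three distinct values to sort
print(percentile_from_counts(ages, 0.50))
print(percentile_from_counts(ages, 0.75))
print(percentile_from_counts(ages, 0.90))
```

Only the distinct values are sorted, so the cost of the final step depends on the number of distinct values rather than on the number of rows.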

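Taken from: http://mdb-blog.blogspot.com/2015/06/mysql-find-median-nth-element-without.html

I would suggest another way, without a join, but working with strings.

I have not checked it on tables with large data, but on small and medium tables it works just fine.

The good thing here is that it also works with GROUP BY, so it can return the median for several items.

Here is the test code for a test table: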

DROP TABLE IF EXISTS test.test_median;
CREATE TABLE test.test_median AS
SELECT 'book' AS grp, 4 AS val UNION ALL
SELECT 'book', 7 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 9 UNION ALL
SELECT 'book', 8 UNION ALL
SELECT 'book', 3 UNION ALL


SELECT 'note', 11 UNION ALL


SELECT 'bike', 22 UNION ALL
SELECT 'bike', 26;

The code for finding the median of each group:

SELECT grp,
SUBSTRING_INDEX( SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val), ',', COUNT(*)/2 ), ',', -1) as the_median,
GROUP_CONCAT(val ORDER BY val) as all_vals_for_debug
FROM test.test_median
GROUP BY grp

Output:

grp | the_median| all_vals_for_debug
bike| 22        | 22,26
book| 4         | 2,2,3,4,7,8,9
note| 11        | 11

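In some cases the median is calculated as follows:

The "median" is the "middle" value in a list of numbers when they are ordered by value. For sets with an even count, the median is the average of the two middle values. I've created some simple code for that: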

$midValue = 0;
$rowCount = "SELECT count(*) as count {$from} {$where}";


$even = FALSE;
$offset = 1;
$medianRow = floor($rowCount / 2);
if ($rowCount % 2 == 0 && !empty($medianRow)) {
$even = TRUE;
$offset++;
$medianRow--;
}


$medianValue = "SELECT column as median
{$fromClause} {$whereClause}
ORDER BY median
LIMIT {$medianRow},{$offset}";


$medianValDAO = db_query($medianValue);
while ($medianValDAO->fetch()) {
if ($even) {
$midValue = $midValue + $medianValDAO->median;
}
else {
$median = $medianValDAO->median;
}
}
if ($even) {
$median = $midValue / 2;
}
return $median;

The $median returned will be the required result :-)

Medians grouped by dimension:

SELECT your_dimension, avg(t1.val) as median_val FROM (
SELECT @rownum:=@rownum+1 AS `row_number`,
IF(@dim <> d.your_dimension, @rownum := 0, NULL),
@dim := d.your_dimension AS your_dimension,
d.val
FROM data d,  (SELECT @rownum:=0) r, (SELECT @dim := 'something_unreal') d
WHERE 1
-- put some where clause here
ORDER BY d.your_dimension, d.val
) as t1
INNER JOIN
(
SELECT d.your_dimension,
count(*) as total_rows
FROM data d
WHERE 1
-- put same where clause here
GROUP BY d.your_dimension
) as t2 USING(your_dimension)
WHERE 1
AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) )


GROUP BY your_dimension;

set @r = 0;


select round(avg(lat_N), 4) as Med
from
(select lat_N, @r := @r+1 as id from station order by lat_N) A
cross join
(select (count(1)+1)/2 as c from station) B
where id >= floor(c) and id <= ceil(c);

This approach seems to cover both even and odd counts without a subquery.

SELECT AVG(t1.x)
FROM data t1, data t2
GROUP BY t1.x
HAVING SUM(SIGN(t1.x - t2.x)) = 0;

SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(field ORDER BY field),
',',
((
ROUND(
LENGTH(GROUP_CONCAT(field)) -
LENGTH(
REPLACE(
GROUP_CONCAT(field),
',',
''
)
)
) / 2) + 1
)),
',',
-1
)
FROM
data;

The above seems to work for me.

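Building on @bob's answer, this generalizes the query so that it can return multiple medians, grouped by some criterion.

Think, for example, of the median sale price of used cars on a car lot, grouped by year and month.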

SELECT
period,
AVG(middle_values) AS 'median'
FROM (
SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
FROM (
SELECT
@last_period:=@period AS 'last_period',
@period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`,
x.sale_price
FROM listings AS x, (SELECT @row:=0) AS r
WHERE 1
-- where criteria goes here
ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
) AS t1
LEFT JOIN (
SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
FROM listings x
WHERE 1
-- same where criteria goes here
GROUP BY DATE_FORMAT(sale_date, '%Y%m')
) AS t2
ON t1.period = t2.period
) AS t3
WHERE
row_num >= (count/2)
AND row_num <= ((count/2) + 1)
GROUP BY t3.period
ORDER BY t3.period;

These methods select from the same table twice. If the source data comes from an expensive query, this is a way to avoid running it twice:

select KEY_FIELD, AVG(VALUE_FIELD) MEDIAN_VALUE
from (
select KEY_FIELD, VALUE_FIELD, RANKF
, @rownumr := IF(@prevrowidr=KEY_FIELD,@rownumr+1,1) RANKR
, @prevrowidr := KEY_FIELD
FROM (
SELECT KEY_FIELD, VALUE_FIELD, RANKF
FROM (
SELECT KEY_FIELD, VALUE_FIELD
, @rownumf := IF(@prevrowidf=KEY_FIELD,@rownumf+1,1) RANKF
, @prevrowidf := KEY_FIELD
FROM (
SELECT KEY_FIELD, VALUE_FIELD
FROM (
-- some expensive query
)   B
ORDER BY  KEY_FIELD, VALUE_FIELD
) C
, (SELECT @rownumf := 1) t_rownum
, (SELECT @prevrowidf := '*') t_previd
) D
ORDER BY  KEY_FIELD, RANKF DESC
) E
, (SELECT @rownumr := 1) t_rownum
, (SELECT @prevrowidr := '*') t_previd
) F
WHERE RANKF-RANKR BETWEEN -1 and 1
GROUP BY KEY_FIELD;

create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);


select (MIN(count)+MAX(count))/2 from
(select case when (select count(*) from
med A where A.id<B.id)=(select count(*)/2 from med) OR
(select count(*) from med A where A.id>B.id)=(select count(*)/2
from med) then cast(B.id as float)end as count from med B) C;


?column?
----------
3.5
(1 row)

select cast(avg(id) as float) from
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;

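Often we need to calculate the median not just for the whole table, but for aggregates with respect to an ID. In other words, we calculate the median for each ID in the table, where each ID has many records. (This performs well, works in many SQL dialects and fixes the even/odd problem. More about the performance of different median methods: https://sqlperformance.com/2012/08/t-sql-queries/median)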

SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val,
COUNT(*) OVER (PARTITION BY our_id) AS cnt,
ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
FROM our_table
) AS x
WHERE rn IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;

Hope it helps.
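One detail worth checking when porting the rn IN ((cnt + 1)/2, (cnt + 2)/2) filter between databases is how / behaves on integers: SQL Server truncates the result, while MySQL's / returns a decimal (MySQL's integer division operator is DIV). The sketch below, with a hypothetical helper name, shows which row numbers match under each interpretation; under decimal division an even count matches only one of the two middle rows, so wrapping each term in FLOOR may be needed there.

```python
from fractions import Fraction


def matched_rows(cnt, integer_division):
    """Row numbers 1..cnt that satisfy rn IN ((cnt + 1)/2, (cnt + 2)/2)."""
    if integer_division:
        targets = {(cnt + 1) // 2, (cnt + 2) // 2}
    else:
        # Exact (non-truncating) division, as a decimal result would behave.
        targets = {Fraction(cnt + 1, 2), Fraction(cnt + 2, 2)}
    return [rn for rn in range(1, cnt + 1) if rn in targets]


print(matched_rows(6, integer_division=True))   # both middle rows
print(matched_rows(6, integer_division=False))  # only one row matches
print(matched_rows(7, integer_division=False))  # odd counts are unaffected
```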

I have the code below, which I found on HackerRank; it is pretty simple and works in each and every case.

SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) =
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );

The following SQL code will help you calculate the median in MySQL using user-defined variables.

create table employees(salary int);


insert into employees values(8);
insert into employees values(23);
insert into employees values(45);
insert into employees values(123);
insert into employees values(93);
insert into employees values(2342);
insert into employees values(2238);


select * from employees;


Select salary from employees  order by salary;


set @rowid=0;
set @cnt=(select count(*) from employees);
set @middle_no=ceil(@cnt/2);
set @odd_even=null;


select AVG(salary) from
(select salary,@rowid:=@rowid+1 as rid, (CASE WHEN(mod(@cnt,2)=0) THEN @odd_even:=1 ELSE @odd_even:=0 END) as odd_even_status  from employees  order by salary) as tbl where tbl.rid=@middle_no or tbl.rid=(@middle_no+@odd_even);

If you are looking for a detailed explanation, please refer to this blog.

I found this answer very helpful: https://www.eversql.com/how-to-calculate-median-value-in-mysql-using-a-simple-sql-query/

SET @rowindex := -1;


SELECT
AVG(g.grade)
FROM
(SELECT @rowindex:=@rowindex + 1 AS rowindex,
grades.grade AS grade
FROM grades
ORDER BY grades.grade) AS g
WHERE
g.rowindex IN (FLOOR(@rowindex / 2) , CEIL(@rowindex / 2));

MySQL has supported window functions since version 8.0, so you can use ROW_NUMBER or DENSE_RANK (avoid RANK, because it assigns the same rank to equal values, as in sports rankings):

SELECT AVG(t1.val) AS median_val
FROM (SELECT val,
ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM data) t1,
(SELECT COUNT(*) AS num_records FROM data) t2
WHERE t1.rownum IN
(FLOOR((t2.num_records + 1) / 2),
FLOOR((t2.num_records + 2) / 2));

A single query to achieve the perfect median:

SELECT
COUNT(*) as total_rows,
IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2)) as median,
AVG(val) as average
FROM
data

A simple way to calculate the median in MySQL

set @ct := (select count(1) from station);
set @row := 0;


select avg(a.val) as median from
(select * from station order by val) a
where (select @row := @row + 1)
between @ct/2.0 and @ct/2.0 +1;

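The following query works well for both odd and even numbers of rows. In the subquery, we look for the value(s) that have the same number of rows before and after them. For an odd number of rows, the having clause evaluates to 0 (the equal numbers of rows before and after cancel out the signs).

Similarly, for an even number of rows, the having clause evaluates to 1 for two rows (the two center rows), since together they have the same number of rows before and after them.

In the outer query, we average either the single value (odd number of rows) or the two values (even number of rows).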

select avg(val) as median
from
(
select d1.val
from data d1 cross join data d2
group by d1.val
having abs(sum(sign(d1.val-d2.val))) in (0,1)
) sub

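Note: if your table has duplicate values, the having clause above should be changed to the condition below. In that case some values can produce results outside the original possibilities of (0,1). The condition below makes the check dynamic, so it also works with duplicates.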

having sum(case when d1.val=d2.val then 1 else 0 end)>=
abs(sum(sign(d1.val-d2.val)))
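To illustrate the duplicate-aware condition, here is a brute-force Python sketch of the same comparison. For a value that occurs k times, the grouped cross join counts k*k equal pairs and k*(rows below - rows above) for the sign sum, so the SQL condition reduces to k >= |rows below - rows above|, which is what the sketch tests per distinct value. The helper names are hypothetical, and the sketch was only checked on the small inputs shown.

```python
def sign(x):
    """Integer sign, like SQL SIGN(): -1, 0 or 1."""
    return (x > 0) - (x < 0)


def median_by_sign_balance(values):
    """Average the distinct values whose tie count covers their sign imbalance."""
    picked = []
    for d1 in set(values):  # GROUP BY d1.val
        ties = sum(1 for d2 in values if d2 == d1)
        balance = abs(sum(sign(d1 - d2) for d2 in values))
        if ties >= balance:  # the duplicate-aware HAVING condition
            picked.append(d1)
    return sum(picked) / len(picked)


print(median_by_sign_balance([4, 7, 2, 2, 9, 8, 3]))
print(median_by_sign_balance([1, 1, 2, 3]))
```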

Try something like:

SELECT
CAST(AVG(val) AS DECIMAL(10,4))
FROM
(
SELECT
val,
ROW_NUMBER() OVER( ORDER BY val ) -1 AS rn,
COUNT(1) OVER () -1 AS cnt
FROM STATION
) as tmp
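WHERE rn IN (FLOOR(cnt/2), CEILING(cnt/2))

Note: the -1 makes the numbering zero-indexed, i.e. row numbers now start at 0 instead of 1.

I have not compared the performance of this solution with the other answers posted here, but I found it the most straightforward to understand, and it covers the full span of the mathematical formula for calculating a median. In other words, this solution is robust enough for both even- and odd-numbered data sets: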

SELECT CASE
-- odd-numbered data sets:
WHEN MOD(COUNT(*), 2) = 1 THEN (SELECT median.<value> AS median
FROM
(SELECT t1.<value>
FROM (SELECT <value>,
ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
FROM <data>) t1,
(SELECT COUNT(*) AS num_records FROM <data>) t2
WHERE t1.rownum = (t2.num_records + 1) / 2) as median)
-- even-numbered data sets:
ELSE (select (low_bound.<value> + up_bound.<value>) / 2 AS median
FROM
(SELECT t1.<value>
FROM (SELECT <value>,
ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
FROM <data>) t1,
(SELECT COUNT(*) AS num_records FROM <data>) t2
WHERE t1.rownum = t2.num_records / 2) as low_bound,
(SELECT t1.<value>
FROM (SELECT <value>,
ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
FROM <data>) t1,
(SELECT COUNT(*) AS num_records FROM <data>) t2
WHERE t1.rownum = t2.num_records / 2 + 1) as up_bound)
END
FROM <data>

The simplest and fastest way to calculate the median in MySQL.

select x.lat_n
from   (select lat_n,
count(1) over (partition by 'A')        as total_rows,
row_number() over (order by lat_n asc) as rank_Order
from   station ft) x
where  x.rank_Order = round(x.total_rows / 2.0, 0)

A simple solution for Oracle:

SELECT ROUND(MEDIAN(Lat_N), 4) FROM Station;

An easy-to-understand solution for MySQL:

select case MOD(count(lat_n),2)
when 1 then (select round(S.LAT_N,4) from station S where (select count(Lat_N) from station where Lat_N < S.LAT_N ) = (select count(Lat_N) from station where Lat_N > S.LAT_N))
else (select round(AVG(S.LAT_N),4) from station S where 1 = abs((select count(Lat_N) from station where Lat_N < S.LAT_N ) - (select count(Lat_N) from station where Lat_N > S.LAT_N)))
end from station;

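Explanation

STATION is the table name. LAT_N is the name of the column holding the numeric values.

Suppose there are 101 records (an odd number) in the station table. This means the median is the 51st record when the table is sorted, either ascending or descending.

In the query above, for every S.LAT_N of the S table I build two counts: one for the number of LAT_N values less than S.LAT_N and another for the number of LAT_N values greater than S.LAT_N. I then compare these two counts, and if they match, I select that S.LAT_N value. When I check the 51st record, there are 50 values less than it and 50 values greater than it. As you can see, both counts are 50, so this is our answer. For every other record the two counts differ, so only the 51st record meets the condition.

Now suppose there are 100 records (an even number) in the station table. This means the median is the average of the 50th and 51st records when the table is sorted, either ascending or descending.

As in the odd case, I build two counts: one for the number of LAT_N values less than S.LAT_N and another for the number of LAT_N values greater than S.LAT_N. I then compare these two counts, and if their difference equals 1, I select that S.LAT_N value and take the average. When I check the 50th record, there are 49 values less than it and 50 values greater than it. As you can see, the two counts differ by 1, so this (the 50th record) is the first record to be averaged. Similarly, when I check the 51st record, there are 50 values less than it and 49 values greater than it. Again the two counts differ by 1, so this (the 51st record) is the second record to be averaged. For every other record the counts differ by more than 1, so only the 50th and 51st records meet the condition.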
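The counting argument in this explanation can be replayed with a short Python sketch. It assumes, as the walkthrough does, that all values are distinct and that the input is not empty; the function name median_by_counts is hypothetical.

```python
def median_by_counts(values):
    """Pick rows by comparing how many values lie below and above each one."""
    n = len(values)
    picked = []
    for v in values:
        less = sum(1 for w in values if w < v)
        greater = sum(1 for w in values if w > v)
        if n % 2 == 1 and less == greater:
            picked.append(v)       # odd count: the single middle record
        elif n % 2 == 0 and abs(less - greater) == 1:
            picked.append(v)       # even count: the two middle records
    return sum(picked) / len(picked)


print(median_by_counts(list(range(1, 102))))  # 101 records, the 51st is selected
print(median_by_counts(list(range(1, 101))))  # 100 records, average of 50th and 51st
```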

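If this is MySQL, there are window functions now and you can do it like this (assuming you want to round to the nearest integer; otherwise just replace ROUND with CEIL, FLOOR or whatever you need). The following solution works for tables regardless of whether they have an even or an odd number of rows: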


WITH CTE AS (
SELECT val,
ROW_NUMBER() OVER (ORDER BY val ASC) AS rn,
COUNT(*) OVER () AS total_count
FROM data
)
SELECT ROUND(AVG(val)) AS median
FROM CTE
WHERE
rn BETWEEN
total_count / 2.0 AND
total_count / 2.0 + 1;


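I think some of the more recent answers in this thread were already getting at this approach, but it seemed that people were overthinking it, so consider this an improved version. Regardless of SQL flavor, there is no reason anyone should have to write a huge block of code with multiple subqueries just to get the median in 2021. However, please note that the above query only works if you are asked to find the median of a continuous series. Of course, regardless of the number of rows, people sometimes do distinguish between the so-called discrete median and the so-called interpolated median of a continuous series.

If you are asked to find the median of a discrete series and the table has an even number of rows, the above solution will not work for you, and you should revert to one of the other solutions, such as TheJacobTaylor's.

The second solution below is a slightly modified version of TheJacobTaylor's, in which I state the CROSS JOIN explicitly. It also works for tables with an odd number of rows, regardless of whether you are asked for the median of a continuous or a discrete series, but I would specifically use it when asked for the median of a discrete series. Otherwise, use the first solution. That way, you never have to think about whether the data contains an "even" or "odd" number of data points.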


SELECT x.val AS median
FROM data x
CROSS JOIN data y
GROUP BY x.val
HAVING SUM(SIGN(1 - SIGN(y.val - x.val))) = (COUNT(*) + 1) / 2;


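Finally, you can easily do this in PostgreSQL using built-in functions. Here is a good explanation, along with an effective summary of the discrete versus the interpolated median.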

https://leafo.net/guides/postgresql-calculating-percentile.html#calculating-the-median
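The distinction between the discrete and the interpolated median drawn in this answer can be seen with Python's standard statistics module, used here purely as an illustration of the two definitions, not of any SQL implementation. For an even number of values, the interpolated median averages the two middle values, while the discrete median must be one of the actual values (the lower or the upper one); for an odd number of values they coincide.

```python
import statistics

values = [1, 2, 3, 4]  # even number of data points

interpolated = statistics.median(values)        # mean of the two middle values
discrete_low = statistics.median_low(values)    # lower of the two middle values
discrete_high = statistics.median_high(values)  # upper of the two middle values

odd = [2, 2, 3, 4, 7, 8, 9]                     # odd number of data points
odd_interpolated = statistics.median(odd)
odd_discrete = statistics.median_low(odd)

print(interpolated, discrete_low, discrete_high)
print(odd_interpolated, odd_discrete)
```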

For a table station and a column lat_n, here is MySQL code to get the median:

set @rows := (select count(1) from station);
set @v1 := 0;
set @sql1 := concat('select lat_n into @v1 from station order by lat_n asc limit 1 offset ', ceil(@rows/2) - 1);
prepare statement1 from @sql1;
execute statement1;
set @v2 := 0;
set @sql2 := concat('select lat_n into @v2 from station order by lat_n asc limit 1 offset ', ceil((@rows + 1)/2) - 1);
prepare statement2 from @sql2;
execute statement2;
select (@v1 + @v2)/2;

My solution in MySQL, using the table below:

CREATE TABLE transactions (
transaction_id int , user_id int , merchant_name varchar(255), transaction_date date , amount int
);


INSERT INTO transactions (transaction_id, user_id, merchant_name, transaction_date, amount)
VALUES (1, 1 ,'abc', '2015-08-17', 100),(2, 2, 'ced', '2015-2-17', 100),(3, 1, 'def', '2015-2-16', 121),
(4, 1 ,'ced', '2015-3-17', 110),(5, 1, 'ced', '2015-3-17', 150),(6, 2 ,'abc', '2015-4-17', 130),
(7, 3 ,'ced', '2015-12-17', 10),(8, 3 ,'abc', '2015-8-17', 100),(9, 2 ,'abc', '2015-12-17', 140),(10, 1,'abc', '2015-9-17', 100),
(11, 1 ,'abc', '2015-08-17', 121),(12, 2 ,'ced', '2015-12-23', 130),(13, 1 ,'def', '2015-12-23', 13),(3, 4, 'abc', '2015-2-16', 120),(3, 4, 'def', '2015-2-16', 121),(3, 4, 'ced', '2015-2-16', 121);

Calculating the median of the "amount" column:

WITH Numbered AS
(
SELECT *, COUNT(*) OVER () AS TotalRecords,
ROW_NUMBER() OVER (ORDER BY amount) AS RowNum
FROM transactions
)
SELECT Avg(amount)
FROM Numbered
WHERE RowNum IN ( FLOOR((TotalRecords+1)/2), FLOOR((TotalRecords+2)/2) )
;

TotalRecords = 16 and Median = 120.5000

This query works for both cases, i.e. even and odd numbers of records.
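The stated result can be cross-checked without a database. The sketch below copies the 16 amount values from the INSERT statement above, applies the same FLOOR((n+1)/2) and FLOOR((n+2)/2) row selection in Python, and compares the outcome with the standard library's median; the variable names are illustrative only.

```python
import statistics

# The amount column of the 16 inserted transactions, in insertion order.
amounts = [100, 100, 121, 110, 150, 130, 10, 100,
           140, 100, 121, 130, 13, 120, 121, 121]

ordered = sorted(amounts)
total = len(ordered)

# Same 1-based row numbers as the WHERE RowNum IN (...) filter.
rows = {(total + 1) // 2, (total + 2) // 2}
picked = [v for n, v in enumerate(ordered, start=1) if n in rows]
median = sum(picked) / len(picked)

print(total, sorted(rows), picked, median)
```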

You can use the window function row_number() to answer the query and find the median:

select val
from (select val, row_number() over (order by val) as rownumber, x.cnt
from data, (select count(*) as cnt from data) x) abc
where rownumber=ceil(cnt/2);