计算 c # 中的中位数

我需要写一个接受小数数组的函数,它会找到中间值。

Is there a function in the .net Math library?

135353 次浏览
decimal Median(decimal[] xs) {
Array.Sort(xs);
return xs[xs.Length / 2];
}

Should do the trick.

编辑

对于那些想要完整的 monty 的人来说,下面是完整的、简短的、纯粹的解决方案(假设输入数组是非空的) :

decimal Median(decimal[] xs) {
var ys = xs.OrderBy(x => x).ToList();
double mid = (ys.Count - 1) / 2.0;
return (ys[(int)(mid)] + ys[(int)(mid + 0.5)]) / 2;
}

在.net 数学库中有一个函数吗?

没有。

不过写自己的并不难。初始算法对数组进行排序,并选择中间元素(或两个中间元素的平均值)。然而,该算法是 O(n log n)算法,而且有可能在 O(n)时间内解决这个问题。您需要查看 选择算法来获得这样的算法。

谢谢雷夫,这考虑到问题,你的回复张贴。

public static double GetMedian(double[] sourceNumbers) {
//Framework 2.0 version of this method. there is an easier way in F4
if (sourceNumbers == null || sourceNumbers.Length == 0)
throw new System.Exception("Median of empty array not defined.");


//make sure the list is sorted, but use a new array
double[] sortedPNumbers = (double[])sourceNumbers.Clone();
Array.Sort(sortedPNumbers);


//get the median
int size = sortedPNumbers.Length;
int mid = size / 2;
double median = (size % 2 != 0) ? (double)sortedPNumbers[mid] : ((double)sortedPNumbers[mid] + (double)sortedPNumbers[mid - 1]) / 2;
return median;
}

Looks like other answers are using sorting. That's not optimal from performance point of view because it takes O(n logn) time. It is possible to calculate median in O(n) time instead. The generalized version of this problem is known as "n-order statistics" which means finding an element K in a set such that we have n elements smaller or equal to K and rest are larger or equal K. So 0th order statistic would be minimal element in the set (Note: Some literature use index from 1 to N instead of 0 to N-1). Median is simply (Count-1)/2-order statistic.

下面是从 算法导论,Cormen 等人,第3版中采用的代码。

/// <summary>
/// Partitions the given list around a pivot element such that all elements on left of pivot are <= pivot
/// and the ones at thr right are > pivot. This method can be used for sorting, N-order statistics such as
/// as median finding algorithms.
/// Pivot is selected ranodmly if random number generator is supplied else its selected as last element in the list.
/// Reference: Introduction to Algorithms 3rd Edition, Corman et al, pp 171
/// </summary>
private static int Partition<T>(this IList<T> list, int start, int end, Random rnd = null) where T : IComparable<T>
{
if (rnd != null)
list.Swap(end, rnd.Next(start, end+1));


var pivot = list[end];
var lastLow = start - 1;
for (var i = start; i < end; i++)
{
if (list[i].CompareTo(pivot) <= 0)
list.Swap(i, ++lastLow);
}
list.Swap(end, ++lastLow);
return lastLow;
}


/// <summary>
/// Returns Nth smallest element from the list. Here n starts from 0 so that n=0 returns minimum, n=1 returns 2nd smallest element etc.
/// Note: specified list would be mutated in the process.
/// Reference: Introduction to Algorithms 3rd Edition, Corman et al, pp 216
/// </summary>
public static T NthOrderStatistic<T>(this IList<T> list, int n, Random rnd = null) where T : IComparable<T>
{
return NthOrderStatistic(list, n, 0, list.Count - 1, rnd);
}
private static T NthOrderStatistic<T>(this IList<T> list, int n, int start, int end, Random rnd) where T : IComparable<T>
{
while (true)
{
var pivotIndex = list.Partition(start, end, rnd);
if (pivotIndex == n)
return list[pivotIndex];


if (n < pivotIndex)
end = pivotIndex - 1;
else
start = pivotIndex + 1;
}
}


public static void Swap<T>(this IList<T> list, int i, int j)
{
if (i==j)   //This check is not required but Partition function may make many calls so its for perf reason
return;
var temp = list[i];
list[i] = list[j];
list[j] = temp;
}


/// <summary>
/// Note: specified list would be mutated in the process.
/// </summary>
public static T Median<T>(this IList<T> list) where T : IComparable<T>
{
return list.NthOrderStatistic((list.Count - 1)/2);
}


public static double Median<T>(this IEnumerable<T> sequence, Func<T, double> getValue)
{
var list = sequence.Select(getValue).ToList();
var mid = (list.Count - 1) / 2;
return list.NthOrderStatistic(mid);
}

Few notes:

  1. This code replaces tail recursive code from the original version in book in to iterative loop.
  2. 它还消除了 start = = end 时对原始版本不必要的额外检查。
  3. 我提供了两个版本的 Median,一个接受 IEnumable,然后创建一个列表。如果您使用接受 IList 的版本,请记住它会修改列表中的顺序。
  4. Above methods calculates median or any i-order statistics in O(n) 预计时间. If you want O(n) 最坏的情况 then there is technique to use median-of-median. While this would improve worse case performance, it degrades average case because constant in O(n) is now larger. However if you would be calculating median mostly on very large data then its worth to look at.
  5. NthOrderStatistics 方法允许传递随机数生成器,然后使用该生成器在分区期间选择随机轴。这通常是没有必要的,除非您知道您的数据具有某些模式,这样最后一个元素就不会足够随机,或者您的代码以某种方式暴露在外面以供有针对性的利用。
  6. 中值的定义是清楚的,如果你有奇数个元素。它只是排序数组中带有 (Count-1)/2索引的元素。但是,当你偶数的元素 (Count-1)/2不再是一个整数,你有两个中位数: 较低的中位数 Math.Floor((Count-1)/2)Math.Ceiling((Count-1)/2)。一些教科书使用较低的中位数作为“标准”,而另一些教科书建议使用两个平均数。这个问题对于2个元素的集合来说变得尤为关键。以上代码返回较低的中位数。如果你想代替平均的低和高,那么你需要调用上述代码两次。在这种情况下,确保测量数据的性能,以决定是否应该使用上面的代码 VS 仅仅进行直接排序。
  7. 对于.net 4.5 + ,可以在 Swap<T>方法上添加 MethodImplOptions.AggressiveInlining属性,以略微提高性能。

下面是一个 快速选择实现。它是从这个 文章中提取的 unsafe C 代码的实现,它比较了几种算法,发现 QuickSelect 是 平均最快

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe void SwapElements(int* p, int* q)
{
int temp = *p;
*p = *q;
*q = temp;
}


public static unsafe int Median(int[] arr, int n)
{
int middle, ll, hh;


int low = 0; int high = n - 1; int median = (low + high) / 2;
fixed (int* arrptr = arr)
{
for (;;)
{
if (high <= low)
return arr[median];


if (high == low + 1)
{
if (arr[low] > arr[high])
SwapElements(arrptr + low, arrptr + high);
return arr[median];
}


middle = (low + high) / 2;
if (arr[middle] > arr[high])
SwapElements(arrptr + middle, arrptr + high);


if (arr[low] > arr[high])
SwapElements(arrptr + low, arrptr + high);


if (arr[middle] > arr[low])
SwapElements(arrptr + middle, arrptr + low);


SwapElements(arrptr + middle, arrptr + low + 1);


ll = low + 1;
hh = high;
for (;;)
{
do ll++; while (arr[low] > arr[ll]);
do hh--; while (arr[hh] > arr[low]);


if (hh < ll)
break;


SwapElements(arrptr + ll, arrptr + hh);
}


SwapElements(arrptr + low, arrptr + hh);


if (hh <= median)
low = ll;
if (hh >= median)
high = hh - 1;
}
}
}

下面的代码工作: 但不是很有效的方法

static void Main(String[] args) {
int n = Convert.ToInt32(Console.ReadLine());
int[] medList = new int[n];


for (int x = 0; x < n; x++)
medList[x] = int.Parse(Console.ReadLine());


//sort the input array:
//Array.Sort(medList);
for (int x = 0; x < n; x++)
{
double[] newArr = new double[x + 1];
for (int y = 0; y <= x; y++)
newArr[y] = medList[y];


Array.Sort(newArr);
int curInd = x + 1;
if (curInd % 2 == 0) //even
{
int mid = (x / 2) <= 0 ? 0 : (newArr.Length / 2);
if (mid > 1) mid--;
double median = (newArr[mid] + newArr[mid+1]) / 2;
Console.WriteLine("{0:F1}", median);
}
else //odd
{
int mid = (x / 2) <= 0 ? 0 : (newArr.Length / 2);
double median = newArr[mid];
Console.WriteLine("{0:F1}", median);
}
}


}

CenterSpace 的 NMath 库提供了一个功能:

double[] values = new double[arraySize];
double median = NMathFunctions.Median(values);

Optionally you can opt to use NaNMedian (if your array may contain null values) but you will need to convert the array to a vector:

double median = NMathFunctions.NaNMedian(new DoubleVector(values));

CenterSpace 的 NMath Library 并不是免费的,但是许多大学都有许可证

Here's a generic version of Jason's answer

    /// <summary>
/// Gets the median value from an array
/// </summary>
/// <typeparam name="T">The array type</typeparam>
/// <param name="sourceArray">The source array</param>
/// <param name="cloneArray">If it doesn't matter if the source array is sorted, you can pass false to improve performance</param>
/// <returns></returns>
public static T GetMedian<T>(T[] sourceArray, bool cloneArray = true) where T : IComparable<T>
{
//Framework 2.0 version of this method. there is an easier way in F4
if (sourceArray == null || sourceArray.Length == 0)
throw new ArgumentException("Median of empty array not defined.");


//make sure the list is sorted, but use a new array
T[] sortedArray = cloneArray ? (T[])sourceArray.Clone() : sourceArray;
Array.Sort(sortedArray);


//get the median
int size = sortedArray.Length;
int mid = size / 2;
if (size % 2 != 0)
return sortedArray[mid];


dynamic value1 = sortedArray[mid];
dynamic value2 = sortedArray[mid - 1];
return (sortedArray[mid] + value2) * 0.5;
}

将来的某个时候,我想这是最简单的方法了。

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;


namespace Median
{
class Program
{
static void Main(string[] args)
{
var mediaValue = 0.0;
var items = new[] { 1, 2, 3, 4,5 };
var getLengthItems = items.Length;
Array.Sort(items);
if (getLengthItems % 2 == 0)
{
var firstValue = items[(items.Length / 2) - 1];
var secondValue = items[(items.Length / 2)];
mediaValue = (firstValue + secondValue) / 2.0;
}
if (getLengthItems % 2 == 1)
{
mediaValue = items[(items.Length / 2)];
}
Console.WriteLine(mediaValue);
Console.WriteLine("Enter to Exit!");
Console.ReadKey();
}
}
}

Math.NET is an opensource library that offers a method for calculating the 中位数. 这个 Nuget 包被称为 MathNet 数字

用法很简单:

using MathNet.Numerics.Statistics;


IEnumerable<double> data;
double median = data.Median();

我的5美分(因为对于短名单来说,它看起来更简单明了) :

public static T Median<T>(this IEnumerable<T> items)
{
var i = (int)Math.Ceiling((double)(items.Count() - 1) / 2);
if (i >= 0)
{
var values = items.ToList();
values.Sort();
return values[i];
}


return default(T);
}

P.S. 使用 ShitalShah 所描述的“较高中值”。

我有一个变量: 组的直方图
Here how I calculate my median :

int[] group = new int[nbr];


// -- Fill the group with values---


// sum all data in median
int median = 0;
for (int i =0;i<nbr;i++) median += group[i];


// then divide by 2
median = median / 2;


// find 50% first part
for (int i = 0; i < nbr; i++)
{
median -= group[i];
if (median <= 0)
{
median = i;
break;
}
}

中位数是中位数的组指数