将密度曲线拟合到 R 中的直方图

R 中是否有一个函数可以使曲线与直方图相吻合?

假设你有下面的直方图

hist(c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4)))

看起来很正常,但是有点歪。我想拟合一个正常的曲线,它是倾斜的,围绕着这个直方图。

这个问题相当基本,但我似乎无法在互联网上找到 R 的答案。

298752 次浏览

If I understand your question correctly, then you probably want a density estimate along with the histogram:

X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))
hist(X, prob=TRUE)            # prob=TRUE for probabilities not counts
lines(density(X))             # add a density estimate with defaults
lines(density(X, adjust=2), lty="dotted")   # add another "smoother" density

Edit a long while later:

Here is a slightly more dressed-up version:

X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))
hist(X, prob=TRUE, col="grey")# prob=TRUE for probabilities not counts
lines(density(X), col="blue", lwd=2) # add a density estimate with defaults
lines(density(X, adjust=2), lty="dotted", col="darkgreen", lwd=2)

along with the graph it produces:

enter image description here

Here's the way I do it:

foo <- rnorm(100, mean=1, sd=2)
hist(foo, prob=TRUE)
curve(dnorm(x, mean=mean(foo), sd=sd(foo)), add=TRUE)

A bonus exercise is to do this with ggplot2 package ...

Such thing is easy with ggplot2

library(ggplot2)
dataset <- data.frame(X = c(rep(65, times=5), rep(25, times=5),
rep(35, times=10), rep(45, times=4)))
ggplot(dataset, aes(x = X)) +
geom_histogram(aes(y = ..density..)) +
geom_density()

or to mimic the result from Dirk's solution

ggplot(dataset, aes(x = X)) +
geom_histogram(aes(y = ..density..), binwidth = 5) +
geom_density()

Dirk has explained how to plot the density function over the histogram. But sometimes you might want to go with the stronger assumption of a skewed normal distribution and plot that instead of density. You can estimate the parameters of the distribution and plot it using the sn package:

> sn.mle(y=c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4)))
$call
sn.mle(y = c(rep(65, times = 5), rep(25, times = 5), rep(35,
times = 10), rep(45, times = 4)))


$cp
mean     s.d. skewness
41.46228 12.47892  0.99527

Skew-normal distributed data plot

This probably works better on data that is more skew-normal:

Another skew-normal plot

I had the same problem but Dirk's solution didn't seem to work. I was getting this warning messege every time

"prob" is not a graphical parameter

I read through ?hist and found about freq: a logical vector set TRUE by default.

the code that worked for me is

hist(x,freq=FALSE)
lines(density(x),na.rm=TRUE)

It's the kernel density estimation, and please hit this link to check a great illustration for the concept and its parameters.

The shape of the curve depends mostly on two elements: 1) the kernel(usually Epanechnikov or Gaussian) that estimates a point in the y coordinate for every value in the x coordinate by inputting and weighing all data; and it is symmetric and usually a positive function that integrates into one; 2) the bandwidth, the larger the smoother the curve, and the smaller the more wiggled the curve.

For different requirements, different packages should be applied, and you can refer to this document: Density estimation in R. And for multivariate variables, you can turn to the multivariate kernel density estimation.

Some comments requested scaling the density estimate line to the peak of the histogram so that the y axis would remain as counts rather than density. To achieve this I wrote a small function to automatically pull the max bin height and scale the y dimension of the density function accordingly.

hist_dens <- function(x, breaks = "Scott", main = "title", xlab = "x", ylab = "count") {
  

dens <- density(x, na.rm = T)
  

raw_hist <- hist(x, breaks = breaks, plot = F)
  

scale <- max(raw_hist$counts)/max(raw_hist$density)
  

hist(x, breaks = breaks, prob = F, main = main, xlab = xlab, ylab = ylab)
  

lines(list(x = dens$x, y = scale * dens$y), col = "red", lwd = 2)
  

}


hist_dens(rweibull(1000, 2))

Created on 2021-12-19 by the reprex package (v2.0.1)