Plotting categorical data with pandas and matplotlib

I have a data frame with categorical data:

     colour  direction
1    red     up
2    blue    up
3    green   down
4    red     left
5    red     right
6    yellow  down
7    blue    down

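I want to generate some graphs, such as a pie chart and a histogram, based on the categories. Is that possible without creating dummy numeric variables? Something like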

df.plot(kind='hist')

You can simply use value_counts on the series:

df['colour'].value_counts().plot(kind='bar')
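
To see why no dummy numeric variables are needed, it may help to look at what value_counts returns before plotting it. The following is a minimal sketch, assuming the sample data from the question; the names counts and shares are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
    "direction": ["up", "up", "down", "left", "right", "down", "down"],
})

# value_counts() returns a numeric Series indexed by category,
# which is exactly what a bar plot needs.
counts = df["colour"].value_counts()
print(counts)

# normalize=True gives relative frequencies instead of raw counts.
shares = df["colour"].value_counts(normalize=True)
ax = shares.plot(kind="bar", rot=0)
ax.set_ylabel("share of rows")
```

In an interactive session, call plt.show() from matplotlib.pyplot (or use a notebook) to display the figure.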

Like this:

df.groupby('colour').size().plot(kind='bar')
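
The groupby version yields the same counts as value_counts; as far as I can tell the only visible difference is the order of the bars. A small sketch, again assuming the question's sample data (by_group and by_count are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
})

# groupby(...).size() sorts by the group label by default.
by_group = df.groupby("colour").size()

# value_counts() sorts by frequency, largest first.
by_count = df["colour"].value_counts()

print(list(by_group.index))
# Same numbers either way, only the ordering differs.
print(by_group.to_dict() == by_count.to_dict())
```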

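You might find the mosaic plot from statsmodels useful; it can also give statistical highlighting for the variances.

import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

plt.rcParams['font.size'] = 16.0
mosaic(df, ['direction', 'colour'])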

But beware of zero-sized cells: they will cause problems with the labels.

See this answer for details.
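
One way to check for zero-sized cells before drawing the mosaic is to build the contingency table with pandas and count the empty combinations. This is only a sketch on the question's sample data; table and n_empty are illustrative names, and it does not call statsmodels at all.

```python
import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
    "direction": ["up", "up", "down", "left", "right", "down", "down"],
})

# Contingency table of the two columns that go into the mosaic plot.
table = pd.crosstab(df["direction"], df["colour"])
print(table)

# Count the direction/colour combinations that never occur in the data.
n_empty = int((table == 0).sum().sum())
print(f"{n_empty} of {table.size} cells are empty")
```

With a data set this small most combinations are empty, so the mosaic labels are likely to be affected.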

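You could also use countplot from seaborn. This package builds on pandas to provide a high-level plotting interface. It gives you good styling and correct axis labels for free.

import pandas as pd
import seaborn as sns
sns.set()

df = pd.DataFrame({'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
                   'direction': ['up', 'up', 'down', 'left', 'right', 'down', 'down']})
# Pass the column as x=; newer seaborn versions expect it as a keyword argument
sns.countplot(x=df['colour'], color='gray')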

It also supports colouring the bars, using a little trick:

sns.countplot(x=df['colour'],
              palette={color: color for color in df['colour'].unique()})

To plot multiple categorical features as bar charts on the same figure, I would suggest:

import pandas as pd
import matplotlib.pyplot as plt


df = pd.DataFrame(
    {
        "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
        "direction": ["up", "up", "down", "left", "right", "down", "down"],
    }
)


categorical_features = ["colour", "direction"]
fig, ax = plt.subplots(1, len(categorical_features))
for i, categorical_feature in enumerate(df[categorical_features]):
    # Note: recent pandas versions want the plot kind as a keyword, kind="bar"
    df[categorical_feature].value_counts().plot("bar", ax=ax[i]).set_title(categorical_feature)
fig.show()

You can simply use value_counts with the sort option set to False. This will preserve the ordering of the categories:

df['colour'].value_counts(sort=False).plot.bar(rot=0)
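
If the bars need to appear in a specific order, one option is to make the column a categorical with explicit categories; value_counts(sort=False) then follows that category order. A sketch under that assumption (the order list is hypothetical, chosen only for illustration):

```python
import pandas as pd

colours = ["red", "blue", "green", "red", "red", "yellow", "blue"]
order = ["red", "blue", "green", "yellow"]  # hypothetical display order

# A categorical Series remembers the order of its categories.
s = pd.Series(pd.Categorical(colours, categories=order))

# sort=False keeps the category order instead of sorting by frequency.
counts = s.value_counts(sort=False)
ax = counts.plot.bar(rot=0)
```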

Using plotly:

import plotly.express as px
px.bar(df["colour"].value_counts())

Roman's answer is very helpful and correct, but in recent versions you also need to specify kind, because the order of the parameters may change.

import pandas as pd
import matplotlib.pyplot as plt


df = pd.DataFrame(
    {
        "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
        "direction": ["up", "up", "down", "left", "right", "down", "down"],
    }
)


categorical_features = ["colour", "direction"]
fig, ax = plt.subplots(1, len(categorical_features))
for i, categorical_feature in enumerate(df[categorical_features]):
    df[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
fig.show()

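pandas.Series.plot.pie

https://pandas.pydata.org/docs/reference/api/pandas

We can do a little better than that without straying from the built-in functionality.

People love to hate on pie charts, but they have the same benefit as mosaic and tree plots: they help keep each part's proportion of the whole interpretable.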

kwargs = dict(
    startangle = 90,
    colormap   = 'Pastel2',
    fontsize   = 13,
    explode    = (0.1,0.1,0.1),
    figsize    = (60,5),
    autopct    = '%1.1f%%',
    title      = 'Chemotherapy Stratification'
)


df['treatment_chemo'].value_counts().plot.pie(**kwargs)
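
Note that explode needs one entry per wedge, so the three-element tuple above assumes treatment_chemo has exactly three categories. Here is a sketch of the same idea applied to the colour column from the question, with the explode list sized from the data; the title is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
})

counts = df["colour"].value_counts()

kwargs = dict(
    startangle=90,
    colormap="Pastel2",
    autopct="%1.1f%%",
    explode=[0.1] * len(counts),  # one offset per wedge, sized from the data
    title="Colour share",         # illustrative title
)

ax = counts.plot.pie(**kwargs)
```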
