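How do I expand the output display to see more columns of a Pandas DataFrame?

Is there a way to widen the display of output in either interactive or script-execution mode?

Specifically, I am using the describe() function on a Pandas DataFrame. When the DataFrame is five columns (labels) wide, I get the descriptive statistics that I want. However, if the DataFrame has any more columns, the statistics are suppressed and something like this is returned:

>> Index: 8 entries, count to max
>> Data columns:
>> x1          8  non-null values
>> x2          8  non-null values
>> x3          8  non-null values
>> x4          8  non-null values
>> x5          8  non-null values
>> x6          8  non-null values
>> x7          8  non-null values

The "8" value is given whether there are 6 or 7 columns. What does the "8" refer to?

I have already tried dragging the IDLE window larger, as well as increasing the "Configure IDLE" width options, to no avail.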


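You can use print(df.describe().to_string()) to force it to show the whole table. (You can use to_string() like this for any DataFrame. The result of describe is just a DataFrame itself.)

The 8 is the number of rows in the DataFrame holding the "description" (because describe computes 8 statistics: min, max, mean, etc.).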
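As a rough illustration of both points, the sketch below builds a small made-up DataFrame (the data and the names x1..x7 are assumptions chosen to mirror the question) and checks that describe() returns an 8-row DataFrame whose to_string() output contains every column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 rows, 7 numeric columns named x1..x7.
df = pd.DataFrame(
    np.arange(28, dtype=float).reshape(4, 7),
    columns=[f"x{i}" for i in range(1, 8)],
)

summary = df.describe()            # the result is itself a DataFrame
print(summary.shape)               # 8 statistic rows, one column per input column
print(list(summary.index))         # the names of the 8 statistics

full_text = summary.to_string()    # a plain string with every column rendered
print(full_text)
```

Because to_string() returns an ordinary string, it can also be written to a log or a file when the console is too narrow.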

You can adjust Pandas print options with set_printoptions.

In [3]: df.describe()
Out[3]:
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, count to max
Data columns:
x1    8  non-null values
x2    8  non-null values
x3    8  non-null values
x4    8  non-null values
x5    8  non-null values
x6    8  non-null values
x7    8  non-null values
dtypes: float64(7)

In [4]: pd.set_printoptions(precision=2)

In [5]: df.describe()
Out[5]:
            x1       x2       x3       x4       x5       x6       x7
count      8.0      8.0      8.0      8.0      8.0      8.0      8.0
mean   69024.5  69025.5  69026.5  69027.5  69028.5  69029.5  69030.5
std       17.1     17.1     17.1     17.1     17.1     17.1     17.1
min    69000.0  69001.0  69002.0  69003.0  69004.0  69005.0  69006.0
25%    69012.2  69013.2  69014.2  69015.2  69016.2  69017.2  69018.2
50%    69024.5  69025.5  69026.5  69027.5  69028.5  69029.5  69030.5
75%    69036.8  69037.8  69038.8  69039.8  69040.8  69041.8  69042.8
max    69049.0  69050.0  69051.0  69052.0  69053.0  69054.0  69055.0

However, this will not work in all cases, because Pandas detects your console width and will only use to_string if the output fits in the console (see the docstring of set_printoptions). In that case you can explicitly call to_string, as in BrenBarn's answer.

Update

With version 0.10, the way wide DataFrames are printed changed:

In [3]: df.describe()
Out[3]:
                 x1            x2            x3            x4            x5  \
count      8.000000      8.000000      8.000000      8.000000      8.000000
mean   59832.361578  27356.711336  49317.281222  51214.837838  51254.839690
std    22600.723536  26867.192716  28071.737509  21012.422793  33831.515761
min    31906.695474   1648.359160     56.378115  16278.322271     43.745574
25%    45264.625201  12799.540572  41429.628749  40374.273582  29789.643875
50%    56340.214856  18666.456293  51995.661512  54894.562656  47667.684422
75%    75587.003417  31375.610322  61069.190523  67811.893435  76014.884048
max    98136.474782  84544.484627  91743.983895  75154.587156  99012.695717

                 x6            x7
count      8.000000      8.000000
mean   41863.000717  33950.235126
std    38709.468281  29075.745673
min     3590.990740   1833.464154
25%    15145.759625   6879.523949
50%    22139.243042  33706.029946
75%    72038.983496  51449.893980
max    98601.190488  83309.051963

Furthermore, the API for setting Pandas options changed:

In [4]: pd.set_option('display.precision', 2)

In [5]: df.describe()
Out[5]:
            x1       x2       x3       x4       x5       x6       x7
count      8.0      8.0      8.0      8.0      8.0      8.0      8.0
mean   59832.4  27356.7  49317.3  51214.8  51254.8  41863.0  33950.2
std    22600.7  26867.2  28071.7  21012.4  33831.5  38709.5  29075.7
min    31906.7   1648.4     56.4  16278.3     43.7   3591.0   1833.5
25%    45264.6  12799.5  41429.6  40374.3  29789.6  15145.8   6879.5
50%    56340.2  18666.5  51995.7  54894.6  47667.7  22139.2  33706.0
75%    75587.0  31375.6  61069.2  67811.9  76014.9  72039.0  51449.9
max    98136.5  84544.5  91744.0  75154.6  99012.7  98601.2  83309.1

Update: Pandas 0.23.4 onwards

This is not necessary. Pandas autodetects the size of your terminal window if you set pd.options.display.width = 0. (For older versions, see the bottom.)

pandas.set_printoptions(...) is deprecated. Instead, use pandas.set_option(optname, val), or equivalently pd.options.<opt.hierarchical.name> = val. For example:

import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

Here is the help for set_option:

set_option(pat,value) - Sets the value of the specified option

Available options:
display.[chop_threshold, colheader_justify, column_space, date_dayfirst,
         date_yearfirst, encoding, expand_frame_repr, float_format, height,
         line_width, max_columns, max_colwidth, max_info_columns, max_info_rows,
         max_rows, max_seq_items, mpl_style, multi_sparse, notebook_repr_html,
         pprint_nest_depth, precision, width]
mode.[sim_interactive, use_inf_as_null]

Parameters
----------

pat - str/regexp which should match a single option.

Note: partial matches are supported for convenience, but unless you use the
full option name (e.g., *x.y.z.option_name*), your code may break in future
versions if new options with similar names are introduced.

value - new value of option.

Returns
-------
None

Raises
------
KeyError if no such option exists

display.chop_threshold: [default: None] [currently: None]
: float or None
        if set to a float value, all float values smaller then the given threshold
        will be displayed as exactly 0 by repr and friends.
display.colheader_justify: [default: right] [currently: right]
: 'left'/'right'
        Controls the justification of column headers. used by DataFrameFormatter.
display.column_space: [default: 12] [currently: 12]
        No description available.

display.date_dayfirst: [default: False] [currently: False]
: boolean
        When True, prints and parses dates with the day first, eg 20/01/2005
display.date_yearfirst: [default: False] [currently: False]
: boolean
        When True, prints and parses dates with the year first, e.g., 2005/01/20
display.encoding: [default: UTF-8] [currently: UTF-8]
: str/unicode
        Defaults to the detected encoding of the console.
        Specifies the encoding to be used for strings returned by to_string,
        these are generally strings meant to be displayed on the console.
display.expand_frame_repr: [default: True] [currently: True]
: boolean
        Whether to print out the full DataFrame repr for wide DataFrames
        across multiple lines, `max_columns` is still respected, but the output will
        wrap-around across multiple "pages" if it's width exceeds `display.width`.
display.float_format: [default: None] [currently: None]
: callable
        The callable should accept a floating point number and return
        a string with the desired format of the number. This is used
        in some places like SeriesFormatter.
        See core.format.EngFormatter for an example.
display.height: [default: 60] [currently: 1000]
: int
        Deprecated.
        (Deprecated, use `display.height` instead.)

display.line_width: [default: 80] [currently: 1000]
: int
        Deprecated.
        (Deprecated, use `display.width` instead.)

display.max_columns: [default: 20] [currently: 500]
: int
        max_rows and max_columns are used in __repr__() methods to decide if
        to_string() or info() is used to render an object to a string.  In case
        python/IPython is running in a terminal this can be set to 0 and Pandas
        will correctly auto-detect the width the terminal and swap to a smaller
        format in case all columns would not fit vertically. The IPython notebook,
        IPython qtconsole, or IDLE do not run in a terminal and hence it is not
        possible to do correct auto-detection.
        'None' value means unlimited.
display.max_colwidth: [default: 50] [currently: 50]
: int
        The maximum width in characters of a column in the repr of
        a Pandas data structure. When the column overflows, a "..."
        placeholder is embedded in the output.
display.max_info_columns: [default: 100] [currently: 100]
: int
        max_info_columns is used in DataFrame.info method to decide if
        per column information will be printed.
display.max_info_rows: [default: 1690785] [currently: 1690785]
: int or None
        max_info_rows is the maximum number of rows for which a frame will
        perform a null check on its columns when repr'ing To a console.
        The default is 1,000,000 rows. So, if a DataFrame has more
        1,000,000 rows there will be no null check performed on the
        columns and thus the representation will take much less time to
        display in an interactive session. A value of None means always
        perform a null check when repr'ing.
display.max_rows: [default: 60] [currently: 500]
: int
        This sets the maximum number of rows Pandas should output when printing
        out various output. For example, this value determines whether the repr()
        for a dataframe prints out fully or just a summary repr.
        'None' value means unlimited.
display.max_seq_items: [default: None] [currently: None]
: int or None

        when pretty-printing a long sequence, no more then `max_seq_items`
        will be printed. If items are ommitted, they will be denoted by the addition
        of "..." to the resulting string.

        If set to None, the number of items to be printed is unlimited.
display.mpl_style: [default: None] [currently: None]
: bool

        Setting this to 'default' will modify the rcParams used by matplotlib
        to give plots a more pleasing visual style by default.
        Setting this to None/False restores the values to their initial value.
display.multi_sparse: [default: True] [currently: True]
: boolean
        "sparsify" MultiIndex display (don't display repeated
        elements in outer levels within groups)
display.notebook_repr_html: [default: True] [currently: True]
: boolean
        When True, IPython notebook will use html representation for
        Pandas objects (if it is available).
display.pprint_nest_depth: [default: 3] [currently: 3]
: int
        Controls the number of nested levels to process when pretty-printing
display.precision: [default: 7] [currently: 7]
: int
        Floating point output precision (number of significant digits). This is
        only a suggestion
display.width: [default: 80] [currently: 1000]
: int
        Width of the display in characters. In case python/IPython is running in
        a terminal this can be set to None and Pandas will correctly auto-detect the
        width.
        Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
        terminal and hence it is not possible to correctly detect the width.
mode.sim_interactive: [default: False] [currently: False]
: boolean
        Whether to simulate interactive mode for purposes of testing
mode.use_inf_as_null: [default: False] [currently: False]
: boolean
        True means treat None, NaN, INF, -INF as null (old way),
        False means None and NaN are null, but INF, -INF are not null
        (new way).
Call def:   pd.set_option(self, *args, **kwds)
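To make the set_option API a little more concrete, here is a minimal sketch that changes one option, reads it back, and then restores the library default. The choice of display.max_rows and the value 500 are only for illustration; get_option, reset_option and describe_option are the companion functions in the same options API.

```python
import pandas as pd

starting_rows = pd.get_option("display.max_rows")   # value before any change

pd.set_option("display.max_rows", 500)              # full option name, as the help advises
changed_rows = pd.get_option("display.max_rows")

pd.reset_option("display.max_rows")                 # back to the library default
restored_rows = pd.get_option("display.max_rows")

print(starting_rows, changed_rows, restored_rows)

# Prints the help entry for a single option instead of the whole list.
pd.describe_option("display.max_rows")
```

Using the full dotted name avoids the ambiguity that the help text warns about for partial matches.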

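Older version information. Much of this has been deprecated.

As @bMu mentioned, Pandas auto-detects (by default) the size of the display area, and a summary view is used when an object's repr does not fit on the display. You mentioned resizing the IDLE window, to no effect. If you do print(df.describe().to_string()), does it fit in the IDLE window?

The terminal size is determined by pandas.util.terminal.get_terminal_size() (deprecated and removed), which returns a tuple containing the (width, height) of the display. Does the output match the size of your IDLE window? There might be an issue (there was one before when running a terminal in Emacs).

Note that it is possible to bypass the autodetection: pandas.set_printoptions(max_rows=200, max_columns=10) will never switch to the summary view as long as the number of rows and columns does not exceed the given limits.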


The "max_colwidth" option helps in seeing the untruncated form of each column.

(Image: truncated column display)

Try this:

pd.set_option('display.expand_frame_repr', False)

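From the documentation:

display.expand_frame_repr: boolean

Whether to print out the full DataFrame repr for wide DataFrames across multiple lines. max_columns is still respected, but the output will wrap around across multiple "pages" if its width exceeds display.width. [default: True] [currently: True]

See: pandas.set_option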
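The effect of display.expand_frame_repr can be seen by rendering the same wide frame twice. This is a sketch on a made-up 3 x 40 frame (the data, the column names and the width of 80 are assumptions); it compares how many text lines each repr produces.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 3 rows, 40 integer columns.
df = pd.DataFrame(
    np.arange(120).reshape(3, 40),
    columns=[f"c{i}" for i in range(40)],
)

# Options shared by both renderings: show every column, pretend the display is 80 wide.
common = ("display.max_columns", None, "display.width", 80)

with pd.option_context(*common, "display.expand_frame_repr", True):
    wrapped = repr(df)      # columns are split into several "pages"

with pd.option_context(*common, "display.expand_frame_repr", False):
    unwrapped = repr(df)    # one long line per row, no wrapping

print(len(wrapped.splitlines()), len(unwrapped.splitlines()))
```

With wrapping switched off the output is one header line plus one line per row, so whether it is readable depends on the terminal or editor being able to scroll horizontally.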

You can set the output display to match your current terminal width:

pd.set_option('display.width', pd.util.terminal.get_terminal_size()[0])
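On Pandas versions where pd.util.terminal is no longer available, a similar effect can probably be had with the standard library. The sketch below uses shutil.get_terminal_size(), which is a substitution on my part rather than the original answer's method.

```python
import shutil

import pandas as pd

# shutil.get_terminal_size() falls back to a default size (80 columns by default)
# when no terminal is attached, so this also runs in scripts and pipelines.
terminal_columns = shutil.get_terminal_size().columns

pd.set_option("display.width", terminal_columns)
print(pd.get_option("display.width"))
```

The width is read once, so it would need to be set again if the terminal window is resized later.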

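If you want to set options temporarily to display one large DataFrame, you can use option_context:

with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)

The option values are restored automatically when you exit the with block.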
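A quick way to convince yourself that the values really are restored is to read an option before, inside, and after the block. This is only a small check using get_option; the variable names are made up for the example.

```python
import pandas as pd

before = pd.get_option("display.max_columns")

with pd.option_context("display.max_rows", None, "display.max_columns", None):
    inside = pd.get_option("display.max_columns")   # None while the block runs

after = pd.get_option("display.max_columns")        # previous value is back

print(before, inside, after)
```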

Set the maximum column width using:

pd.set_option('max_colwidth', 800)

This particular statement sets the maximum width to 800 characters per column.
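To see what the limit does, the following sketch renders a one-cell frame holding a 100-character string with a small limit and with no limit. The frame and the limit of 20 are assumptions for the demo, and passing None for "no limit" assumes Pandas 1.0 or later.

```python
import pandas as pd

long_text = "x" * 100
df = pd.DataFrame({"note": [long_text]})   # hypothetical one-cell frame

# expand_frame_repr is switched off only so that line wrapping cannot interfere.
with pd.option_context("display.max_colwidth", 20, "display.expand_frame_repr", False):
    clipped = repr(df)                     # cell is cut short and ends in "..."

# None means "no limit" (assumption: pandas >= 1.0).
with pd.option_context("display.max_colwidth", None, "display.expand_frame_repr", False):
    full = repr(df)                        # cell is shown in full

print(clipped)
print(full)
```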

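According to the documentation for v0.18.0, if you are running in a terminal (i.e., not the IPython notebook, qtconsole, or IDLE), it is a two-liner to have Pandas auto-detect your screen width and adapt on the fly how many columns it shows:

pd.set_option('display.large_repr', 'truncate')
pd.set_option('display.max_columns', 0)

It seems like all the previous answers solve the problem. One more point: instead of pd.set_option('option_name'), you can use the (auto-complete-able) attribute form: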

pd.options.display.width = None

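Pandas documentation: Options and settings

Options have a full "dotted-style", case-insensitive name (e.g. display.max_rows). You can get/set options directly as attributes of the top-level options attribute:

In [1]: import pandas as pd

In [2]: pd.options.display.max_rows
Out[2]: 15

In [3]: pd.options.display.max_rows = 999

In [4]: pd.options.display.max_rows
Out[4]: 999

[…]

For the max_... params:

max_rows and max_columns are used in __repr__() methods to decide if to_string() or info() is used to render an object to a string. In case Python/IPython is running in a terminal this can be set to 0 and Pandas will correctly auto-detect the width of the terminal and swap to a smaller format in case all columns would not fit vertically. The IPython notebook, IPython qtconsole, or IDLE do not run in a terminal and hence it is not possible to do correct auto-detection. 'None' value means unlimited. [emphasis not in original]

For the width param:

Width of the display in characters. In case Python/IPython is running in a terminal this can be set to None and Pandas will correctly auto-detect the width. Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a terminal and hence it is not possible to correctly detect the width.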
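The attribute form and the function form address the same underlying setting, which the short sketch below tries to show. The value 999 is taken from the documentation excerpt; the variable names are illustrative.

```python
import pandas as pd

original = pd.options.display.max_rows

pd.options.display.max_rows = 999
via_function = pd.get_option("display.max_rows")   # reads the same setting

pd.options.display.max_rows = original             # put the previous value back
print(via_function, pd.options.display.max_rows)
```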

Only these three lines worked for me:

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', -1)  # on Pandas 1.0 and later, use None instead of -1

It works with Anaconda, Python 3.6.5, Pandas 0.23.0, and Visual Studio Code 1.26.

I used these settings when the data was large.

# Environment settings:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_seq_items', None)
pd.set_option('display.max_colwidth', 500)
pd.set_option('display.expand_frame_repr', True)

You can refer to the documentation here.

import pandas as pd
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)

SentenceA = "William likes Piano and Piano likes William"
SentenceB = "Sara likes Guitar"
SentenceC = "Mamoosh likes Piano"
SentenceD = "William is a CS Student"
SentenceE = "Sara is kind"
SentenceF = "Mamoosh is kind"

bowA = SentenceA.split(" ")
bowB = SentenceB.split(" ")
bowC = SentenceC.split(" ")
bowD = SentenceD.split(" ")
bowE = SentenceE.split(" ")
bowF = SentenceF.split(" ")

# Creating a set consisting of all words
wordSet = set(bowA).union(set(bowB)).union(set(bowC)).union(set(bowD)).union(set(bowE)).union(set(bowF))
print("Set of all words is: ", wordSet)

# Initiating dictionary with 0 value for all BOWs
wordDictA = dict.fromkeys(wordSet, 0)
wordDictB = dict.fromkeys(wordSet, 0)
wordDictC = dict.fromkeys(wordSet, 0)
wordDictD = dict.fromkeys(wordSet, 0)
wordDictE = dict.fromkeys(wordSet, 0)
wordDictF = dict.fromkeys(wordSet, 0)

for word in bowA:
    wordDictA[word] += 1
for word in bowB:
    wordDictB[word] += 1
for word in bowC:
    wordDictC[word] += 1
for word in bowD:
    wordDictD[word] += 1
for word in bowE:
    wordDictE[word] += 1
for word in bowF:
    wordDictF[word] += 1

# Printing term frequency
print("SentenceA TF: ", wordDictA)
print("SentenceB TF: ", wordDictB)
print("SentenceC TF: ", wordDictC)
print("SentenceD TF: ", wordDictD)
print("SentenceE TF: ", wordDictE)
print("SentenceF TF: ", wordDictF)

print(pd.DataFrame([wordDictA, wordDictB, wordDictB, wordDictC, wordDictD, wordDictE, wordDictF]))

Output:

   CS  Guitar  Mamoosh  Piano  Sara  Student  William  a  and  is  kind  likes
0   0       0        0      2     0        0        2  0    1   0     0      2
1   0       1        0      0     1        0        0  0    0   0     0      1
2   0       1        0      0     1        0        0  0    0   0     0      1
3   0       0        1      1     0        0        0  0    0   0     0      1
4   1       0        0      0     0        1        1  1    0   1     0      0
5   0       0        0      0     1        0        0  0    0   1     1      0
6   0       0        1      0     0        0        0  0    0   1     1      0

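If you don't want to mess with your display options and you just want to see the list of columns for this one DataFrame, without expanding every DataFrame you view, you could try: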

df.columns.values

You can also try a loop:

for col in df.columns:
    print(col)
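Another option along the same lines, offered as a small sketch on a made-up 50-column frame, is to convert the column index to a plain Python list, since a list is always printed in full regardless of any Pandas display option.

```python
import pandas as pd

# Hypothetical empty frame with 50 columns named col_0 .. col_49.
df = pd.DataFrame(columns=[f"col_{i}" for i in range(50)])

names = df.columns.tolist()   # ordinary Python list of column labels
print(len(names))
print(names)
```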

The line below is enough to display all columns of the DataFrame.

pd.set_option('display.max_columns', None)

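You can simply do the following steps:

  • You can change the Pandas max_columns option as follows:

    import pandas as pd
    pd.options.display.max_columns = 10

    (This allows 10 columns to be displayed; you can change this as needed.)

  • Likewise, you can change the number of rows to display as follows (if you need to change the maximum rows as well):

    pd.options.display.max_rows = 999

    (This allows 999 rows to be printed at a time.)

Please refer to the documentation to change different options/settings for Pandas.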

pd.options.display.max_columns = 100

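You can set max_columns to however many columns you need.

You can use this custom function to display the contents of a Pandas DataFrame.

import pandas as pd
from IPython.display import display  # already available by default in Jupyter notebooks

def display_all(df):     # For any DataFrame df
    with pd.option_context('display.max_rows', 1000):  # Change number of rows accordingly
        with pd.option_context('display.max_columns', 1000):  # Change number of columns accordingly
            display(df)

display_all(df.head())  # Pass your DataFrame to this function and voilà!

You don't have to use pd.set_option for the whole notebook; you can apply it to a single cell only.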

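None of these answers worked for me. A couple of them would indeed print all the columns, but the result looked sloppy: all the information was there, but it wasn't formatted correctly. I'm using a terminal inside Neovim, so I suspect that's the reason.

This mini function does exactly what I need. Just change df_data, in the two places where it appears, to the name of your DataFrame (col_range is set to the number of columns Pandas naturally shows; for me it is 5, but it could be bigger or smaller for you).

import math
col_range = 5
for _ in range(int(math.ceil(len(df_data.columns)/col_range))):
    idx1 = _*col_range
    idx2 = idx1+col_range
    print(df_data.iloc[:, idx1:idx2].describe())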
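The same idea can be wrapped in a reusable helper so that the DataFrame name does not have to be edited by hand. This is a sketch: the function name describe_in_chunks, its cols_per_chunk parameter, and the random 12-column test frame are all made up for illustration.

```python
import math

import numpy as np
import pandas as pd


def describe_in_chunks(frame, cols_per_chunk=5):
    """Return a list of describe() tables, each covering a slice of columns."""
    n_chunks = math.ceil(len(frame.columns) / cols_per_chunk)
    return [
        frame.iloc[:, i * cols_per_chunk:(i + 1) * cols_per_chunk].describe()
        for i in range(n_chunks)
    ]


# Hypothetical data: 8 rows, 12 numeric columns.
rng = np.random.default_rng(0)
df_data = pd.DataFrame(rng.random((8, 12)), columns=[f"x{i}" for i in range(12)])

chunks = describe_in_chunks(df_data)
for chunk in chunks:
    print(chunk, end="\n\n")   # one narrow table per group of columns
```

Returning the tables instead of printing them inside the function leaves the caller free to print, log, or concatenate them.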

The code below increases the line width used when printing NumPy arrays.

It gave good results in a Jupyter notebook.

import numpy as np
np.set_printoptions(linewidth=160)
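If the wider setting is only wanted for one print, NumPy also offers a context manager, np.printoptions, that restores the previous settings on exit. The sketch below assumes the default linewidth of 75 is in effect at the start and uses a 30-element array chosen so that it wraps at the default width but fits in 160 characters.

```python
import numpy as np

values = np.arange(30)

default_text = str(values)               # wraps onto a second line at the default linewidth

with np.printoptions(linewidth=160):     # temporary; previous settings return on exit
    wide_text = str(values)              # fits on a single line

print(default_text)
print(wide_text)
```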

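This is not strictly speaking an answer, but let's remember that we can use df.describe().transpose(), or even df.head(n).transpose() or df.tail(n).transpose().

I also find it easier to read the headers as a column when they are structured: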

header1_xxx,

header2_xxx,

header3_xxx,

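I think terminals and applications handle vertical scrolling more naturally, if scrolling becomes necessary after the transpose.

Headers are usually wider than their values, so putting them all in a single column (the index) minimizes their effect on the total table width.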
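As a small illustration of the transposed layout, the sketch below describes a made-up frame with 20 long-named columns and transposes the result, so each original column becomes one row and the eight statistics become the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 3 rows, 20 columns with longish header names.
df = pd.DataFrame(
    np.arange(60, dtype=float).reshape(3, 20),
    columns=[f"header{i}_xxx" for i in range(20)],
)

tall = df.describe().transpose()   # one row per original column
print(tall.shape)                  # 20 rows, 8 statistic columns
print(tall.to_string())
```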

Finally, other df descriptions can also be merged in. Here is one possible idea:

def df_overview(df: pd.DataFrame, max_colwidth=25, head=3, tail=3):
    # Combine summary statistics, dtypes, and the first/last rows into one text table,
    # with one row per original column.
    return (
        df.describe([0.5]).transpose()
        .merge(df.dtypes.rename('dtypes'), left_index=True, right_index=True)
        .merge(df.head(head).transpose(), left_index=True, right_index=True)
        .merge(df.tail(tail).transpose(), left_index=True, right_index=True)
        .to_string(max_colwidth=max_colwidth, float_format=lambda x: "{:.4G}".format(x))
    )