Is there a way to auto-adjust Excel column widths with pandas?

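I have been asked to generate some Excel reports. I currently use pandas quite heavily for my data, so naturally I would like to use pandas.ExcelWriter to generate these reports. However, the fixed column widths are a problem.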

The code I have so far is simple enough. Say I have a DataFrame called df:

writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")

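I looked through the pandas docs and did not see any option for setting column widths. Is there a trick to make the columns auto-adjust to the data? Or is there something I can do to the xlsx file after the fact to adjust the column widths?

(I am using the OpenPyXL library and generating .xlsx files, if that makes any difference.)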


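There is probably no automatic way to do it right now, but since you are using openpyxl, the following line (adapted from another user's answer on how to do it manually) lets you specify a sane value (in character widths):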

writer.sheets['Summary'].column_dimensions['A'].width = 15

I recently started using a nice package called StyleFrame.

It takes a DataFrame and lets you style it very easily.

By default the column widths are auto-adjusted.

For example:

from StyleFrame import StyleFrame
import pandas as pd


df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3],
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()

You can also change the column widths:

sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)

UPDATE 1

In version 1.4 the best_fit argument was added to StyleFrame.to_excel. See the documentation.

UPDATE 2

Here is a code sample that works with StyleFrame 3.x.x:

from styleframe import StyleFrame
import pandas as pd


columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
        'aaaaaaaaaaa': [1, 2, 3, ],
        'bbbbbbbbb': [1, 1, 1, ],
        'ccccccccccc': [2, 3, 4, ],
    }, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
    excel_writer=excel_writer,
    best_fit=columns,
    columns_and_rows_to_freeze='B2',
    row_to_add_filters=0,
)
excel_writer.save()

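I am posting this because I just ran into the same issue and found that the official documentation for XlsxWriter and pandas still lists this functionality as unsupported. I hacked together a solution that solved my problem: I iterate through each column and use worksheet.set_column to set the column width to the maximum length of that column's contents.

One important note, however. The width is driven by the column values; the header length is used only when the header is longer than every value in the column. Hope this helps someone :)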

import pandas as pd
import sqlalchemy as sa
from urllib.parse import quote_plus  # Python 3 location of quote_plus

read_server = 'serverName'
read_database = 'databaseName'

read_params = quote_plus("DRIVER={SQL Server};SERVER=" + read_server + ";DATABASE=" + read_database + ";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

# Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query, con=read_engine)

# Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')

# Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow=1, sheet_name='Sheet1', index=False)

# Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # longest value in column i, measured as text
    column_len = my_dataframe[col].astype(str).str.len().max()
    # use the header length instead if it is longer, then add the padding
    column_len = max(column_len, len(str(col))) + 2
    # set the column width
    worksheet.set_column(i, i, column_len)
writer.close()  # writer.save() was removed in pandas 2.0
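
The width rule used in the loop above can be pulled out into a small pure function, which makes the choice between fitting values only and fitting values plus header explicit. This is a minimal sketch, not part of the original answer: the name fit_width and its padding and max_width parameters are assumptions made for illustration.

```python
def fit_width(values, header=None, padding=2, max_width=None):
    """Return a column width, in characters, for the given cell values.

    values    -- iterable of cell values (anything str() accepts)
    header    -- optional column header to include in the measurement
    padding   -- extra characters added on top of the longest entry
    max_width -- optional cap so one very long cell cannot blow up the column
    """
    lengths = [len(str(v)) for v in values]
    if header is not None:
        lengths.append(len(str(header)))
    width = max(lengths, default=0) + padding
    if max_width is not None:
        width = min(width, max_width)
    return width


# Values only: the longest value "12345" has 5 characters, plus 2 of padding.
print(fit_width([1, 22, 12345]))                        # 7
# Header included: "customer_id" (11 characters) now dominates.
print(fit_width([1, 22, 12345], header="customer_id"))  # 13
```

Inside the loop above it could be called as worksheet.set_column(i, i, fit_width(my_dataframe[col], header=col)), since iterating over a pandas Series yields its values.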

Inspired by user6178746's answer, I have the following:

# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.close()  # writer.save() was removed in pandas 2.0

The easiest solution is to specify the width of the columns in the set_column method.

for worksheet in writer.sheets.values():
    worksheet.set_column(0, last_column_value, required_width_constant)

Combining the other answers and comments, and also supporting multi-indices:

def autosize_excel_columns(worksheet, df):
    # size the index columns first, then the data columns to their right
    autosize_excel_columns_df(worksheet, df.index.to_frame())
    autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)


def autosize_excel_columns_df(worksheet, df, offset=0):
    for idx, col in enumerate(df):
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),
            len(str(series.name))
        )) + 1
        worksheet.set_column(idx+offset, idx+offset, max_len)


sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.close()  # writer.save() was removed in pandas 2.0

import openpyxl
from openpyxl.utils import get_column_letter
..
for col in _ws.columns:
    max_length = 0
    # column letter such as 'A' or 'AB' (cell.column is an integer in openpyxl >= 2.6)
    col_name = get_column_letter(col[0].column)
    for cell in col:
        # measure the text form of the value so numbers and dates are counted too
        if cell.value is not None:
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = max_length + 2
    _ws.column_dimensions[col_name].width = adjusted_width

You can do this with pandas and XlsxWriter; the code below works in Python 3.x. For more details on using XlsxWriter with pandas, this link may be useful: https://xlsxwriter.readthedocs.io/working_with_pandas.html

import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
# set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.close()  # writer.save() was removed in pandas 2.0

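I found that it was more useful to adjust the columns based on the column headers rather than the column content.

Using df.columns.values.tolist() I generate a list of the column headers and use the lengths of these headers to determine the column widths.

See the full code below: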

import pandas as pd
import xlsxwriter

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)

workbook = writer.book  # Access the workbook
worksheet = writer.sheets[sheetname]  # Access the worksheet

header_list = df.columns.values.tolist()  # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(str(header_list[i])))  # Set column widths based on len(header)

writer.close()  # Save the excel file (writer.save() was removed in pandas 2.0)

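At work, I am always writing dataframes to Excel files. So instead of writing the same code over and over, I created a module. Now I just import it and use it to write and format the Excel files. There is one downside, though: it takes a long time if the dataframe is very large. Here is the code: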

import os

import pandas as pd


def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                                  datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # Add a header format (created once and reused for every sheet).
    header_format = workbook.add_format({
        'bold': True,
        'border': 1,
        'fg_color': '#0000FF',
        'font_color': 'white'})
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, header_format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(str(col)) + 2])
            # cap the width so that no column gets too wide
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
    writerReport.close()  # writerReport.save() was removed in pandas 2.0
    return out_path  # same path the file was written to
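
The slowdown on large dataframes mentioned above presumably comes from converting every cell to a string. One possible mitigation, sketched here as an assumption rather than as part of the original module, is to measure only the first rows of each column. The name sampled_width and its sample_size parameter are hypothetical, and any long value outside the sample will be under-measured, so this trades accuracy for speed.

```python
from itertools import islice


def sampled_width(values, header, sample_size=1000, padding=2, max_width=50):
    """Estimate a column width from at most sample_size leading values."""
    longest = len(str(header))
    # islice stops after sample_size items, so later rows are never stringified
    for value in islice(values, sample_size):
        longest = max(longest, len(str(value)))
    return min(longest + padding, max_width)


values = ["a"] * 5 + ["b" * 40]
print(sampled_width(values, "h", sample_size=5))  # 3  (the long value is not sampled)
print(sampled_width(values, "h", sample_size=6))  # 42 (the long value is sampled)
```

In the function above, the inner loop could call worksheet.set_column(j, j, sampled_width(dataframe[col], col)) instead of building the full list of lengths.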


Dynamically adjust all the column lengths

writer = pd.ExcelWriter('/path/to/output/file.xlsx')
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')

for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(str(column)))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)

writer.close()  # writer.save() was removed in pandas 2.0

Manually adjust a column using the column name

col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

Manually adjust a column using the column index

writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

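In case any of the above fails with

AttributeError: 'Worksheet' object has no attribute 'set_column'

make sure xlsxwriter is installed, since set_column is an XlsxWriter method and this error means the sheet was created by a different engine:

pip install xlsxwriter

For a more comprehensive explanation, you can read the article How to Auto-Adjust the Width of Excel Columns with Pandas ExcelWriter on TDS.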
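
The AttributeError above appears when the worksheet comes from openpyxl, which stores widths on column_dimensions keyed by column letter instead of offering set_column. As a rough sketch of how code could tolerate either engine, the helper below checks which interface the worksheet object exposes. The names column_letter and set_column_width are made up for this example, and the letter conversion is written by hand so the snippet has no dependencies (openpyxl ships its own get_column_letter).

```python
def column_letter(col_idx):
    """Convert a 0-based column index to an Excel letter (0 -> 'A', 26 -> 'AA')."""
    if col_idx < 0:
        raise ValueError("col_idx must be >= 0")
    letters = ""
    n = col_idx + 1  # switch to 1-based, which is how the letters are defined
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def set_column_width(worksheet, col_idx, width):
    """Set the width of one column on either kind of worksheet object."""
    if hasattr(worksheet, "set_column"):
        # XlsxWriter worksheet: set_column(first_col, last_col, width)
        worksheet.set_column(col_idx, col_idx, width)
    elif hasattr(worksheet, "column_dimensions"):
        # openpyxl worksheet: widths live on column_dimensions, keyed by letter
        worksheet.column_dimensions[column_letter(col_idx)].width = width
    else:
        raise TypeError(f"unsupported worksheet type: {type(worksheet).__name__}")
```

With this in place, the loops above could call set_column_width(writer.sheets['sheetName'], col_idx, column_length) regardless of which engine pandas selected.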

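You can solve the problem by calling the following function, where df is the dataframe whose columns you want to size and sheetname is the sheet in Excel where the changes should take place:

def auto_width_columns(df, sheetname):
    # relies on an existing module-level `writer` (a pd.ExcelWriter using xlsxwriter)
    workbook = writer.book
    worksheet = writer.sheets[sheetname]

    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)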

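Yes, there is something you can do to the xlsx file after the fact to adjust the column widths: use xlwings to autofit the columns. It is a pretty simple solution; see the with block at the end of the example code. The advantage of this procedure is that you do not have to worry about font size, font type or anything else. Requirement: an Excel installation.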

import pandas as pd
import xlwings as xw

path = r"test.xlsx"

# Export your dataframe in question (a small stand-in frame; any DataFrame works).
df = pd.DataFrame({"name": ["alpha", "beta"], "value": [1.5, 22.25]})
df.to_excel(path)

# Autofit all columns with xlwings.
with xw.App(visible=False) as app:
    wb = xw.Book(path)

    for ws in wb.sheets:
        ws.autofit(axis="columns")

    wb.save(path)
    wb.close()
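
When Excel is not available, the len()-based answers on this page remain an approximation, because the rendered width depends on the font and on the characters involved. One gap that can be narrowed without Excel is East Asian text, where each character typically takes about two cells. The sketch below is a heuristic under that assumption, not how Excel measures text, and display_width is a name chosen for this example.

```python
import unicodedata


def display_width(text):
    """Approximate the on-screen width of text, counting wide characters twice."""
    width = 0
    for ch in str(text):
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character
        # 'W' (wide) and 'F' (fullwidth) characters take roughly two cells
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


print(len("销售报告"), display_width("销售报告"))  # 4 8
print(len("Report"), display_width("Report"))      # 6 6
```

It can stand in for len wherever the other answers measure text, for example series.astype(str).map(display_width).max().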

This function works for me, and it also fixes the index width:

def write_to_excel(writer, X, sheet_name, sep_only=False):
    # writer=writer object (xlsxwriter engine)
    # X=dataframe
    # sheet_name=name of sheet
    # sep_only=True: write only as separate excel file, False: also write as sheet to the writer object
    # output_folder and prefix_excel_save must be defined at module level
    if sheet_name == "":
        print("specify sheet_name!")
    else:
        X.to_excel(f"{output_folder}{prefix_excel_save}_{sheet_name}.xlsx")
        if not sep_only:
            X.to_excel(writer, sheet_name=sheet_name)

            # fix column widths
            worksheet = writer.sheets[sheet_name]  # pull worksheet object
            for idx, col in enumerate(X.columns):  # loop through all columns
                series = X[col]
                max_len = max((
                    series.astype(str).map(len).max(),  # len of largest item
                    len(str(series.name))  # len of column name/header
                )) + 1  # adding a little extra space
                worksheet.set_column(idx+1, idx+1, max_len)  # set column width (+1 because the index occupies column 0)

            # fix index width
            max_len = pd.Series(X.index.values).astype(str).map(len).max() + 1
            worksheet.set_column(0, 0, max_len)

        if sep_only:
            print(f'{sheet_name} is written as separate file')
        else:
            print(f'{sheet_name} is written as separate file')
            print(f'{sheet_name} is written as sheet')
    return writer

Example call:

writer = write_to_excel(writer, dataframe, "Statistical_Analysis")

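I may be a bit late, but this code works when you use 'openpyxl' as your engine; sometimes pip install xlsxwriter does not solve the issue. The code below works well for me. Feel free to edit any part of it.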

import functools
from decimal import Decimal

from openpyxl.utils import get_column_letter


def text_length(text):
    """
    Get the effective text length in characters, taking into account newlines
    """
    if not text:
        return 0
    lines = str(text).split("\n")
    return max(len(line) for line in lines)


def _to_str_for_length(v, decimals=3):
    """
    Like str() but rounds decimals to predefined length
    """
    if isinstance(v, float):
        # Round to [decimal] places
        return str(Decimal(v).quantize(Decimal('1.' + '0' * decimals)).normalize())
    else:
        return str(v)


def auto_adjust_xlsx_column_width(df, writer, sheet_name, margin=3, length_factor=1.0, decimals=3, index=False):
    sheet = writer.sheets[sheet_name]
    _to_str = functools.partial(_to_str_for_length, decimals=decimals)
    # Compute & set column width for each column
    for column_name in df.columns:
        # Convert the values of the column to strings and take the longest,
        # or the header if that is longer; 5 extra characters are added here
        column_length = max(df[column_name].apply(_to_str).map(text_length).max(), text_length(column_name)) + 5
        # Get index of column in XLSX
        # Column index is +1 if we also export the index column
        col_idx = df.columns.get_loc(column_name)
        if index:
            col_idx += 1
        # Set width of column to (column_length * length_factor + margin)
        sheet.column_dimensions[get_column_letter(col_idx + 1)].width = column_length * length_factor + margin
    # Compute column width of index column (if enabled)
    if index:  # If the index column is being exported
        index_length = max(df.index.map(_to_str).map(text_length).max(), text_length(df.index.name))
        sheet.column_dimensions["A"].width = index_length * length_factor + margin

An openpyxl version based on @alichaudry's code.
The code 1) loads an Excel file, 2) adjusts the column widths, and 3) saves it.

def auto_adjust_column_widths(excel_file : "Excel File Path", extra_space = 1) -> None:
    """
    Adjusts column widths of the excel file and replaces it with the adjusted one.
    Adjusting columns is based on the lengths of columns values (including column names).
    Parameters
    ----------
    excel_file :
        excel_file to adjust column widths.

    extra_space :
        extra column width in addition to the value-based-widths
    """
    import pandas as pd
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_letter

    wb = load_workbook(excel_file)

    for ws in wb:
        df = pd.DataFrame(ws.values,)

        # longest text per column, plus the extra space
        widths = df.astype(str).apply(lambda column: column.str.len()).max(axis=0) + extra_space
        # .items() replaces .iteritems(), which was removed in pandas 2.0
        for i, r in widths.items():
            ws.column_dimensions[get_column_letter(i+1)].width = r

    wb.save(excel_file)