How to write to an existing Excel file without overwriting data (using pandas)?

I use pandas to write to an Excel file in the following way:

import pandas


writer = pandas.ExcelWriter('Masterfile.xlsx')


data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])


writer.save()

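Masterfile.xlsx already contains a number of different tabs, but it does not yet contain "Main".

Pandas correctly writes to the "Main" sheet; unfortunately, it also deletes all the other tabs.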


The pandas docs say it uses openpyxl for xlsx files. A quick look through the code in ExcelWriter gives a clue that something like this might work:

import pandas
from openpyxl import load_workbook


book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl')
writer.book = book


## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.


writer.sheets = dict((ws.title, ws) for ws in book.worksheets)


data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])


writer.save()

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                                   header=None,
                                                                   index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

This works perfectly fine; the only problem is that the formatting of the master file (the file to which we add the new sheet) is lost.

With openpyxl version 2.4.0 and pandas version 0.19.2, the process @ski came up with gets a bit simpler:

import pandas
from openpyxl import load_workbook


with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!

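Old question, but I am guessing some people still search for this, so...

I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, created by pandas with the sheetname=None option. Between reading the spreadsheet into the dict format and writing it back from the dict, it is simple to add, delete or modify worksheets. For me, xlsxwriter works better than openpyxl for this particular task in terms of speed and format.

Note: future versions of pandas (0.21.0+) will change the "sheetname" parameter to "sheet_name".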

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
sheetname=None)


# all worksheets are accessible as dataframes.


# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']


# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df


# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe


# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

For the example in the 2013 question:

ws_dict = pd.read_excel('Masterfile.xlsx',
sheetname=None)


ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]


with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
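
On pandas 0.21.0+ the same round trip can be sketched with the renamed `sheet_name` parameter. This is a minimal illustration, not the answer author's code: the helper name `upsert_sheet`, the temporary file and the sample frames are all hypothetical, and it assumes openpyxl is installed for reading and writing .xlsx.

```python
import os
import tempfile

import pandas as pd


def upsert_sheet(path, sheet_name, df):
    """Read every sheet into a dict, set one entry, and rewrite the file."""
    # sheet_name=None returns {sheet name: DataFrame} for all sheets
    sheets = pd.read_excel(path, sheet_name=None)
    sheets[sheet_name] = df
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=name, index=False)


# Demo on a throwaway workbook (hypothetical file and data)
demo_path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")
pd.DataFrame({"x": [1, 2]}).to_excel(demo_path, sheet_name="Other", index=False)
upsert_sheet(demo_path, "Main", pd.DataFrame({"Diff1": [0.1], "Diff2": [0.2]}))

result = pd.read_excel(demo_path, sheet_name=None)
print(list(result))
```

Because every sheet is rewritten from a DataFrame, only cell data survives the round trip; anything else stored in the workbook is not carried over.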

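I know this is an older thread, but it is the first item you find when searching, and the above solutions don't work if you need to retain charts in a workbook that you have already created. In that case, xlwings is a better option - it allows you to write to the Excel workbook and keeps the charts and chart data.

Simple example: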

import xlwings as xw
import pandas as pd


#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5


#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')


ws = wb.sheets['chartData']


ws.range('A1').options(index=False).value = df


# save the modified workbook under a new name
wb.save('C:\\data\\bookwithChart_updated.xlsx')


xw.apps[0].quit()

writer = pd.ExcelWriter('prueba1.xlsx', engine='openpyxl', keep_date_col=True)

Hopefully "keep_date_col" helps you.

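UPDATE: Starting from pandas 1.3.0 the following function will not work properly, because DataFrame.to_excel() and pd.ExcelWriter() have been changed - a new if_sheet_exists parameter has been introduced, which has invalidated the function below.

An updated version of append_df_to_excel() that works with pandas 1.3.0+ is linked from the original answer.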
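
For orientation, here is a minimal sketch of how the `if_sheet_exists` parameter behaves in append mode. It is not the linked updated helper; the temporary file and the sample frames are made up for illustration, and it assumes pandas 1.3.0+ with openpyxl installed.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")  # hypothetical file

# Start with a workbook holding two sheets
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2, 3]}).to_excel(writer, sheet_name="Main", index=False)
    pd.DataFrame({"b": [4, 5]}).to_excel(writer, sheet_name="Other", index=False)

# Reopen in append mode; "replace" drops the old "Main" and writes a fresh one,
# while "Other" is left alone
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    pd.DataFrame({"a": [10]}).to_excel(writer, sheet_name="Main", index=False)

sheets = pd.read_excel(path, sheet_name=None)
print(sorted(sheets))
```

The default is `if_sheet_exists="error"`, which raises if the sheet already exists; `"new"` writes to a new sheet under a different name instead, and pandas 1.4.0 added `"overlay"` for writing into the existing sheet without removing it first.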


Here is a helper function:

import os

import pandas as pd
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name,
            startrow=startrow if startrow is not None else 0,
            **to_excel_kwargs)
        return

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)

    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)

    # copy existing sheets
    writer.sheets = {ws.title: ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

Tested with the following versions:

  • Pandas 1.2.3
  • Openpyxl 3.0.5

book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()

Starting in pandas 0.24 you can simplify this with the mode keyword argument of ExcelWriter:

import pandas as pd


with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer:
    data_filtered.to_excel(writer)
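
A slightly fuller sketch of append mode, checking that the sheet already in the file survives. The temporary file, the sheet names and the sample data are invented for the demonstration; it assumes pandas 0.24+ and openpyxl.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

path = os.path.join(tempfile.mkdtemp(), "the_file.xlsx")  # hypothetical file

# A workbook that already has one sheet
pd.DataFrame({"x": [1]}).to_excel(path, sheet_name="Existing", index=False)

# mode='a' loads the existing workbook instead of starting from an empty one
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    pd.DataFrame({"Diff1": [0.5], "Diff2": [1.5]}).to_excel(
        writer, sheet_name="Main", index=False)

sheet_names = load_workbook(path).sheetnames
print(sheet_names)
```

If `sheet_name` is left out, as in the two-line version above, `to_excel` writes to a sheet called "Sheet1".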

Pandas has a better solution as of version 0.24:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

[Images: screenshots of the workbook before and after the write]

So upgrade your pandas now:

pip install --upgrade pandas

Method:

  • Can create the file if it does not exist
  • Appends to an existing Excel file by sheet name

import pandas as pd
from openpyxl import load_workbook


# **kwds is forwarded to DataFrame.to_excel() (sheet_name, columns, index, ...)
def write_to_excel(df, file, **kwds):
    try:
        book = load_workbook(file)
        writer = pd.ExcelWriter(file, engine='openpyxl')
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, **kwds)
        writer.save()
    except FileNotFoundError as e:
        # the file does not exist yet, so create it
        df.to_excel(file, **kwds)

Usage:

df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])

The solution of @MaxU works very well; I have just one suggestion:

If truncate_sheet=True is specified, "startrow" should NOT be retained from the existing sheet. I suggest:

    if startrow is None and sheet_name in writer.book.sheetnames:
        if not truncate_sheet:  # truncate_sheet would use startrow if provided (or zero below)
            startrow = writer.book[sheet_name].max_row
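
The suggested rule can be isolated as a small pure function, which makes the three cases easy to check. This is only a sketch of the decision logic; `resolve_startrow` and its parameters are hypothetical names, not part of the helper.

```python
def resolve_startrow(startrow, sheet_exists, truncate_sheet, max_row):
    """Return the 0-based row at which to_excel() should start writing.

    max_row is the 1-based number of the last used row in the existing sheet
    (what openpyxl reports as ws.max_row).
    """
    if startrow is not None:
        return startrow      # an explicit value always wins
    if sheet_exists and not truncate_sheet:
        return max_row       # append directly below the existing data
    return 0                 # new or truncated sheet: start at the top


# Existing sheet with 4 used rows, nothing specified: continue at index 4
print(resolve_startrow(None, True, False, 4))
# Same sheet but truncated: start again from the top
print(resolve_startrow(None, True, True, 4))
```

Since `max_row` is 1-based and `startrow` is 0-based, passing `max_row` unchanged lands on the first empty row.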


I used the answer described here:

from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()

@MaxU's solution does not work with newer versions of Python and the related packages. It raises the error: "zipfile.BadZipFile: File is not a zip file"
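
Some background on that message, offered as an assumption about the cause rather than something this answer confirms: an .xlsx file is a zip archive, so the error appears whenever the file being read is empty or otherwise not a valid zip, for example if it was truncated before `load_workbook` got to it. The standard library alone is enough to reproduce it on a made-up empty file:

```python
import os
import tempfile
import zipfile

# A zero-byte file that merely has an .xlsx name (hypothetical path)
path = os.path.join(tempfile.mkdtemp(), "empty.xlsx")
open(path, "wb").close()

looks_like_zip = zipfile.is_zipfile(path)

try:
    zipfile.ZipFile(path)
    error_name = None
except zipfile.BadZipFile as exc:
    error_name = type(exc).__name__
    print(error_name, "-", exc)
```

Checking `zipfile.is_zipfile(path)` before loading is a cheap way to tell a genuinely missing or empty workbook apart from a readable one.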

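I wrote a new version of the function that works fine with the updated versions of Python and the related packages, and tested it with python: 3.9 | openpyxl: 3.0.6 | pandas: 1.2.3

I also added more features to the helper function:

  1. It now resizes all columns based on cell content width, so all variables will be visible (see "resizeColumns")
  2. You can handle NaN, choosing whether NaN values are displayed as NaN or as empty cells (see "na_rep")
  3. Added "startcol": you can decide to start writing from a specific column, otherwise it will start from col = 0

Here is the function: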

import pandas as pd


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
                       truncate_sheet=False, resizeColumns=True, na_rep='NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file

      resizeColumns: default = True . It resize all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''

    Returns: None

    *******************

    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)

    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpyxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now it resizes all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN, if you want that NaN are displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start to write from specific column, otherwise will start from col = 0

    *******************
    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    try:
        # check that the file exists and can be opened
        with open(filename):
            pass
    except IOError:
        # file not accessible: create an empty workbook first
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError

    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        # startrow = -1
        startrow = 0

    if startcol is None:
        startcol = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)

    if resizeColumns:

        ws = writer.book[sheet_name]

        def auto_format_cell_width(ws):
            # max_column is 1-based and inclusive, hence the + 1
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2

        auto_format_cell_width(ws)

    # save the workbook
    writer.save()

Example usage:

# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
'colors': ['red', 'white', 'blue'],
'colorsTwo': ['yellow', 'white', 'blue'],
'NaNcheck': [float('NaN'), 1, float('NaN')],
})


# EDIT YOUR PATH FOR THE EXPORT
filename = r"C:\DataScience\df.xlsx"


# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCELFILE
  

append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN
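
The column-resizing step of the helper boils down to taking, for each column, the longest string representation among its cells and adding a little padding. Here is that calculation on plain Python lists, as a sketch independent of openpyxl; the function name `column_widths` and the sample rows are illustrative only.

```python
def column_widths(rows, padding=2):
    """Return one width per column: longest str(cell) in that column + padding."""
    widths = []
    for row in rows:
        for i, value in enumerate(row):
            length = len(str(value))
            if i >= len(widths):
                widths.append(length)      # first time this column is seen
            elif length > widths[i]:
                widths[i] = length
    return [w + padding for w in widths]


rows = [
    ["numbers", "colors"],
    [1, "red"],
    [2, "white"],
]
widths = column_widths(rows)
print(widths)  # [9, 8]
```

In the helper the resulting values are assigned to `ws.column_dimensions[get_column_letter(n)].width`, where column numbers run from 1 up to and including `ws.max_column`.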

I would recommend using xlwings (https://docs.xlwings.org/en/stable/api.html); it is really powerful for this application. This is how I use it:

import xlwings as xw
import pandas as pd
import xlsxwriter


# function to get the active workbook
def getActiveWorkbook():
    try:
        # logic from xlwings to grab the current excel file
        activeWb = xw.books.active
    except:
        # print error message if unable to get the current workbook
        print('Unable to grab the current Workbook')
        # pause() and exitProgram() are the author's own helpers, not shown here
        pause()
        exitProgram()
    else:
        return activeWb


# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
    lastCell = myBook.sheets[sheetName].range("A1").current_region.last_cell
    lastRow = lastCell.row
    # xlwings columns are 1-based, xl_col_to_name() expects a 0-based index
    lastCol = str(xlsxwriter.utility.xl_col_to_name(lastCell.column - 1))
    return str(lastRow), lastCol + str(lastRow)


activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1,2,3])

# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
    ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')

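This seems to work very well for my applications, because .xlsm workbooks can be very tricky. You can run this as a Python script, or turn it into an executable with pyinstaller and then run the .exe through an Excel macro. You can also call VBA macros from Python using xlwings, which is very useful.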