How to save a new sheet in an existing Excel file, using pandas?

I want to use Excel files to store data elaborated with Python. My problem is that I can't add sheets to an existing Excel file. Here is some sample code that illustrates the issue:

import pandas as pd
import numpy as np


path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"


x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)


x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)


writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()  # close() also saves the file; a separate writer.save() is not needed

This code saves two DataFrames to two sheets, named "x1" and "x2" respectively. If I create two new DataFrames and try to use the same code to add two new sheets, "x3" and "x4", the original data is lost.

import pandas as pd
import numpy as np


path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"


x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)


x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)


writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()  # close() also saves the file; a separate writer.save() is not needed

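I want an Excel file with four sheets: "x1", "x2", "x3", "x4". I know that "xlsxwriter" is not the only "engine"; there is also "openpyxl". I have also seen that other people have already written about this issue, but I still can't understand how to do it.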

Here is code taken from this link:

import pandas
from openpyxl import load_workbook


book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)


data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])


writer.save()

They say that it works, but it is hard to figure out how. I don't understand what "ws.title", "ws", and "dict" are in this context.

What is the best way to save "x1" and "x2", then close the file, open it again, and add "x3" and "x4"?


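In the example you shared, you are loading the existing file into book and setting writer.book to book. In the line writer.sheets = dict((ws.title, ws) for ws in book.worksheets) you are accessing each sheet in the workbook as ws. The sheet title is then ws.title, so you are creating a dictionary of {sheet_title: sheet} key, value pairs. This dictionary is then assigned to writer.sheets. Essentially, these steps just load the existing data from 'Masterfile.xlsx' and populate your writer with it.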
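
To make the dictionary expression concrete, here is a minimal sketch that builds the same mapping on a small in-memory workbook. The workbook and its sheet names are stand-ins for 'Masterfile.xlsx' (an assumption for illustration), so nothing is read from or written to disk.

```python
from openpyxl import Workbook

# In-memory workbook standing in for 'Masterfile.xlsx' (illustrative only).
book = Workbook()             # a new workbook starts with one default sheet
book.active.title = "x1"      # rename the default sheet
book.create_sheet("x2")       # add a second sheet

# Same expression as in the question: map each sheet title to its sheet object.
sheets = dict((ws.title, ws) for ws in book.worksheets)

print(list(sheets))                 # the sheet titles, in workbook order
print(sheets["x2"] is book["x2"])   # the values are the worksheet objects themselves
```

The dictionary holds references to the worksheets rather than copies, so looking up a title gives back the same object the workbook holds.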

Now let's say you already have a file with x1 and x2 as sheets. You can use the example code to load the file and then do something like this to add x3 and x4:

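path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
writer = pd.ExcelWriter(path, engine='openpyxl')
# Attach the existing workbook here as in the example (load_workbook, writer.book, writer.sheets);
# a fresh writer on its own overwrites the file and drops x1 and x2.
df3.to_excel(writer, sheet_name='x3', index=False)
df4.to_excel(writer, sheet_name='x4', index=False)
writer.close()  # close() also saves the file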

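This should do what you are looking for.

I would strongly recommend working directly with openpyxl, since it now supports pandas DataFrames.

This allows you to concentrate on the relevant Excel and pandas code.

Thank you. I believe a complete example could be useful for anyone else with the same issue: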

import pandas as pd
import numpy as np


path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"


x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)


x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)


writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()

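Here I generate an Excel file. From my understanding, it does not really matter whether it is generated via the "xlsxwriter" or the "openpyxl" engine.

When I want to write without losing the original data, then: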

import pandas as pd
import numpy as np
from openpyxl import load_workbook


path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"


book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book


x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)


x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)


df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()

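This code does the job!

A simple example for writing multiple DataFrames to Excel at a time, and also for appending data to a new sheet in an already written (closed) Excel file.

When it is your first time writing to the Excel file (writing "df1" and "df2" to "1st_sheet" and "2nd_sheet"):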

import pandas as pd
from openpyxl import load_workbook


df1 = pd.DataFrame([[1],[1]], columns=['a'])
df2 = pd.DataFrame([[2],[2]], columns=['b'])
df3 = pd.DataFrame([[3],[3]], columns=['c'])


excel_dir = "my/excel/dir"


with pd.ExcelWriter(excel_dir, engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='1st_sheet')
    df2.to_excel(writer, sheet_name='2nd_sheet')
    # no explicit save needed: the with block saves and closes the file on exit

After you have closed the Excel file, suppose you want to "append" data to the same file but on another sheet, say "df3" to a sheet named "3rd_sheet":

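book = load_workbook(excel_dir)
with pd.ExcelWriter(excel_dir, engine='openpyxl') as writer:
    # These two assignments rely on older pandas; newer versions may reject them (mode='a' is the alternative)
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

    ## Your dataframe to append.
    df3.to_excel(writer, sheet_name='3rd_sheet')
    # no explicit save needed: the with block saves and closes the file on exit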

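Note that the Excel format must not be xls; use xlsx instead.

You can read the existing sheets you are interested in, for example "x1" and "x2", into memory and "write" them back before adding more new sheets (keep in mind that the sheets in the file and the sheets in memory are two different things; if you don't read them, they will be lost). This approach uses "xlsxwriter" only for writing; no openpyxl writer is involved.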

import pandas as pd
import numpy as np


path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"


# begin <== read selected sheets and write them back
df1 = pd.read_excel(path, sheet_name='x1', index_col=0) # or sheet_name=0
df2 = pd.read_excel(path, sheet_name='x2', index_col=0) # or sheet_name=1
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df1.to_excel(writer, sheet_name='x1')
df2.to_excel(writer, sheet_name='x2')
# end ==>


# now create more new sheets
x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)


x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)


df3.to_excel(writer, sheet_name='x3')
df4.to_excel(writer, sheet_name='x4')
writer.close()  # close() also saves the file; a separate writer.save() is not needed

If you want to preserve all existing sheets, you can replace the code above between begin and end with:

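# read all existing sheets and write them back
xlsx = pd.ExcelFile(path)
# parse every sheet before creating the writer, since opening the writer may truncate the file
frames = {sheet: xlsx.parse(sheet_name=sheet, index_col=0) for sheet in xlsx.sheet_names}
xlsx.close()
writer = pd.ExcelWriter(path, engine='xlsxwriter')
for sheet, df in frames.items():
    df.to_excel(writer, sheet_name=sheet)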

# This program reads from an Excel workbook, fetches only the URL domain names, and writes them to a different sheet of the same workbook.
# Developer - Nilesh K
import pandas as pd
from openpyxl import load_workbook  # for writing to the existing workbook


df = pd.read_excel("urlsearch_test.xlsx")


#You can use the below for the relative path.
# r"C:\Users\xyz\Desktop\Python\


l = [] #To make a list in for loop


#begin
#loop starts here for fetching http from a string and iterate thru the entire sheet. You can have your own logic here.
for index, row in df.iterrows():
    try:
        text = row['TEXT']  # string to read and iterate (named text to avoid shadowing the built-in str)
        y = index
        str_pos = text.index('http')  # index position of http
        str_pos1 = text.index('/', text.index('/') + 2)  # position of the third /, right after the domain
        str_op = text[str_pos:str_pos1]  # substring holding the scheme and domain name
        l.append(str_op)  # append the domain name to the list

    # Error handling to skip the error rows and continue.
    except ValueError:
        print('Error!')
print(l)
l = list(dict.fromkeys(l)) #Keep distinct values, you can comment this line to get all the values
df1 = pd.DataFrame(l,columns=['URL']) #Create dataframe using the list
#end


#Write using openpyxl so it can be written to same workbook
book = load_workbook('urlsearch_test.xlsx')
writer = pd.ExcelWriter('urlsearch_test.xlsx',engine = 'openpyxl')
writer.book = book
df1.to_excel(writer,sheet_name = 'Sheet3')
writer.close()  # close() also saves the workbook


#The below can be used to write to a different workbook without using openpyxl
#df1.to_excel(r"C:\Users\xyz\Desktop\Python\urlsearch1_test.xlsx",index=False,sheet_name='sheet1')

Another fairly simple way to go about this is to write a function like this:

import logging

import pandas as pd
from openpyxl import load_workbook


def _write_frame_to_new_sheet(path_to_file=None, sheet_name='sheet', data_frame=None):
    book = None
    try:
        book = load_workbook(path_to_file)
    except Exception:
        logging.debug('Creating new workbook at %s', path_to_file)
    with pd.ExcelWriter(path_to_file, engine='openpyxl') as writer:
        if book is not None:
            # relies on older pandas; newer versions may reject assigning writer.book
            writer.book = book
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)

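The idea here is to load the workbook at path_to_file if it exists and then append data_frame as a new sheet named sheet_name. If the workbook does not exist, it is created. It seems that neither openpyxl nor xlsxwriter appends on its own, so, as in the example by @Stefano above, you really have to load and then rewrite in order to append.

You can do this without ExcelWriter, using the tools in openpyxl. This can make adding fonts to the new sheet much easier, using openpyxl.styles.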

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


#Location of original excel sheet
fileLocation =r'C:\workspace\data.xlsx'


#Location of new file which can be the same as original file
writeLocation=r'C:\workspace\dataNew.xlsx'


data = {'Name':['Tom','Paul','Jeremy'],'Age':[32,43,34],'Salary':[20000,34000,32000]}


#The dataframe you want to add
df = pd.DataFrame(data)


#Load existing sheet as it is
book = load_workbook(fileLocation)
#create a new sheet
sheet = book.create_sheet("Sheet Name")


#Load dataframe into new sheet
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)


#Save the modified excel at desired location
book.save(writeLocation)
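
As a sketch of the styling point, the snippet below fills a sheet the same way and then makes the header row bold with openpyxl.styles.Font. It uses an in-memory Workbook instead of load_workbook(fileLocation), and the sheet name "Styled" is a made-up placeholder; both are assumptions so the example runs without an existing file.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({'Name': ['Tom', 'Paul', 'Jeremy'], 'Age': [32, 43, 34]})

# In-memory workbook used in place of load_workbook(fileLocation).
book = Workbook()
sheet = book.create_sheet("Styled")  # hypothetical sheet name

# Load the DataFrame into the new sheet, header first.
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

# Make every cell of the header row (row 1) bold.
for cell in sheet[1]:
    cell.font = Font(bold=True)
```

Calling book.save(...) afterwards would write the styled sheet to disk, as in the answer above.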

For creating a new file:

import numpy as np
import pandas as pd

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)
with pd.ExcelWriter('sample.xlsx') as writer:
    df1.to_excel(writer, sheet_name='x1')

For appending to the file, use the argument mode='a' in pd.ExcelWriter:

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)
with pd.ExcelWriter('sample.xlsx', engine='openpyxl', mode='a') as writer:
    df2.to_excel(writer, sheet_name='x2')

The default is mode='w'. See the documentation.
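
To check the whole round trip from the question, here is a minimal sketch that writes "x1" and "x2", reopens the file in append mode for "x3" and "x4", and reads the sheet names back. It assumes a pandas version whose ExcelWriter supports mode='a' with the openpyxl engine, and it writes to a temporary file rather than the path from the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Temporary file standing in for the real workbook (illustrative location).
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")

# First session: create the file with sheets 'x1' and 'x2' (default mode='w').
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x1")
    pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x2")

# Later session: reopen in append mode and add 'x3' and 'x4'.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x3")
    pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x4")

# Read the sheet names back to confirm nothing was lost.
with pd.ExcelFile(path) as xlsx:
    sheet_names = xlsx.sheet_names
print(sheet_names)
```

Running the second block with the default mode='w' instead would leave only "x3" and "x4" in the file, which is the data loss described in the question.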

import pandas as pd
import openpyxl


writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
data_df.to_excel(writer, sheet_name='sheet_name')  # data_df is the DataFrame to save
writer.close()  # close() also saves the file

You can call this function whenever you want to save a pandas DataFrame to Excel:

import os

import pandas as pd


def save_excel_sheet(df, filepath, sheetname, index=False):
    # Create file if it does not exist
    if not os.path.exists(filepath):
        df.to_excel(filepath, sheet_name=sheetname, index=index)

    # Otherwise, add a sheet. Overwrite if there exists one with the same name.
    else:
        with pd.ExcelWriter(filepath, engine='openpyxl', if_sheet_exists='replace', mode='a') as writer:
            df.to_excel(writer, sheet_name=sheetname, index=index)
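
The effect of if_sheet_exists='replace' can be seen in a small sketch: writing to a sheet name that already exists swaps its contents, while a new name adds a sheet. The file location, sheet names, and sample data are made up for illustration, and the example assumes a pandas version that accepts the if_sheet_exists argument.

```python
import os
import tempfile

import pandas as pd

# Temporary workbook with a single sheet named 'data' (illustrative names).
path = os.path.join(tempfile.mkdtemp(), "book.xlsx")
pd.DataFrame({"a": [1, 2]}).to_excel(path, sheet_name="data", index=False, engine="openpyxl")

# Append mode: 'data' already exists and is replaced, 'extra' is added as a new sheet.
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    pd.DataFrame({"a": [10, 20, 30]}).to_excel(writer, sheet_name="data", index=False)
    pd.DataFrame({"b": [5]}).to_excel(writer, sheet_name="extra", index=False)

# sheet_name=None reads every sheet into a dict keyed by sheet name.
result = pd.read_excel(path, sheet_name=None)
print(sorted(result))
print(result["data"]["a"].tolist())
```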

If you want to add an empty sheet:

xw = pd.ExcelWriter(file_path, engine='xlsxwriter')
pd.DataFrame().to_excel(xw, 'sheet11')

If you want to get hold of that empty sheet:

sheet = xw.sheets['sheet11']

The following solution worked for me:

import os

import pandas as pd

# dataframe to save
df = pd.DataFrame({"A":[1,2], "B":[3,4]})

# path where you want to save
path = "./..../..../.../test.xlsx"

# if an Excel file named `test` is already present, append on sheet 2
if os.path.isfile(path):
    # append mode needs the openpyxl engine; xlsxwriter does not support it
    with pd.ExcelWriter(path, engine='openpyxl', mode='a') as writer:
        df.to_excel(writer, sheet_name= "sheet_2")
else:
    # if not present, then write to a new Excel file on sheet 1
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer, sheet_name= "sheet_1")

Now, if you want to write multiple DataFrames to different sheets, simply add a loop and keep changing the sheet_name.
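
A minimal sketch of such a loop, assuming the DataFrames are kept in a dictionary keyed by sheet name. The dictionary, the sheet names, and the temporary file are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical DataFrames, keyed by the sheet name each one should go to.
frames = {
    "sheet_1": pd.DataFrame({"A": [1, 2], "B": [3, 4]}),
    "sheet_2": pd.DataFrame({"A": [5, 6], "B": [7, 8]}),
    "sheet_3": pd.DataFrame({"A": [9, 10], "B": [11, 12]}),
}

path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # illustrative location

# One writer for the whole loop: every iteration writes to a different sheet.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for sheet_name, frame in frames.items():
        frame.to_excel(writer, sheet_name=sheet_name)

# Read the sheet names back to confirm each DataFrame got its own sheet.
with pd.ExcelFile(path) as xlsx:
    written = xlsx.sheet_names
print(written)
```

Keeping a single writer open for the whole loop avoids reopening the file on every iteration; passing mode='a' to the writer would add the sheets to an existing file instead.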