将 Excel 工作簿中的所有工作表读入具有 data.frame 的 R 列表中

我知道 XLConnect可以用来将 Excel 工作表读入 R。例如,这可以将工作簿中的第一个工作表 test.xls读入 R。

library(XLConnect)
readWorksheetFromFile('test.xls', sheet = 1)

我有一个包含多个工作表的 Excel 工作簿。

如何将工作簿中的所有工作表导入到 R 中的列表中,其中列表中的每个元素都是给定工作表的 data.frame,并且每个元素的名称与 Excel 中的工作表的名称相对应?

167516 次浏览

You can load the work book and then use lapply, getSheets and readWorksheet and do something like this.

wb.mtcars <- loadWorkbook(system.file("demoFiles/mtcars.xlsx",
package = "XLConnect"))
sheet_names <- getSheets(wb.mtcars)
names(sheet_names) <- sheet_names


sheet_list <- lapply(sheet_names, function(.sheet){
readWorksheet(object=wb.mtcars, .sheet)})

Updated answer using readxl (22nd June 2015)

Since posting this question the readxl package has been released. It supports both xls and xlsx format. Importantly, in contrast to other excel import packages, it works on Windows, Mac, and Linux without requiring installation of additional software.

So a function for importing all sheets in an Excel workbook would be:

library(readxl)
read_excel_allsheets <- function(filename, tibble = FALSE) {
# I prefer straight data.frames
# but if you like tidyverse tibbles (the default with read_excel)
# then just pass tibble = TRUE
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
if(!tibble) x <- lapply(x, as.data.frame)
names(x) <- sheets
x
}

This could be called with:

mysheets <- read_excel_allsheets("foo.xls")

Old Answer

Building on the answer provided by @mnel, here is a simple function that takes an Excel file as an argument and returns each sheet as a data.frame in a named list.

library(XLConnect)


importWorksheets <- function(filename) {
# filename: name of Excel file
workbook <- loadWorkbook(filename)
sheet_names <- getSheets(workbook)
names(sheet_names) <- sheet_names
sheet_list <- lapply(sheet_names, function(.sheet){
readWorksheet(object=workbook, .sheet)})
}

Thus, it could be called with:

importWorksheets('test.xls')

Note that most of XLConnect's functions are already vectorized. This means that you can read in all worksheets with one function call without having to do explicit vectorization:

require(XLConnect)
wb <- loadWorkbook(system.file("demoFiles/mtcars.xlsx", package = "XLConnect"))
lst = readWorksheet(wb, sheet = getSheets(wb))

With XLConnect 0.2-0 lst will already be a named list.

excel.link will do the job.

I actually found it easier to use compared to XLConnect (not that either package is that difficult to use). Learning curve for both was about 5 minutes.

As an aside, you can easily find all R packages that mention the word "Excel" by browsing to http://cran.r-project.org/web/packages/available_packages_by_name.html

I tried the above and had issues with the amount of data that my 20MB Excel I needed to convert consisted of; therefore the above did not work for me.

After more research I stumbled upon openxlsx and this one finally did the trick (and fast) Importing a big xlsx file into R?

https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf

Since this is the number one hit to the question: Read multi sheet excel to list:

here is the openxlsx solution:

filename <-"myFilePath"


sheets <- openxlsx::getSheetNames(filename)
SheetList <- lapply(sheets,openxlsx::read.xlsx,xlsxFile=filename)
names(SheetList) <- sheets

I stumbled across this old question and I think the easiest approach is still missing.

You can use rio to import all excel sheets with just one line of code.

library(rio)
data_list <- import_list("test.xls")

If you're a fan of the tidyverse, you can easily import them as tibbles by adding the setclass argument to the function call.

data_list <- import_list("test.xls", setclass = "tbl")

Suppose they have the same format, you could easily row bind them by setting the rbind argument to TRUE.

data_list <- import_list("test.xls", setclass = "tbl", rbind = TRUE)

From official readxl (tidyverse) documentation (changing first line):

path <- "data/datasets.xlsx"


path %>%
excel_sheets() %>%
set_names() %>%
map(read_excel, path = path)

Details at: http://readxl.tidyverse.org/articles/articles/readxl-workflows.html#iterate-over-multiple-worksheets-in-a-workbook

To read multiple sheets from a workbook, use readxl package as follows:

library(readxl)
library(dplyr)


final_dataFrame <- bind_rows(path_to_workbook %>%
excel_sheets() %>%
set_names() %>%
map(read_excel, path = path_to_workbook))

Here, bind_rows (dplyr) will put all data rows from all sheets into one data frame, and path_to_workbook is the location of your data: "dir/of/the/data/workbook".

Adding to Paul's answer. The sheets can also be concatenated using something like this:

data = path %>%
excel_sheets() %>%
set_names() %>%
map_df(~ read_excel(path = path, sheet = .x), .id = "Sheet")

Libraries needed:

if(!require(pacman))install.packages("pacman")
pacman::p_load("tidyverse","readxl","purrr")

Just for simplifying the very useful response of @Jeromy Anglim:

allsheets <- sapply(readxl::excel_sheets("your_file.xlsx"), simplify = F, USE.NAMES = T,
function(X) readxl::read_excel("your_file.xlsx", sheet = X))