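Split a multi-page PDF file into multiple PDF files with Python?

I would like to take a multi-page PDF file and create a separate PDF file for each page.

I have downloaded ReportLab and browsed the documentation, but it seems to be aimed at PDF generation. I haven't yet seen anything about processing existing PDF files.

Is there an easy way to do this in Python?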

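from PyPDF2 import PdfFileWriter, PdfFileReader

# Keep the source file open while its pages are copied out.
with open("document.pdf", "rb") as source:
    inputpdf = PdfFileReader(source)

    for i in range(inputpdf.numPages):
        # One writer per page, so each output file holds a single page.
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)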

etc.

I didn't see a solution here that splits the PDF into two parts that together contain all the pages, so I'm adding mine in case anybody is looking for the same thing:

from PyPDF2 import PdfFileWriter, PdfFileReader


def split_pdf_to_two(filename, page_number):
    """Write pages before page_number to part1.pdf and the rest to part2.pdf."""
    with open(filename, "rb") as source:
        pdf_reader = PdfFileReader(source)
        # An explicit check, since assert statements are skipped under `python -O`.
        if page_number >= pdf_reader.getNumPages():
            print("Error: The PDF you are cutting has fewer pages than you want to cut!")
            return

        pdf_writer1 = PdfFileWriter()
        pdf_writer2 = PdfFileWriter()

        for page in range(page_number):
            pdf_writer1.addPage(pdf_reader.getPage(page))

        for page in range(page_number, pdf_reader.getNumPages()):
            pdf_writer2.addPage(pdf_reader.getPage(page))

        with open("part1.pdf", "wb") as file1:
            pdf_writer1.write(file1)

        with open("part2.pdf", "wb") as file2:
            pdf_writer2.write(file2)

I know this code is not Python, but I felt like posting this piece of R code, which is simple, flexible, and works very well. The pdftools package in R makes splitting and merging PDFs easy.

library(pdftools)  # R package
pdf_subset('D:\\file\\20.02.20\\22 GT 2017.pdf',
           pages = 1:51, output = "subset.pdf")

The PyPDF2 package gives you the ability to split up a single PDF into multiple ones.

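import os
from PyPDF2 import PdfFileReader, PdfFileWriter

path = 'document.pdf'  # placeholder: the PDF to split
# Base name without extension, used as the prefix of each output file.
fname = os.path.splitext(os.path.basename(path))[0]

pdf = PdfFileReader(path)
for page in range(pdf.getNumPages()):
    pdf_writer = PdfFileWriter()
    pdf_writer.addPage(pdf.getPage(page))

    output_filename = '{}_page_{}.pdf'.format(fname, page+1)

    with open(output_filename, 'wb') as out:
        pdf_writer.write(out)

    print('Created: {}'.format(output_filename))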

Source: https://www.blog.pythonlibrary.org/2018/04/11/splitting-and-merging-pdfs-with-python/
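
The answers above use PyPDF2's older camelCase API (PdfFileReader, addPage, getPage, numPages), which PyPDF2 3.0 and its successor package pypdf no longer accept. Below is a minimal sketch of the same per-page split with the newer names, assuming pypdf is installed. The helper names split_pdf and page_filename and the one-based output naming are my own choices for illustration, not part of either library.

```python
from pathlib import Path


def page_filename(stem, index):
    """Name of the output file for the page at zero-based `index` (hypothetical scheme)."""
    return f"{stem}-page{index + 1}.pdf"


def split_pdf(src, out_dir="."):
    """Write each page of `src` to its own PDF in `out_dir` and return the new paths."""
    # Assumption: pypdf is installed. It is imported here so that
    # page_filename stays usable even when it is not.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    stem = Path(src).stem
    written = []
    for index, page in enumerate(reader.pages):
        # A fresh writer per page, so each output file holds exactly one page.
        writer = PdfWriter()
        writer.add_page(page)
        target = Path(out_dir) / page_filename(stem, index)
        with open(target, "wb") as stream:
            writer.write(stream)
        written.append(target)
    return written
```

For example, split_pdf("document.pdf") would write document-page1.pdf, document-page2.pdf, and so on into the current directory.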