Convert Word doc and docx format to PDF in .NET Core without Microsoft.Office.Interop


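Best answer

This is such a pain, no wonder all the third-party solutions charge $500 per developer.

The good news is that the Open XML SDK recently added support for .NET Standard, so it looks like you are in luck with the .docx format.

The bad news is that at the moment there is not much choice of PDF generation libraries for .NET Core. Since it does not look like you want to pay for one, and you cannot legally use a third-party service, we have little choice but to roll our own.

The main problem is getting the Word document content into PDF. One popular approach is to read the Docx into HTML and export that to PDF. It was hard to find, but there is a .NET Core version of OpenXMLSDK-PowerTools that supports converting Docx to HTML. The pull request is "about to be accepted"; you can get it from here:

https://github.com/officedev/open-xml-powertools/tree/abfbaac510d0d60e2f492503c60ef897247716cf

Now that we can extract the document content as HTML, we need to convert it to PDF. There are a few libraries that convert HTML to PDF; for example, DinkToPdf is a cross-platform wrapper around the WebKit HTML-to-PDF library libwkhtmltox.

I thought DinkToPdf was better than https://code.msdn.microsoft.com/How-to-export-HTML-to-PDF-c5afd0ce

Docx to HTML

Let's put this all together. Download the OpenXMLSDK-PowerTools .NET Core project and build it (just OpenXMLPowerTools.Core and OpenXMLPowerTools.Core.Example; ignore the other projects).

Set OpenXMLPowerTools.Core.Example as the startup project. Add a Word document to the project (e.g. test.docx) and set this docx file's property Copy To Output = If Newer.

Run the console project: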

static void Main(string[] args)
{
    // Open the docx package and convert the document to an HTML element tree.
    var source = Package.Open(@"test.docx");
    var document = WordprocessingDocument.Open(source);
    HtmlConverterSettings settings = new HtmlConverterSettings();
    XElement html = HtmlConverter.ConvertToHtml(document, settings);

    // Print the HTML and also save it to test.html in the working directory.
    Console.WriteLine(html.ToString());
    var writer = File.CreateText("test.html");
    writer.WriteLine(html.ToString());
    writer.Dispose();
    Console.ReadLine();
}

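Make sure test.docx is a valid Word document with some text in it, otherwise you might get an error:

The specified package is invalid. The main part is missing

If you run the project, you will see that the HTML looks almost exactly like the content of the Word document.

However, if you try a Word document with pictures or links, you will notice that they are missing or broken.

This CodeProject article addresses these issues: https://www.codeproject.com/Articles/1162184/Csharp-Docx-to-HTML-to-Docx

I had to change the static Uri FixUri(string brokenUri) method to return a Uri, and I added user-friendly error messages.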

static void Main(string[] args)
{
    var fileInfo = new FileInfo(@"c:\temp\MyDocWithImages.docx");
    string fullFilePath = fileInfo.FullName;
    string htmlText = string.Empty;
    try
    {
        htmlText = ParseDOCX(fileInfo);
    }
    catch (OpenXmlPackageException e)
    {
        // A malformed hyperlink stops the package from opening:
        // repair the links in the file on disk, then parse again.
        if (e.ToString().Contains("Invalid Hyperlink"))
        {
            using (FileStream fs = new FileStream(fullFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
            {
                UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
            }
            htmlText = ParseDOCX(fileInfo);
        }
    }

    var writer = File.CreateText("test1.html");
    writer.WriteLine(htmlText.ToString());
    writer.Dispose();
}

public static Uri FixUri(string brokenUri)
{
    // Placeholder target for links that cannot be repaired (an assumption; pick your own).
    // The original version called new Uri(" "), which throws UriFormatException.
    var fallback = new Uri("http://broken-link/");
    if (!brokenUri.StartsWith("mailto:"))
    {
        return fallback;
    }

    // Drop the "mailto:" prefix and keep the rest only if it parses as an absolute URI.
    string candidate = brokenUri.Substring("mailto:".Length);
    Uri fixedUri;
    return Uri.TryCreate(candidate, UriKind.Absolute, out fixedUri) ? fixedUri : fallback;
}


public static string ParseDOCX(FileInfo fileInfo)
{
    try
    {
        // Work on an in-memory copy so the file on disk is not modified.
        byte[] byteArray = File.ReadAllBytes(fileInfo.FullName);
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(byteArray, 0, byteArray.Length);
            using (WordprocessingDocument wDoc =
                WordprocessingDocument.Open(memoryStream, true))
            {
                int imageCounter = 0;
                var pageTitle = fileInfo.FullName;
                var part = wDoc.CoreFilePropertiesPart;
                if (part != null)
                    pageTitle = (string)part.GetXDocument()
                        .Descendants(DC.title)
                        .FirstOrDefault() ?? fileInfo.FullName;

                WmlToHtmlConverterSettings settings = new WmlToHtmlConverterSettings()
                {
                    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
                    PageTitle = pageTitle,
                    FabricateCssClasses = true,
                    CssClassPrefix = "pt-",
                    RestrictToSupportedLanguages = false,
                    RestrictToSupportedNumberingFormats = false,
                    // Embed each image in the HTML as a base64 data URI.
                    ImageHandler = imageInfo =>
                    {
                        ++imageCounter;
                        string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                        ImageFormat imageFormat = null;
                        if (extension == "png") imageFormat = ImageFormat.Png;
                        else if (extension == "gif") imageFormat = ImageFormat.Gif;
                        else if (extension == "bmp") imageFormat = ImageFormat.Bmp;
                        else if (extension == "jpeg") imageFormat = ImageFormat.Jpeg;
                        else if (extension == "tiff")
                        {
                            // TIFF images are re-encoded as GIF.
                            extension = "gif";
                            imageFormat = ImageFormat.Gif;
                        }
                        else if (extension == "x-wmf")
                        {
                            extension = "wmf";
                            imageFormat = ImageFormat.Wmf;
                        }

                        // Unsupported image type: skip the image.
                        if (imageFormat == null) return null;

                        string base64 = null;
                        try
                        {
                            using (MemoryStream ms = new MemoryStream())
                            {
                                imageInfo.Bitmap.Save(ms, imageFormat);
                                var ba = ms.ToArray();
                                base64 = System.Convert.ToBase64String(ba);
                            }
                        }
                        catch (System.Runtime.InteropServices.ExternalException)
                        { return null; }

                        ImageFormat format = imageInfo.Bitmap.RawFormat;
                        ImageCodecInfo codec = ImageCodecInfo.GetImageDecoders()
                            .First(c => c.FormatID == format.Guid);
                        string mimeType = codec.MimeType;

                        string imageSource =
                            string.Format("data:{0};base64,{1}", mimeType, base64);

                        XElement img = new XElement(Xhtml.img,
                            new XAttribute(NoNamespace.src, imageSource),
                            imageInfo.ImgStyleAttribute,
                            imageInfo.AltText != null ?
                                new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                        return img;
                    }
                };

                XElement htmlElement = WmlToHtmlConverter.ConvertToHtml(wDoc, settings);
                var html = new XDocument(new XDocumentType("html", null, null, null),
                    htmlElement);
                var htmlString = html.ToString(SaveOptions.DisableFormatting);
                return htmlString;
            }
        }
    }
    // Let OpenXmlPackageException propagate so that Main can repair invalid hyperlinks and retry;
    // a bare catch here would swallow it and the retry in Main would never run.
    catch (Exception e) when (!(e is OpenXmlPackageException))
    {
        return "The file is either open, please close it or contains corrupt data";
    }
}

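You may need the System.Drawing.Common NuGet package to use ImageFormat.

Now the images come through.

If you only want to show Word .docx files in a web browser, it is better not to convert the HTML to PDF, as that will significantly increase bandwidth. You could store the HTML in a file system, in the cloud, or in a database using a VPP technology.

HTML to PDF

The next thing we need to do is pass the HTML to DinkToPdf. Download the DinkToPdf (90 MB) solution. Build the solution; restoring all the packages and compiling the solution takes a while.

IMPORTANT:

The DinkToPdf library requires the libwkhtmltox.so and libwkhtmltox.dll files in the root of your project if you want to run on Linux and Windows. There is also a libwkhtmltox.dylib file for Mac if you need it.

These DLLs are in the v0.12.4 folder. Depending on your PC, 32 or 64 bit, copy the 3 files to the DinkToPdf-master\DinkToPdf.TestConsoleApp\bin\Debug\netcoreapp1.1 folder.

IMPORTANT 2:

Make sure that libgdiplus is installed in your Docker image or on your Linux machine. The libwkhtmltox.so library depends on it.

Set DinkToPdf.TestConsoleApp as the StartUp project and change the Program.cs file to read the htmlContent from the HTML file saved with Open-Xml-PowerTools instead of the Lorem Ipsum text.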

var doc = new HtmlToPdfDocument()
{
    GlobalSettings = {
        ColorMode = ColorMode.Color,
        Orientation = Orientation.Landscape,
        PaperSize = PaperKind.A4,
    },
    Objects = {
        new ObjectSettings() {
            PagesCount = true,
            HtmlContent = File.ReadAllText(@"C:\TFS\Sandbox\Open-Xml-PowerTools-abfbaac510d0d60e2f492503c60ef897247716cf\ToolsTest\test1.html"),
            WebSettings = { DefaultEncoding = "utf-8" },
            HeaderSettings = { FontSize = 9, Right = "Page [page] of [toPage]", Line = true },
            FooterSettings = { FontSize = 9, Right = "Page [page] of [toPage]" }
        }
    }
};
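
The snippet above only describes the document; nothing is rendered until a converter runs it. Here is a minimal sketch of that last step, assuming the DinkToPdf NuGet package is referenced and the native libwkhtmltox library can be found at run time. BasicConverter, PdfTools and Convert are the names used in the DinkToPdf README; the class name HtmlToPdfSketch, the method name and both paths are placeholders of mine.

```csharp
using System.IO;
using DinkToPdf;

public static class HtmlToPdfSketch
{
    // Renders the HTML file at htmlPath into a PDF written to pdfPath.
    // Both paths are supplied by the caller; nothing here is specific to PowerTools.
    public static void ConvertFile(string htmlPath, string pdfPath)
    {
        // BasicConverter is meant for single-threaded use.
        var converter = new BasicConverter(new PdfTools());

        var doc = new HtmlToPdfDocument()
        {
            GlobalSettings = {
                ColorMode = ColorMode.Color,
                Orientation = Orientation.Landscape,
                PaperSize = PaperKind.A4,
            },
            Objects = {
                new ObjectSettings() {
                    HtmlContent = File.ReadAllText(htmlPath),
                    WebSettings = { DefaultEncoding = "utf-8" }
                }
            }
        };

        // With no output file set in GlobalSettings, Convert returns the PDF as bytes.
        byte[] pdf = converter.Convert(doc);
        File.WriteAllBytes(pdfPath, pdf);
    }
}
```

Calling HtmlToPdfSketch.ConvertFile("test1.html", "test1.pdf") should then leave test1.pdf next to the HTML file. For a web application that converts on several threads, the README points to SynchronizedConverter instead of BasicConverter.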

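The result of the Docx vs the PDF is quite impressive, and I doubt many people would pick out many differences (especially if they never see the original).

P.S. I realise you wanted to convert both .doc and .docx to PDF. I would suggest making a service yourself that converts .doc to docx using a specific non-server Windows/Microsoft technology. The doc format is binary and is not intended for server-side automation of Office.

Using an EXE and the command line:

You can use wkhtmltopdf.exe for the pure conversion here: https://wkhtmltopdf.org/libwkhtmltox/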
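
If you go the command-line route, the call from C# might look like the sketch below. It assumes wkhtmltopdf is installed and that it takes the input HTML file and the output PDF path as its two positional arguments. The class and method names are hypothetical, and ProcessStartInfo.ArgumentList needs .NET Core 2.1 or later.

```csharp
using System;
using System.Diagnostics;

public static class WkHtmlToPdfCli
{
    // Builds the start info for: wkhtmltopdf <input.html> <output.pdf>
    public static ProcessStartInfo BuildStartInfo(string exePath, string htmlPath, string pdfPath)
    {
        var psi = new ProcessStartInfo(exePath)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        // Each ArgumentList entry is passed as one argument, so spaces in paths are safe.
        psi.ArgumentList.Add(htmlPath);
        psi.ArgumentList.Add(pdfPath);
        return psi;
    }

    // Runs the conversion and throws if the tool reports a non-zero exit code.
    public static void Convert(string exePath, string htmlPath, string pdfPath)
    {
        using (var process = Process.Start(BuildStartInfo(exePath, htmlPath, pdfPath)))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("wkhtmltopdf exited with code " + process.ExitCode);
            }
        }
    }
}
```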

Another answer

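Using the LibreOffice binary

The LibreOffice project is an open-source, cross-platform alternative to MS Office. We can use its capabilities to export doc and docx files to PDF. Currently, LibreOffice has no official API for .NET, so we will talk directly to the soffice binary.

It is a kind of "hacky" solution, but I think it is the one with fewer bugs and lower maintenance costs. Another advantage of this method is that you are not restricted to converting from doc and docx: you can convert from every format LibreOffice supports (e.g. odt, html, spreadsheets, and more).

The implementation

I wrote a simple C# program that uses the soffice binary. This is just a proof of concept (and my first program in C#). It supports Windows out of the box, and Linux only if the LibreOffice package has been installed.

This is main.cs: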

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Reflection;

namespace DocToPdf
{
    public class LibreOfficeFailedException : Exception
    {
        public LibreOfficeFailedException(int exitCode)
            // "{0}" is required here; an empty "{}" makes string.Format throw FormatException.
            : base(string.Format("LibreOffice has failed with {0}", exitCode))
        {}
    }

    class Program
    {
        static string getLibreOfficePath() {
            switch (Environment.OSVersion.Platform) {
                case PlatformID.Unix:
                    return "/usr/bin/soffice";
                case PlatformID.Win32NT:
                    // On Windows the binaries are expected next to the executable.
                    string binaryDirectory = System.IO.Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
                    return binaryDirectory + "\\Windows\\program\\soffice.exe";
                default:
                    throw new PlatformNotSupportedException ("Your OS is not supported");
            }
        }

        static void Main(string[] args) {
            string libreOfficePath = getLibreOfficePath();

            // The file name is wrapped in double quotes so that paths with spaces survive.
            // FIXME: file names that themselves contain quotes are still not escaped.
            ProcessStartInfo procStartInfo = new ProcessStartInfo(libreOfficePath, string.Format("--convert-to pdf --nologo \"{0}\"", args[0]));
            procStartInfo.RedirectStandardOutput = true;
            procStartInfo.UseShellExecute = false;
            procStartInfo.CreateNoWindow = true;
            procStartInfo.WorkingDirectory = Environment.CurrentDirectory;

            Process process = new Process() { StartInfo = procStartInfo, };
            process.Start();
            process.WaitForExit();

            // Check for failed exit code.
            if (process.ExitCode != 0) {
                throw new LibreOfficeFailedException(process.ExitCode);
            }
        }
    }
}
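
On .NET Core 2.1 or later, the escaping FIXME above can be sidestepped with ProcessStartInfo.ArgumentList, which passes each argument separately. A rough sketch rather than a tested implementation: it relies on the --headless and --outdir switches of soffice, and the class SofficeSketch and its method names are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class SofficeSketch
{
    // Builds: soffice --headless --convert-to pdf --outdir <outputDir> <inputPath>
    public static ProcessStartInfo BuildStartInfo(string sofficePath, string inputPath, string outputDir)
    {
        var psi = new ProcessStartInfo(sofficePath)
        {
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true
        };
        foreach (var arg in new[] { "--headless", "--convert-to", "pdf", "--outdir", outputDir, inputPath })
        {
            psi.ArgumentList.Add(arg);
        }
        return psi;
    }

    // soffice names the result after the input file, with the extension replaced by .pdf.
    public static string ExpectedPdfPath(string inputPath, string outputDir)
    {
        return Path.Combine(outputDir, Path.GetFileNameWithoutExtension(inputPath) + ".pdf");
    }

    // Runs the conversion and returns the path where the PDF is expected.
    public static string Convert(string sofficePath, string inputPath, string outputDir)
    {
        using (var process = Process.Start(BuildStartInfo(sofficePath, inputPath, outputDir)))
        {
            // Drain stdout so a chatty child process cannot block on a full pipe.
            process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("soffice exited with code " + process.ExitCode);
            }
        }
        return ExpectedPdfPath(inputPath, outputDir);
    }
}
```

Writing to an explicit output directory also means the program no longer depends on the current working directory.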

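Resources

Project repository: an example of a package that includes the Windows LibreOffice binaries.

Results

I tested it on Arch Linux, compiled with mono. I ran it using mono with the Linux binary, and using wine with the Windows binary.

You can find the results in the Tests directory:

Input files: testdoc.doc, testdocx.docx

Outputs:

Wine: testdoc, testdocx.
Mono: testdoc, testdocx.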

Another answer

I recently did this with FreeSpire.Doc. The free version is limited to 3 pages, but it can easily convert a docx file to PDF using something like this:

private void ConvertToPdf()
{
    try
    {
        for (int i = 0; i < listOfDocx.Count; i++)
        {
            CurrentModalText = "Converting To PDF";
            CurrentLoadingNum += 1;

            // One temporary PDF per input document, numbered by index.
            string savePath = PdfTempStorage + i + ".pdf";
            listOfPDF.Add(savePath);

            Spire.Doc.Document document = new Spire.Doc.Document(listOfDocx[i], FileFormat.Auto);
            document.SaveToFile(savePath, FileFormat.PDF);
        }
    }
    catch (Exception)
    {
        // Rethrow with "throw;" so the original stack trace is preserved ("throw e;" resets it).
        throw;
    }
}

I then stitch these PDF files together using iTextSharp.pdf:

public static byte[] concatAndAddContent(List<byte[]> pdfByteContent, List<MailComm> localList)
{
    using (var ms = new MemoryStream())
    {
        using (var doc = new Document())
        {
            using (var copy = new PdfSmartCopy(doc, ms))
            {
                doc.Open();
                // add checklist at the start
                using (var db = new StudyContext())
                {
                    var contentId = localList[0].ContentID;
                    var temp = db.MailContentTypes.Where(x => x.ContentId == contentId).ToList();
                    if (!temp[0].Code.Equals("LAB"))
                    {
                        pdfByteContent.Insert(0, CheckListCreation.createCheckBox(localList));
                    }
                }

                // Loop through each byte array
                foreach (var p in pdfByteContent)
                {
                    // Create a PdfReader bound to that byte array
                    using (var reader = new PdfReader(p))
                    {
                        // Add the entire document instead of page-by-page
                        copy.AddDocument(reader);
                    }
                }

                doc.Close();
            }
        }

        // Return just before disposing
        return ms.ToArray();
    }
}

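I don't know whether this suits your use case, since you haven't specified the size of the documents you are trying to write, but if they are fewer than 3 pages, or you can manipulate them to be fewer than 3 pages, it will let you convert them to PDF.

As mentioned in the comments below, it also cannot help with RTL languages; thank you @Aria for pointing that out.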

Another answer

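Sorry, I don't have enough reputation to comment, but I would like to add my two cents on Jeremy Thompson's answer. I hope this helps someone.

When I was going through Jeremy Thompson's answer, after downloading OpenXMLSDK-PowerTools and running OpenXMLPowerTools.Core.Example, I got an error like

the specified package is invalid. the main part is missing

at the line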

var document = WordprocessingDocument.Open(source);

After struggling for a few hours, I found that the test.docx copied to the bin folder was only 1 KB. To fix this, right-click test.docx > Properties and set Copy to Output Directory to Copy always.
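
A quick way to catch this kind of problem before PowerTools throws is to check that the copied file is really a zip package with a main document part. The sketch below is only a heuristic of my own: it assumes the main part lives at word/document.xml, which is the usual location but is strictly speaking decided by the package relationships. The class and method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxSanityCheck
{
    // Returns true if the stream is a readable zip archive that contains word/document.xml.
    // The stream is left open for the caller.
    public static bool LooksLikeDocx(Stream stream)
    {
        try
        {
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                return zip.Entries.Any(e => e.FullName == "word/document.xml");
            }
        }
        catch (InvalidDataException)
        {
            // Not a zip archive at all, e.g. a truncated or placeholder file.
            return false;
        }
    }
}
```

Usage would be something like opening the file with File.OpenRead("test.docx") and refusing to continue when LooksLikeDocx returns false.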

Hope this helps a novice like me :)

Another answer

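To convert DOCX to PDF, even with placeholders, I have created a free "Report-From-DocX-HTML-To-PDF-Converter" library for .NET Core under the MIT license, because I was so unnerved that no simple solution existed and all the commercial solutions were super expensive. You can find it here with an extensive description and an example project:

https://github.com/smartinmedia/net-core-docx-html-to-pdf-converter

You only need the free LibreOffice. I recommend using the LibreOffice portable edition, so that it does not change anything in your server settings. Take note of where the file "soffice.exe" is located (on Linux it is named differently), because you need it to fill the variable "locationOfLibreOfficeSoffice".

Here is how the conversion from DOCX to PDF and HTML works: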

string locationOfLibreOfficeSoffice = @"C:\PortableApps\LibreOfficePortable\App\libreoffice\program\soffice.exe";

var docxLocation = "MyWordDocument.docx";

var rep = new ReportGenerator(locationOfLibreOfficeSoffice);

//Convert from DOCX to PDF
rep.Convert(docxLocation, Path.Combine(Path.GetDirectoryName(docxLocation), "Test-Template-out.pdf"));

//Convert from DOCX to HTML
rep.Convert(docxLocation, Path.Combine(Path.GetDirectoryName(docxLocation), "Test-Template-out.html"));

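As you can see, you can also convert from DOCX to HTML. In addition, you can put placeholders into the Word document and then "fill" them with values. That is beyond the scope of your question, but you can read about it on GitHub (README).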

Another answer

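This is an addition to Jeremy Thompson's very helpful answer. In addition to the Word document body, I also wanted the header (and footer) of the Word document converted to HTML. I did not want to modify Open-Xml-PowerTools, so I modified Main() and ParseDOCX() from Jeremy's example and added two new functions. ParseDOCX now accepts a byte array, so the original Word docx is not modified.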

static void Main(string[] args)
{
    var fileInfo = new FileInfo(@"c:\temp\MyDocWithImages.docx");
    byte[] fileBytes = File.ReadAllBytes(fileInfo.FullName);
    string htmlText = string.Empty;
    string htmlHeader = string.Empty;
    try
    {
        htmlText = ParseDOCX(fileBytes, fileInfo.Name, false);
        htmlHeader = ParseDOCX(fileBytes, fileInfo.Name, true);
    }
    catch (OpenXmlPackageException e)
    {
        if (e.ToString().Contains("Invalid Hyperlink"))
        {
            // Repair the links in the file on disk (fullFilePath was undefined here, so use fileInfo.FullName).
            using (FileStream fs = new FileStream(fileInfo.FullName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
            {
                UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
            }
            // Reload the bytes so the retry parses the repaired file, not the stale copy.
            fileBytes = File.ReadAllBytes(fileInfo.FullName);
            htmlText = ParseDOCX(fileBytes, fileInfo.Name, false);
            htmlHeader = ParseDOCX(fileBytes, fileInfo.Name, true);
        }
    }

    var writer = File.CreateText("test1.html");
    writer.WriteLine(htmlText.ToString());
    writer.Dispose();
    var writer2 = File.CreateText("header1.html");
    writer2.WriteLine(htmlHeader.ToString());
    writer2.Dispose();
}


private static string ParseDOCX(byte[] fileBytes, string filename, bool headerOnly)
{
    try
    {
        // Parse an in-memory copy, so the header move below never touches the file on disk.
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(fileBytes, 0, fileBytes.Length);
            using (WordprocessingDocument wDoc = WordprocessingDocument.Open(memoryStream, true))
            {
                int imageCounter = 0;
                var pageTitle = filename;
                var part = wDoc.CoreFilePropertiesPart;
                if (part != null)
                {
                    pageTitle = (string)part.GetXDocument()
                        .Descendants(DC.title)
                        .FirstOrDefault() ?? filename;
                }

                WmlToHtmlConverterSettings settings = new WmlToHtmlConverterSettings()
                {
                    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
                    PageTitle = pageTitle,
                    FabricateCssClasses = true,
                    CssClassPrefix = "pt-",
                    RestrictToSupportedLanguages = false,
                    RestrictToSupportedNumberingFormats = false,
                    // Embed each image in the HTML as a base64 data URI.
                    ImageHandler = imageInfo =>
                    {
                        ++imageCounter;
                        string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                        ImageFormat imageFormat = null;
                        if (extension == "png") imageFormat = ImageFormat.Png;
                        else if (extension == "gif") imageFormat = ImageFormat.Gif;
                        else if (extension == "bmp") imageFormat = ImageFormat.Bmp;
                        else if (extension == "jpeg") imageFormat = ImageFormat.Jpeg;
                        else if (extension == "tiff")
                        {
                            extension = "gif";
                            imageFormat = ImageFormat.Gif;
                        }
                        else if (extension == "x-wmf")
                        {
                            extension = "wmf";
                            imageFormat = ImageFormat.Wmf;
                        }

                        if (imageFormat == null) return null;

                        string base64 = null;
                        try
                        {
                            using (MemoryStream ms = new MemoryStream())
                            {
                                imageInfo.Bitmap.Save(ms, imageFormat);
                                var ba = ms.ToArray();
                                base64 = System.Convert.ToBase64String(ba);
                            }
                        }
                        catch (System.Runtime.InteropServices.ExternalException)
                        { return null; }

                        ImageFormat format = imageInfo.Bitmap.RawFormat;
                        ImageCodecInfo codec = ImageCodecInfo.GetImageDecoders()
                            .First(c => c.FormatID == format.Guid);
                        string mimeType = codec.MimeType;

                        string imageSource =
                            string.Format("data:{0};base64,{1}", mimeType, base64);

                        XElement img = new XElement(Xhtml.img,
                            new XAttribute(NoNamespace.src, imageSource),
                            imageInfo.ImgStyleAttribute,
                            imageInfo.AltText != null ?
                                new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                        return img;
                    }
                };

                // Put header into document body, and remove everything else
                if (headerOnly)
                {
                    MoveHeaderToDocumentBody(wDoc);
                }

                XElement htmlElement = WmlToHtmlConverter.ConvertToHtml(wDoc, settings);
                var html = new XDocument(new XDocumentType("html", null, null, null),
                    htmlElement);
                var htmlString = html.ToString(SaveOptions.DisableFormatting);
                return htmlString;
            }
        }
    }
    // Let OpenXmlPackageException propagate so that Main can repair invalid hyperlinks and retry.
    catch (Exception e) when (!(e is OpenXmlPackageException))
    {
        return "The file is either open, please close it or contains corrupt data";
    }
}


private static void MoveHeaderToDocumentBody(WordprocessingDocument wDoc)
{
    MainDocumentPart mainDocument = wDoc.MainDocumentPart;
    XElement docRoot = mainDocument.GetXDocument().Root;
    XElement body = docRoot.Descendants(W.body).First();
    // Only handles first header. Header info: https://learn.microsoft.com/en-us/office/open-xml/how-to-replace-the-header-in-a-word-processing-document
    HeaderPart header = mainDocument.HeaderParts.FirstOrDefault();
    if (header == null)
    {
        // The document has no header part: nothing to move, the body is left as it is.
        return;
    }
    XElement headerRoot = header.GetXDocument().Root;

    AddXElementToBody(headerRoot, body);

    // document body will have new headers when we return from this function
    return;
}

private static void AddXElementToBody(XElement sourceElement, XElement body)
{
    // Clone the children nodes
    List<XElement> children = sourceElement.Elements().ToList();
    List<XElement> childClones = children.Select(el => new XElement(el)).ToList();

    // Clone the section properties nodes
    List<XElement> sections = body.Descendants(W.sectPr).ToList();
    List<XElement> sectionsClones = sections.Select(el => new XElement(el)).ToList();

    // clear body
    body.Descendants().Remove();

    // add source elements to body
    foreach (var child in childClones)
    {
        body.Add(child);
    }

    // add section properties to body
    foreach (var section in sectionsClones)
    {
        body.Add(section);
    }

    // get text from alternate content if needed - either choice or fallback node
    XElement alternate = body.Descendants(MC.AlternateContent).FirstOrDefault();
    if (alternate != null)
    {
        var choice = alternate.Descendants(MC.Choice).FirstOrDefault();
        var fallback = alternate.Descendants(MC.Fallback).FirstOrDefault();
        if (choice != null)
        {
            var choiceChildren = choice.Elements();
            foreach (var choiceChild in choiceChildren)
            {
                body.Add(choiceChild);
            }
        }
        else if (fallback != null)
        {
            var fallbackChildren = fallback.Elements();
            foreach (var fallbackChild in fallbackChildren)
            {
                body.Add(fallbackChild);
            }
        }
    }
}

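You can add a similar method to handle the Word document footer.

In my case, I then convert the HTML files to images (using Net-Core-Html-To-Image, which is also based on wkHtmlToX). I combine the header and body images (using NET-Q16-AnyCPU), placing the header image on top of the body image.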

Another answer

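If you have access to Office 365, you can implement an alternative solution. This has fewer limitations than my previous answer, but it requires that purchase.

I get a Graph API token, the site I want to use, and the drive I want to use.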
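
How the token is obtained depends on your OAuth setup and is not shown here. Once you have it, a minimal sketch of preparing the HttpClient that the methods below expect could look like this; the class and method names are hypothetical.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;

public static class GraphClientSketch
{
    // Creates an HttpClient that sends the given access token as a Bearer token on every request.
    public static HttpClient CreateClient(string accessToken)
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", accessToken);
        return client;
    }
}
```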

Then I get the byte array of the docx:

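public static async Task<MemoryStream> GetByteArrayOfDocumentAsync(string baseFilePathLocation)
{
    // Read the file asynchronously and hand back an open stream.
    // The stream must not be declared with "using" here: that would dispose it
    // before the caller can pass it to UploadFileAsync.
    var byteArray = await File.ReadAllBytesAsync(baseFilePathLocation);
    return new MemoryStream(byteArray);
}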

Then that stream is uploaded to the Graph API, using a client set up with our Graph API token:

public static async Task<string> UploadFileAsync(HttpClient client,
    string siteId,
    MemoryStream stream,
    string driveId,
    string fileName,
    string folderName = "root")
{
    var result = await client.PutAsync(
        $"https://graph.microsoft.com/v1.0/sites/{siteId}/drives/{driveId}/items/{folderName}:/{fileName}:/content",
        new ByteArrayContent(stream.ToArray()));
    var res = JsonSerializer.Deserialize<SharepointDocument>(await result.Content.ReadAsStringAsync());
    return res.id;
}

We then download the document from the Graph API as a PDF, using the following endpoint:

public static async Task<Stream> GetPdfOfDocumentAsync(HttpClient client,
    string siteId,
    string driveId,
    string documentId)
{
    var getRequest =
        await client.GetAsync(
            $"https://graph.microsoft.com/v1.0/sites/{siteId}/drives/{driveId}/items/{documentId}/content?format=pdf");
    return await getRequest.Content.ReadAsStreamAsync();
}

This returns a stream containing the PDF version of the document that was just uploaded.

Another answer

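If you have no problem with a containerized solution (Docker), there is a very good project out there:

Gotenberg project

https://gotenberg.dev/

I tried it before. It already uses LibreOffice for docx to PDF, but it has many more features. It is also a stateless dockerized API and is self-sufficient.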
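
Calling it from .NET is a plain multipart HTTP POST. A minimal sketch, assuming a Gotenberg 7 or later container is reachable at the base URL you pass in and that its LibreOffice route is /forms/libreoffice/convert with the upload in a form field named "files" (check the documentation for your version); the class and method names are hypothetical.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class GotenbergSketch
{
    // Wraps the document bytes in a multipart form with a single "files" field.
    public static MultipartFormDataContent BuildForm(byte[] docxBytes, string fileName)
    {
        var form = new MultipartFormDataContent();
        form.Add(new ByteArrayContent(docxBytes), "files", fileName);
        return form;
    }

    // Posts the docx to Gotenberg and returns the PDF bytes from the response body.
    public static async Task<byte[]> ConvertAsync(HttpClient client, string baseUrl, string docxPath)
    {
        byte[] docxBytes = File.ReadAllBytes(docxPath);
        using (var form = BuildForm(docxBytes, Path.GetFileName(docxPath)))
        using (var response = await client.PostAsync(baseUrl.TrimEnd('/') + "/forms/libreoffice/convert", form))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```

With the container listening on port 3000, a call would look something like await GotenbergSketch.ConvertAsync(client, "http://localhost:3000", "MyWordDocument.docx"), followed by File.WriteAllBytes to save the result.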