Merging multiple JPGs into a single PDF in Linux

I used the following command to convert and merge all the JPG files in a directory into a single PDF file:

convert *.jpg file.pdf

The files in the directory are numbered from 1.jpg to 123.jpg. The conversion itself went fine, but after converting, the pages were all mixed up. I want the PDF to have the pages from 1.jpg to 123.jpg, in the same order as they are named. I also tried the following commands:

cd 1
FILES=$( find . -type f -name "*jpg" | cut -d/ -f 2)
mkdir temp && cd temp
for file in $FILES; do
BASE=$(echo $file | sed 's/.jpg//g');
convert ../$BASE.jpg $BASE.pdf;
done &&
pdftk *pdf cat output ../1.pdf &&
cd ..
rm -rf temp

But still no luck. The operating system is Linux.


The problem is that your shell expands the wildcard in purely alphabetical order, and because the lengths of the numbers differ, the order will be incorrect:

$ echo *.jpg
1.jpg 10.jpg 100.jpg 101.jpg 102.jpg ...

The solution is to pad the filenames with zeros as required so they're the same length before running your convert command:

$ for i in *.jpg; do num=`expr match "$i" '\([0-9]\+\).*'`;
> padded=`printf "%03d" $num`; mv -v "$i" "${i/$num/$padded}"; done

Now the files will be matched by the wildcard in the correct order, ready for the convert command:

$ echo *.jpg
001.jpg 002.jpg 003.jpg 004.jpg 005.jpg 006.jpg 007.jpg 008.jpg ...
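
With the names padded, the original convert command from the question now picks up the pages in the correct order:

$ convert *.jpg file.pdf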

From the manual of ls:

-v natural sort of (version) numbers within text

So, doing what we need in a single command:

convert $(ls -v *.jpg) foobar.pdf

Mind that convert is part of ImageMagick.
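
If the filenames can contain whitespace, the unquoted $(ls -v *.jpg) expansion will split them apart. A minimal safer sketch, assuming bash 4+ and no newlines in the names:

# Read the naturally sorted names into an array, one per line
mapfile -t files < <(ls -v -- *.jpg)
convert "${files[@]}" foobar.pdf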

This is how I do it:
The first line converts all JPG files to PDF using the convert command.
The second line merges all the PDF files into a single one, one image per page, using gs (the Ghostscript PostScript and PDF interpreter).

for i in $(find . -maxdepth 1 -name "*.jpg" -print); do convert $i ${i//jpg/pdf}; done
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=merged_file.pdf -dBATCH `find . -maxdepth 1 -name "*.pdf" -print`
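
Note that find prints the files in directory order, not sorted, so the merged pages can again come out shuffled. A sketch that reuses the natural sort from the answers above (assuming no whitespace in the names):

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOUTPUTFILE=merged_file.pdf $(find . -maxdepth 1 -name "*.pdf" -print | sort -V)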

Mixing the first idea with their reply, I think this code may be satisfactory:

jpgs2pdf.sh


#!/bin/bash

cd "$1"
FILES=$(find . -type f -name "*.jpg" | cut -d/ -f 2)
mkdir temp 2> /dev/null
cd temp

# Convert each JPG into a single-page PDF
for file in $FILES; do
    BASE=$(echo "$file" | sed 's/\.jpg$//')
    convert "../$BASE.jpg" "$BASE.pdf"
done &&

# Merge the pages in natural (version) order
pdftk `ls -v *.pdf` cat output "../$(basename "$1").pdf"
cd ..
rm -rf temp
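
For example, assuming the script above is saved as jpgs2pdf.sh and made executable, running it against a directory of numbered JPGs would look like this (the path is a placeholder):

chmod +x jpgs2pdf.sh
./jpgs2pdf.sh ~/scans   # writes the merged file to ~/scans/scans.pdf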

You could use

convert '%d.jpg[1-123]' file.pdf

via https://www.imagemagick.org/script/command-line-processing.php:

Another method of referring to other image files is by embedding a formatting character in the filename with a scene range. Consider the filename image-%d.jpg[1-5]. The command

magick image-%d.jpg[1-5]

causes ImageMagick to attempt to read images with these filenames:

image-1.jpg image-2.jpg image-3.jpg image-4.jpg image-5.jpg

See also https://www.imagemagick.org/script/convert.php
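
If the files were zero-padded as in the first answer, the same scene-range syntax should also accept a printf-style width specifier; a hedged sketch based on the documented formatting-character behaviour:

convert '%03d.jpg[1-123]' file.pdf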

All of the above answers failed for me when I wanted to merge many high-resolution JPEG images (from a scanned book).

ImageMagick tried to load all the files into RAM, so I used the following two-step approach instead:

find -iname "*.JPG" | xargs -I'{}' convert {} {}.pdf
pdfunite *.pdf merged_file.pdf

Note that with this approach, you can also use GNU parallel to speed up the conversion:

find -iname "*.JPG" | parallel -I'{}' convert {} {}.pdf
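
pdfunite is part of the poppler-utils package. Also note that the plain *.pdf glob sorts alphabetically, which reintroduces the ordering problem from the question; a sketch that combines it with the natural sort used above (assuming no whitespace in the names):

pdfunite $(ls -v *.pdf) merged_file.pdf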

How to create a PDF document from a list of images

Step 1: Install parallel from your distribution's repository. This will speed up the process.

Step 2: Convert each jpg to pdf file

find -iname "*.JPG" | sort -V | parallel -I'{}' convert -compress jpeg -quality 25 {} {}.pdf

The sort -V will sort the file names in natural order.

Step 3: Merge all PDFs into one

pdfunite $(find -iname '*.pdf' | sort -V) output_document.pdf

Credit Gregor Sturm
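
Optionally, the intermediate single-page PDFs can be deleted afterwards; a sketch, assuming the *.JPG.pdf names produced by step 2:

find -iname '*.JPG.pdf' -delete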

Combining Felix Defrance's and Delan Azabani's answers from above (where $FILES is the file list built in jpgs2pdf.sh):

convert `for file in $FILES; do echo $file; done` test_2.pdf

https://gitlab.mister-muffin.de/josch/img2pdf

In all of the proposed solutions involving ImageMagick, the JPEG data gets fully decoded and re-encoded. This results in generation loss, as well as performance ten to a hundred times worse than img2pdf.

img2pdf is also available from many Linux distros, as well as via pip3.
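
For reference, a minimal invocation combining img2pdf with the natural-sort trick from the earlier answers (merged.pdf is a placeholder name):

pip3 install img2pdf
img2pdf $(ls -v *.jpg) -o merged.pdf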