Pandoc and foreign characters

I have been trying to use Pandoc to convert some Markdown into a PDF file. Here is an example that Pandoc will not convert for me:

# Header!


## Sub Header


themselves derived respectively from the Greek ἀναρχία i.e. 'anarchy'

I found this in the Wikipedia database. Pandoc does not like it at all. Here is the error message it gives me:

pandoc: Error producing PDF from TeX source.
! Package inputenc Error: Unicode char \u8:ἀ not set up for use with LaTeX.


See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
...


l.53 ...es derived respectively from the Greek ἀ

Is there a command-line switch I can give it to fix this? I tried doing something like this, as suggested, but it failed:

iconv -t utf-8 test.md | pandoc -o test.pdf
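A small diagnostic sketch in Python, not part of the original question: if the file is already valid UTF-8, re-encoding it does not change which characters pdflatex's inputenc knows about, so it can help to list exactly which non-ASCII characters the document contains. The function name non_ascii_report is hypothetical, chosen for illustration.

```python
import unicodedata


def non_ascii_report(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    chars = sorted({ch for ch in text if ord(ch) > 127})
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in chars
    ]


# The sentence from the question that makes the conversion fail.
sample = "themselves derived respectively from the Greek ἀναρχία i.e. 'anarchy'"

# Print only the code point and name so the output stays ASCII.
for _, code, name in non_ascii_report(sample):
    print(code, name)
```

Any character reported here needs either an engine and font that cover it (XeLaTeX or LuaLaTeX with a suitable font) or an explicit font encoding set up for pdflatex.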

Before following John's suggestion below, see this

Update 2: This command finally got it working; hopefully it helps someone:

pandoc test2.md -o test2.pdf --latex-engine=xelatex --template=my.latex --variable mainfont="DejaVu Serif" --variable sansfont=Arial

Here are the contents of my.latex:

\documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$lang$,$endif$$if(papersize)$$papersize$,$endif$]{$documentclass$}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[utf8]{inputenc}
\usepackage{ucs}
$if(euro)$
\usepackage{eurosym}
$endif$
\else % if luatex or xelatex
\usepackage{fontspec}
\ifxetex
\usepackage{xltxtra,xunicode}
\fi
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\setromanfont{TeX Gyre Pagella}
\newcommand{\euro}{€}
$if(mainfont)$
\setmainfont{$mainfont$}
$endif$
$if(sansfont)$
\setsansfont{$sansfont$}
$endif$
$if(monofont)$
\setmonofont{$monofont$}
$endif$
$if(mathfont)$
\setmathfont{$mathfont$}
$endif$
\fi
$if(geometry)$
\usepackage[$for(geometry)$$geometry$$sep$,$endfor$]{geometry}
$endif$
$if(natbib)$
\usepackage{natbib}
\bibliographystyle{plainnat}
$endif$
$if(biblatex)$
\usepackage{biblatex}
$if(biblio-files)$
\bibliography{$biblio-files$}
$endif$
$endif$
$if(listings)$
\usepackage{listings}
$endif$
$if(lhs)$
\lstnewenvironment{code}{\lstset{language=Haskell,basicstyle=\small\ttfamily}}{}
$endif$
$if(highlighting-macros)$
$highlighting-macros$
$endif$
$if(verbatim-in-note)$
\usepackage{fancyvrb}
$endif$
$if(tables)$
\usepackage{longtable}
$endif$
$if(graphics)$
\usepackage{graphicx}
% We will generate all images so they have a width \maxwidth. This means
% that they will get their normal width if they fit onto the page, but
% are scaled down if they would overflow the margins.
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth
\else\Gin@nat@width\fi}
\makeatother
\let\Oldincludegraphics\includegraphics
\renewcommand{\includegraphics}[1]{\Oldincludegraphics[width=\maxwidth]{#1}}
$endif$
\ifxetex
\usepackage[setpagesize=false, % page size defined by xetex
unicode=false, % unicode breaks when used with xetex
xetex]{hyperref}
\else
\usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
bookmarks=true,
pdfauthor={$author-meta$},
pdftitle={$title-meta$},
colorlinks=true,
urlcolor=$if(urlcolor)$$urlcolor$$else$blue$endif$,
linkcolor=$if(linkcolor)$$linkcolor$$else$magenta$endif$,
pdfborder={0 0 0}}
\urlstyle{same}  % don't use monospace font for urls
$if(links-as-notes)$
% Make links footnotes instead of hotlinks:
\renewcommand{\href}[2]{#2\footnote{\url{#1}}}
$endif$
$if(strikeout)$
\usepackage[normalem]{ulem}
% avoid problems with \sout in headers with hyperref:
\pdfstringdefDisableCommands{\renewcommand{\sout}{}}
$endif$
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em}  % prevent overfull lines
$if(numbersections)$
$else$
\setcounter{secnumdepth}{0}
$endif$
$if(verbatim-in-note)$
\VerbatimFootnotes % allows verbatim text in footnotes
$endif$
$if(lang)$
\ifxetex
\usepackage{polyglossia}
\setmainlanguage{$mainlang$}
\else
\usepackage[$lang$]{babel}
\fi
$endif$
$for(header-includes)$
$header-includes$
$endfor$


$if(title)$
\title{$title$}
$endif$
\author{$for(author)$$author$$sep$ \and $endfor$}
\date{$date$}


\begin{document}
$if(title)$
\maketitle
$endif$


$for(include-before)$
$include-before$


$endfor$
$if(toc)$
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{$toc-depth$}
\tableofcontents
}
$endif$
$body$


$if(natbib)$
$if(biblio-files)$
$if(biblio-title)$
$if(book-class)$
\renewcommand\bibname{$biblio-title$}
$else$
\renewcommand\refname{$biblio-title$}
$endif$
$endif$
\bibliography{$biblio-files$}


$endif$
$endif$
$if(biblatex)$
\printbibliography$if(biblio-title)$[title=$biblio-title$]$endif$


$endif$
$for(include-after)$
$include-after$


$endfor$
\end{document}

Use the --pdf-engine=xelatex option.

If you are using LaTeX intermediate output, then you can use inline \mbox{t\'ext} to get accented characters. Without the \mbox{}, the backslash often isn't interpreted correctly by the Pandoc parser.

By default, Pandoc uses the pdflatex engine when converting Markdown files to PDF. pdflatex cannot handle Unicode characters as smoothly as xelatex, so you should try xelatex instead. But merely switching to xelatex is not enough: as is often the case, you also need to choose a font that contains glyphs for the Unicode characters you want to typeset.

I am a Chinese user, so take Chinese as an example. If you have a test.md with the following content:

你好汉字

you can use the following command to compile this markdown file:

pandoc --pdf-engine=xelatex -V CJKmainfont="KaiTi" test.md -o test.pdf

In the above command, --pdf-engine=xelatex selects the LaTeX engine (in recent versions of Pandoc, the --latex-engine option is deprecated in favor of --pdf-engine). -V CJKmainfont="KaiTi" selects a font that supports Chinese. For other languages, you can use the flag -V mainfont="<FONT_NAME>".
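If you call Pandoc from a script, the same options can be assembled as an argument list. This is only a sketch in Python: build_pandoc_command is a hypothetical helper, and it only builds the list without running anything. The flags are the ones discussed above (--pdf-engine, -V mainfont, -V CJKmainfont).

```python
import shlex


def build_pandoc_command(source, output, engine="xelatex",
                         mainfont=None, cjk_mainfont=None):
    """Assemble a pandoc argument list; nothing is executed here."""
    cmd = ["pandoc", f"--pdf-engine={engine}"]
    if mainfont:
        # Font for the main text (non-CJK scripts).
        cmd += ["-V", f"mainfont={mainfont}"]
    if cjk_mainfont:
        # Font for Chinese, Japanese and Korean text.
        cmd += ["-V", f"CJKmainfont={cjk_mainfont}"]
    cmd += [source, "-o", output]
    return cmd


cmd = build_pandoc_command("test.md", "test.pdf", cjk_mainfont="KaiTi")
# shlex.join (Python 3.8+) shows the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids quoting problems with font names that contain spaces.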

How to find a font that supports your language

To find a font that supports your language, you need to know your language code. Then, if you are on a Linux system, or on a Windows system with TeX Live installed, you can use the following command to list the fonts that cover your language:

fc-list :lang=zh # list the fonts that support Chinese (language code is `zh`)

The output on my Linux system is shown below:

[Image: fc-list output on my Linux system]

If you choose to use, e.g. the font Source Han Serif CN, then use the following command to compile your markdown file:

 pandoc --pdf-engine=xelatex -V CJKmainfont="Source Han Serif CN" test.md -o test.pdf
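The font lookup can also be scripted. The sketch below is in Python and rests on an assumption: that `fc-list :lang=<code> family` prints one line per font, with alternative family names separated by commas. Both function names are hypothetical, and family names containing escaped commas are not handled.

```python
import subprocess


def parse_families(fc_list_output):
    """Extract unique family names from `fc-list :lang=<code> family` output."""
    families = set()
    for line in fc_list_output.splitlines():
        # A font may list several (localized) family names on one line.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return sorted(families)


def fonts_for_language(lang):
    """Ask fontconfig for fonts covering `lang`; requires fc-list on PATH."""
    result = subprocess.run(
        ["fc-list", f":lang={lang}", "family"],
        capture_output=True, text=True, check=True,
    )
    return parse_families(result.stdout)


# Parsing a canned sample, so the example does not depend on installed fonts.
sample = "Noto Sans CJK SC\nSource Han Serif CN,Source Han Serif CN Heavy\n"
print(parse_families(sample))
```

On a machine with fontconfig installed, fonts_for_language("zh") would give the candidates for -V CJKmainfont.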

UPDATE: the answer below seems to be valid for pandoc 1.x but with later versions the syntax has changed


Coming back to this post five years later, the issue is still there. The command

pandoc -s test.md -t latex -o test.pdf

fails when test.md contains text with non-Latin characters, including Greek, Cyrillic, CJK, Hebrew and Arabic.

LaTeX was designed before Unicode and its support for different character sets is robust in some areas but far from comprehensive, so the advice to use XeLaTeX is valid yet requires one to choose the main font carefully, since there is no automatic choice.

Below is a small taxonomy of possible issues and some solutions. All tested with Pandoc 1.19.

Cyrillic

Support for the Cyrillic alphabet in LaTeX is provided via the T2A font encoding.

Consider a small sample:

# Header


## Subheader


Tetris (Russian: Тетрис) quoting Wikipedia is a tile-matching puzzle
video game

Running this example with pandoc would fail with:

! Package inputenc Error: Unicode char Т (U+422)
(inputenc)                not set up for use with LaTeX.


See the inputenc package documentation for explanation.

A fix is available because fontenc is a predefined variable in the default.latex template.

Running this example with

pandoc -t latex -o tetris.pdf -V fontenc=T2A cyrillic.md

would produce the correct rendering:

[Image: text with Cyrillic characters rendered correctly]

This, however, would not handle other language features such as hyphenation correctly. A better way is to use Babel and have it select the correct font encoding:

pandoc -t latex -o tetris.pdf -V lang -V babel-lang=russian cyrillic.md

Or switch languages with Babel commands inside the Markdown:

# Header


## Subheader


Tetris (Russian: \foreignlanguage{russian}{Тетрис}) quoting Wikipedia
is a tile-matching puzzle video game

And run with

pandoc -t latex -o tetris.pdf -V lang -V babel-lang=english \
-V babel-otherlangs=russian cyrillic2.md

Greek

The example in the original post contains characters from both the basic and the extended Greek Unicode blocks.

The widely used LGR Greek font encoding is not covered by the LaTeX3 project and is classified as a local encoding, i.e. it may vary from site to site and from system to system, according to the LaTeX Encoding Guide.

On TeX Live the following packages need to be installed: texlive-greek-inputenc, texlive-greek-fontenc and texlive-cbfonts. Note that you need Babel 3.9 or later. However, the result of

pandoc -t latex -o anarchy.pdf -V fontenc=LGR greek.md

may appear unexpected.

[Image: text with both Greek and Latin characters typeset as Greek]

To correct this, one has to set up the LaTeX Babel package correctly and insert commands to switch between the languages in the original text:

# Header!


## Sub Header


themselves derived respectively from the Greek \textgreek{ἀναρχία}
i.e. 'anarchy'

Compiling this with the following command

pandoc -s greek2.md -t latex -V fontenc=T2A -V lang -V babel-lang=english \
-V babel-otherlangs=greek -o greek.pdf

would produce the expected output:

[Image: text with Greek characters rendered correctly]

XeLaTeX

All of this would not be needed if we were using XeLaTeX.

Just running the original example with

pandoc -s greek.md --latex-engine=xelatex -t latex -o greek.pdf

would produce

[Image: text with Greek characters omitted]

Because the default font has no glyphs at the Greek character positions, the output contains white space instead.

Selecting one of the popular fonts as the new mainfont helps a bit:

pandoc -s greek.md --latex-engine=xelatex \
-V mainfont="Liberation Serif" -t latex -o greek.pdf

[Image: text with only basic Greek characters rendered correctly]

However, characters from the Greek Extended block, such as the small letter alpha with psili (ἀ, U+1F00), are not rendered.
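To check in advance whether a text needs a font with extended Greek coverage, you can look at which Unicode blocks its characters fall into. A minimal Python sketch, assuming only the two blocks relevant here ("Greek and Coptic", U+0370 to U+03FF, and "Greek Extended", U+1F00 to U+1FFF); greek_blocks is a hypothetical helper name.

```python
GREEK_BASIC = range(0x0370, 0x0400)     # "Greek and Coptic" block
GREEK_EXTENDED = range(0x1F00, 0x2000)  # "Greek Extended" block


def greek_blocks(text):
    """Return the sorted names of the Greek Unicode blocks used in text."""
    found = set()
    for ch in text:
        code_point = ord(ch)
        if code_point in GREEK_BASIC:
            found.add("Greek and Coptic")
        elif code_point in GREEK_EXTENDED:
            found.add("Greek Extended")
    return sorted(found)


# The word from the original question; its first letter is U+1F00.
print(greek_blocks("ἀναρχία"))
```

If "Greek Extended" shows up in the result, a font that only covers the basic block will leave gaps, which matches the rendering described above.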

The Font Setup for Greek with XeTeX/LuaTeX Guide suggests using the DejaVu, Libertine or Free font families.

Indeed, with DejaVu Serif, Linux Libertine O, Tempora and perhaps some other fonts, the result is as expected. See below the rendering with XeLaTeX and Linux Libertine fonts.

pandoc -s greek.md --latex-engine=xelatex -V mainfont="Linux Libertine O" \
-t latex -o greek.pdf

[Image: text with Greek characters rendered correctly with XeLaTeX and Libertine fonts]

You can use --latex-engine=xelatex, as said before, but the best approach I have found is to use the lang variable to specify the document language in the header, like this: lang: ru-RU. A working example on my Debian workstation:

---
title: Lady Macbeth de Mzensk (Chostakovitch, livret d'Alexandre Preis, 1934)
lang: ru-RU
---


# Acte I / Tableau 1


*[Народ ненадежный]*
Ха, ха, ха, ха, ха, ха, ха. *[...]* Чуыствуем
На кого ты нас покидаешь?
Без хозяина будет скучно,
скучно, тоскливо, безрадостно.


Не работа. Без тебя невеселье. Воз вращайся
Как можно скорей, скорей !

Then you can launch:

$ pandoc -o your-file-output.pdf your-source-file.md

I had a similar issue trying to get mathematical symbols to show up in the output.

As others have mentioned, with recent pandoc versions (v2.2.3.2 in my case) the option to use is --pdf-engine=xelatex. I did not need to specify a font in this case:

pandoc -o MyDoc.pdf --pdf-engine=xelatex  MyDoc.md

I did get an error that the latinmodern-math font was missing. I installed it using:

tlmgr install collection-fontsrecommended

This works for Cyrillic characters:

pandoc myfile.md --pdf-engine=xelatex -V mainfont=Arial