Generating a PDF from a Jekyll post using `pandoc`
I discovered pandoc
.
If you need to convert files from one markup format into another, pandoc is your swiss-army knife.
It can convert from markdown to PDF using LaTeX, so I used it to generate the PDF version of TheMarketplace.Guide.
It ended up being very straightforward. Here are the generalized instructions, instead of a detailed tutorial, simply because my use case was very specific.
Input is a markdown file
Because any jekyll post is valid input, you can get started just by running pandoc
on it directly.
In fact, here is this post in PDF with no customizations whatsoever.
The command is:
pandoc generating-pdf-from-jekyll-using-pandoc.md \\
--output=generating-pdf-from-jekyll-using-pandoc.pdf
It already looks better than wkhtmltopdf
.
I added a pdf
field to the frontmatter
Any posts with that field set to true will have their PDF automatically generated.
The code for that looks like this:
site.collections['posts'].select{|x| x.data['pdf']}.each do |post|
`pandoc --from=markdown --output=#{post.data['slug']}.pdf \\
#{post.path}` unless File.exists?("#{post.data['slug']}.pdf")
end
It is in the post_write
hook. It also only generates documents that don’t already exist. This is a tradeoff between making it run quicker and having to delete files to update them.
You can also include additonal LaTeX functionality
You can use the frontmatter to customize the LaTeX generated.
The fields I found most useful are title
, author
and date
which allow you to generate a title page. In the example PDF above, the title
from the frontmatter is the title in the PDF.
You can also specify the documentclass
. The default is article
, but there are many more including book
and report
.
There is also the header-includes
field that adds commands into the preamble. It must look something like this:
---
...
header-includes: |
\usepackage{hyperref}
---
Focus is always on building new things. But I think we need to be stricter with what we do build.
I found it somewhat finicky. I had to put it last in the frontmatter, or it’d not get picked up and it had to follow exactly that alignment.
\usepackage[left=1in,top=1in,right=1in,bottom=1in]{geometry}
is a useful snippet to add in the header-includes
. It sets all the margins to one inch.
And you can sprinkle LaTeX in
Your input file can also include valid LaTeX commands.
I used \chapter
and \section
commands to properly structure the document.
\newpage{}
between sections.
I also manually generated some content, not in the original markdown, and I used \item[$\blacksquare$]
.
The final command
For TheMarketplace.Guide PDF the full command I use is:
pandoc --from=markdown --output='The Marketplace Guide.pdf' \\
combined.md --shift-heading-level-by=1
I used --shift-heading-level-by
to have an additional level for the table of contents, by using \chapter
.
I added this command into the post_write
hook in Jekyll. The command takes about 15 seconds to run because it combines some hundred pages. Therefore, the hook runs the command only if the site isn’t serving (!site.config['serving']
), only when running jekyll b
.
The result is a nicely typeset document for which I’m comfortable charging.