Pandoc supports intermediate modification of the Abstract Syntax Tree (AST) between the parsing and writing phases using filters. This supplementary paper highlights use cases of such filters, written in the Lua language, on LaTeX, markdown and native AST.
To fully understand pandoc filters, one must first grasp the document conversion process. Pandoc reads (parses) an article and stores it internally in an intermediate form known as an abstract syntax tree (AST). A document writer (a specialized typesetter) then reads the AST and produces the content in the chosen output format. Converting markdown to HTML, for example, entails exactly these operations. Figure 1 shows the conversion workflow within pandoc.
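The read, store and write steps can also be reproduced from Lua itself. The following is a minimal sketch, assuming pandoc 3.0 or later so that the script can be run with pandoc's built-in Lua interpreter; the file name roundtrip.lua and the sample text are made up for illustration.

```lua
-- roundtrip.lua (hypothetical file name); run with: pandoc lua roundtrip.lua
-- Assumes pandoc >= 3.0, where the `pandoc` module is available to scripts.

-- Reader: parse a markdown string into pandoc's AST.
local doc = pandoc.read("Hello *world*", "markdown")

-- The AST is now held in memory; its first block is a paragraph.
print(doc.blocks[1].t)

-- Writer: render the same AST in two different output formats.
local native = pandoc.write(doc, "native")
local html = pandoc.write(doc, "html")
print(native)
print(html)
```

The native output is a textual dump of the AST, which is useful for finding out which element types a filter should target.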
Lua filters enter the picture when specific elements require adjustment. Assume you are converting markdown to HTML and want the page to number figures or tables automatically. Markdown has no mechanism for numbering figures or tables, and because the conversion is 1:1, the HTML output will lack numbering as well.
At this point, you might wish there were a way to add such a feature. But including every customization as an option in pandoc or in the writer is not viable. Filters address this: they let you modify the AST before the writer reads it, to obtain the desired result (John MacFarlane and other pandoc authors, 2023).
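Pandoc filters can technically be written in any language; such filters are called JSON filters because they exchange the AST with pandoc as JSON. Lua filters, the subject of this paper, instead run in the Lua interpreter built into pandoc.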
Continuing the example above, let’s create a dummy article whose figures
need numbering. One way is to keep a counter of images and
add a prefix "Figure X:" to each figure caption. This
numbers the images in the end result.
{width="10%"}
{width="15%"}
pandoc example.md --from markdown --to html5 --output example.html
Now if we convert the above markdown file to HTML5 using the pandoc command in Figure 2, we get Figure 3.
As we can see, the figures are not numbered automatically, which is generally the expected result. To include numbering, we need to write a Lua filter that modifies the AST and makes the changes we desire.
We call a Lua filter by adding the
--lua-filter name_of_filter.lua option to the pandoc command shown in
Figure 2.
Next, we write a Lua filter, shown in Figure 4, that numbers the figures.
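-- Running count of figures that contain at least one image.
local figures = 0

function Figure(el)
  -- Since pandoc 3, a Figure may hold content other than images,
  -- so look for an Image before numbering.
  local has_image = false
  pandoc.walk_block(el, {
    Image = function(img)
      has_image = true
    end
  })
  if not has_image then
    return el
  end

  figures = figures + 1
  local label = pandoc.Str("Figure " .. tostring(figures) .. ":")
  local long = el.caption.long
  if #long == 0 then
    -- No caption yet: the label becomes the whole caption.
    el.caption.long = pandoc.Blocks({ pandoc.Plain({ label }) })
  else
    -- Prepend the label and a space to the first caption block.
    local prefix = pandoc.Inlines({ label, pandoc.Space() })
    el.caption.long[1].content = prefix .. el.caption.long[1].content
  end
  return el
end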
Everything in Lua revolves around tables; the pandoc AST of a document, for example, can be viewed as one giant table with sub-tables.
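To make the "table of tables" picture concrete, here is a simplified model of a one-paragraph document built from plain Lua tables. It is only an illustration: the field names mirror those pandoc uses (t for the element type, content, text), but real pandoc elements are created with constructors such as pandoc.Para and carry additional behavior. The helper count_types is a name made up for this sketch.

```lua
-- A simplified, hand-built model of a document (not real pandoc objects).
local doc = {
  blocks = {
    {
      t = "Para",
      content = {
        { t = "Str", text = "Hello" },
        { t = "Space" },
        { t = "Str", text = "world" },
      },
    },
  },
}

-- Visit every table that carries a type tag and count the tags.
local function count_types(node, counts)
  counts = counts or {}
  if type(node) == "table" then
    if node.t then
      counts[node.t] = (counts[node.t] or 0) + 1
    end
    for _, child in pairs(node) do
      count_types(child, counts)
    end
  end
  return counts
end

local counts = count_types(doc)
print(counts.Para, counts.Str, counts.Space)
```

Because the model uses only plain tables, it runs in any Lua interpreter without pandoc.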
In pandoc, there are two sorts of elements: ‘Blocks’ and ‘Inlines’.
‘Blocks’ are larger structures built from simpler pieces
(‘Inlines’). A ‘Para’, for example, is a ‘Block’ made up of
‘Inlines’ such as ‘Str’ and ‘Space’.
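The relationship between a block and its inlines can be seen by building a paragraph by hand. This is a minimal sketch, assuming pandoc 3.0 or later and a script run with pandoc's built-in Lua interpreter; the file name para.lua is hypothetical.

```lua
-- para.lua (hypothetical file name); run with: pandoc lua para.lua

-- A Para block built from three inline elements.
local para = pandoc.Para({
  pandoc.Str("Hello"),
  pandoc.Space(),
  pandoc.Str("world"),
})

-- Each element reports its type through the `t` field.
print(para.t)
for i, inline in ipairs(para.content) do
  print(i, inline.t)
end

-- pandoc.utils.stringify flattens the inlines back to plain text.
local text = pandoc.utils.stringify(para)
print(text)
```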
The first step in writing a filter is to select a target type, that is,
the ‘Block’ or ‘Inline’ that the filter will act on. We then
name the function after the target type and, by convention, call its
argument ‘el’. For our filter, this gives
function Figure(el). Pandoc matches the names of filter functions
to AST element types and calls each function on every element of that type.
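As a small illustration of this naming rule, the sketch below targets the ‘Emph’ inline instead of ‘Figure’. The filter name emph_to_strong.lua is made up for this example and is not part of pandoc or texor.

```lua
-- emph_to_strong.lua (hypothetical filter name)
-- Pandoc calls this function once for every Emph inline in the document,
-- because the function name matches the element type.
function Emph(el)
  -- Returning a new element replaces the original one in the AST.
  return pandoc.Strong(el.content)
end
```

Passed to pandoc with --lua-filter emph_to_strong.lua, the filter turns emphasized text into strong text, while all other elements pass through unchanged because no function is defined for them.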
Within the filter function we can assume that el contains the
table object of a Figure element from the document. Many
functions can be applied to it or to the ‘Inlines’ it contains. One of them,
walk_block, walks over all the elements inside a block. Walk
functions are a convenient way to detect or count certain
elements in a block. We use walk_block to check whether there are any
‘Image’ inline elements within the ‘Figure’ block. The check is needed because,
since pandoc 3 (Krewinkel and A. Lucero, 2023), ‘Figure’ blocks can contain elements
other than ‘Images’.
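The counting use of walk functions can be sketched on its own. The example below assumes it runs inside pandoc (as a Lua filter or through pandoc's Lua interpreter); count_images is a helper name invented for this sketch, and reporting to standard error is just one possible use of the count.

```lua
-- Count the Image inlines anywhere inside a block element.
-- `count_images` is a helper name made up for this sketch.
local function count_images(block)
  local n = 0
  pandoc.walk_block(block, {
    Image = function(img)
      n = n + 1
      -- Returning nothing leaves the image unchanged.
    end,
  })
  return n
end

function Para(el)
  local n = count_images(el)
  if n > 0 then
    io.stderr:write("paragraph with " .. n .. " image(s)\n")
  end
  -- No return value: the paragraph stays as it is.
end
```

Since walk_block returns a modified copy rather than changing its argument, ignoring the return value, as done here, leaves the block itself untouched.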
If there is an Image, we prepend "Figure X:" to the caption of
the Figure element and return the modified element.
Including the filter from Figure 4 in the pandoc command, as shown in Figure 5, generates the output in Figure 6.
pandoc example.md --from markdown --to html5 --output filtered-example.html \
  --lua-filter image_numbering_filter.lua
Pandoc Lua filters are used in the texor package to modify the AST for various markups. Table 1 summarizes the use of each filter.
| File Name | Description |
|---|---|
| abs_filter.lua | Filters out unicode 182 character. |
| bib_filter.lua | Clears out the bibliography from the article itself as it will be added to the metadata. |
| R_code.lua | Adds a class component 'r' to CodeBlocks for code highlighting. |
| image_filter.lua | Fixes extensions for image paths without an extension or with .pdf extension. |
| table_caption.lua | Adds table numbering in the captions similar to the ones found in R Markdown and kable-based tables. |
| image_caption.lua | Adds Figure/Algorithm numbering as well as clears out residuals of tikz/algorithm images. |
| conversion_compat_check.lua | This filter keeps a count of all Inline and Block elements and writes it to a yaml file. |
| equation_filter.lua | Adds bookdown style equation numbering to LaTeX equation/math environments. |
| bookdown_ref.lua | Corrects numbering for various elements. |
| issue_checker.lua | Searches and notifies any occurrence of unrecognized math commands. |
| find_pdf_files.lua | This filter creates a list for all image PDF files included in the article. |
| widetable_patcher.lua | This filter sets the representation for widetables. |
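To give a feel for how small such filters can be, the sketch below imitates two entries of Table 1. It is not the code shipped with texor: it is written only from the one-line descriptions above, and it assumes that "unicode 182" refers to code point 182 (U+00B6, the pilcrow sign).

```lua
-- Illustrative sketches only; texor's actual filters may differ.

-- Assumption: "unicode 182" means code point 182 (U+00B6, the pilcrow),
-- which is encoded in UTF-8 as the two bytes 194 and 182.
local PILCROW = "\194\182"

-- In the spirit of abs_filter.lua: drop the pilcrow from every Str.
function Str(el)
  -- The extra parentheses discard gsub's second return value (the count).
  el.text = (el.text:gsub(PILCROW, ""))
  return el
end

-- In the spirit of R_code.lua: add the class "r" to code blocks lacking it.
function CodeBlock(el)
  for _, class in ipairs(el.classes) do
    if class == "r" then
      return el
    end
  end
  table.insert(el.classes, "r")
  return el
end
```

Both functions touch only the text and classes fields of their argument, so they can be tried on plain Lua tables as well as on real pandoc elements.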
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ulayil, "Understanding pandoc lua filters", The R Journal, 2025
BibTeX citation
@article{RJ-2025-000,
author = {Ulayil, Abhishek},
title = {Understanding pandoc lua filters},
journal = {The R Journal},
year = {2025},
note = {https://rjournal.github.io/},
volume = {17},
issue = {3},
issn = {2073-4859},
pages = {1}
}