Filtering with Custom Filters

Filtering Text with Custom Filters

This guide skips most of the details of writing custom filters, since covering them would rapidly expand the scope of this tutorial.

However, we include a few notes about how custom filters work and how to apply them.

How Custom Filters Work

Pandoc parses every input document, whatever its format, into a single intermediate representation (an abstract syntax tree) that can be written out as JSON. This common representation is what lets pandoc convert among formats, and it is also where document filters are applied.

To illustrate, imagine a hypothetical custom filter called foobar that turns all bold text into underlined text, applied while converting from Markdown to HTML.

The process to apply this filter is as follows:

Input fmt --(pandoc)--> JSON --(filter)--> JSON --(pandoc)--> Output fmt

In practice, we can create this pipeline on the command line, or from Python.

Note that pandoc should be run with the -s (standalone) flag, so that it produces a complete document rather than a fragment.
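
To make the middle step of the pipeline concrete, here is a minimal sketch of what a foobar-style filter does to the JSON, using only the Python standard library. The JSON string is a hand-written stand-in for pandoc's output, so its exact layout and API version are assumptions that vary between pandoc releases; the Underline node type is likewise assumed to be available in the pandoc version in use.

```python
import json

# Hand-written stand-in for pandoc's JSON for the Markdown "some **bold** text".
# Assumption: the real layout and API version depend on the pandoc release.
DOC_JSON = """
{"pandoc-api-version": [1, 23],
 "meta": {},
 "blocks": [{"t": "Para", "c": [
     {"t": "Str", "c": "some"}, {"t": "Space"},
     {"t": "Strong", "c": [{"t": "Str", "c": "bold"}]},
     {"t": "Space"}, {"t": "Str", "c": "text"}]}]}
"""


def foobar(node):
    """Recursively rebuild the tree, turning every Strong node into Underline."""
    if isinstance(node, list):
        return [foobar(child) for child in node]
    if isinstance(node, dict):
        node = {key: foobar(value) for key, value in node.items()}
        if node.get("t") == "Strong":
            node["t"] = "Underline"
        return node
    return node  # strings and numbers are left untouched


doc = json.loads(DOC_JSON)      # JSON text -> Python objects
filtered = foobar(doc)          # the filter step in the diagram
print(json.dumps(filtered))     # JSON text again, ready for pandoc to read back
```

Everything except the Strong node passes through unchanged, which is why the second pandoc process can still read the result as an ordinary document.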

Writing Custom Filters

See this example for a Panflute filter template.

The important parts of a pandoc filter are:

  • Prepare function - run once, before the document is filtered. This is useful for initializing lists, dictionaries, or files.

  • Finalize function - run once, after the document has been filtered. This is useful for doing something with information extracted from the document.

  • Filter actions - these functions are run once for each element (chunk of text) in the document, and each call receives that element as its argument. If a filter action returns nothing, the element is left unmodified; otherwise, the (filtered) element it returns is used in its place. There can be multiple filter actions per document.

  • Main function - this is where the filter is actually applied.

Here is a barebones filter:

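import panflute as pf

def main(doc=None):
    # Run the filter: prepare once, action on every element, finalize once.
    return pf.run_filter(action, prepare=prepare, finalize=finalize, doc=doc)

def prepare(doc):
    pass  # set up any state needed before filtering

def action(elem, doc):
    pass  # returning None leaves elem unchanged

def finalize(doc):
    pass  # use anything collected while filtering

if __name__ == "__main__":
    main()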

Using Custom Filters

Command Line

Following the example above, let us apply a filter while converting from GitHub-Flavored Markdown to HTML.

Start with the initial step, which converts Markdown to JSON (include the -s flag):

$ cat | pandoc -s -f gfm -t json

Now this JSON is passed through a filter, which also returns JSON (more details below). If we have a panflute filter that is in the file, it will take JSON input and return JSON output. We use it like this:

$ cat | pandoc -s -f gfm -t json | python

This will output filtered JSON text. (More on the filter itself in a moment.)
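
From the shell's point of view, a filter is simply a program that reads one JSON document on standard input and writes one on standard output. The sketch below shows that contract with only the standard library; the shout transformation, the run helper, and the sample JSON are hypothetical illustrations, not part of panflute or pandoc.

```python
import io
import json
import sys


def shout(node):
    """Upper-case the text of every Str node (a stand-in transformation)."""
    if isinstance(node, list):
        return [shout(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: shout(value) for key, value in node.items()}
    return node


def run(source=sys.stdin, sink=sys.stdout):
    """Read one JSON document from source, filter it, write JSON to sink."""
    json.dump(shout(json.load(source)), sink)


# Exercise the filter on in-memory streams instead of a real pipe.
# Assumption: this JSON is a hand-written stand-in for pandoc's output.
fake_stdin = io.StringIO(
    '{"pandoc-api-version": [1, 23], "meta": {}, '
    '"blocks": [{"t": "Para", "c": [{"t": "Str", "c": "hello"}]}]}'
)
fake_stdout = io.StringIO()
run(fake_stdin, fake_stdout)
result = json.loads(fake_stdout.getvalue())
```

In a real filter script, the last step would instead be a call to run() with no arguments, so that the script reads from and writes to the pipe.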

The output from the filter can now be passed to another Pandoc process and converted to a document in the same or a different format:

$ cat | pandoc -s -f gfm -t json | python | pandoc -s -f json -t html -o index.html

Lastly, if we make the filter script into an executable file called foobar by adding the header line #!/usr/bin/env python and running chmod +x foobar, we can call it directly (here as ./foobar; a bare foobar works only if its directory is on the PATH):

$ cat | pandoc -s -f gfm -t json | ./foobar | pandoc -s -f json -t html -o index.html


Custom filters can also be applied using pypandoc, a thin Python wrapper for the Pandoc command line client.