Parsing tags in markdown

First published: 2020-10-19

Last Edited: 2024-05-12

Number of edits: 10

Luhmann's method suggests using tags to categorize entries. I am not sure how I feel about it, but it can certainly help to rediscover notes on a topic by following a seemingly random jumping process. To include tags in notes without a fixed structure, I wanted to parse the markdown file and identify strings that start with a #. The regex was inspired by the parser of sublimeless:

RE_TAGS = r"(#+([^#\s.,\/!$%\^&\*;{}\[\]'\"=`~()<>”\\]|:[a-zA-Z0-9])+)"
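To see what the pattern picks up, here is a small sketch that runs it over a sample sentence using only the standard `re` module. The helper name `find_tags` and the sample text are made up for illustration.

```python
import re

RE_TAGS = r"(#+([^#\s.,\/!$%\^&\*;{}\[\]'\"=`~()<>”\\]|:[a-zA-Z0-9])+)"


def find_tags(text):
    """Return every substring that the tag pattern matches, in order."""
    return [m.group(1) for m in re.finditer(RE_TAGS, text)]


sample = "Notes on #python and #zettelkasten, see #project:alpha."
print(find_tags(sample))  # ['#python', '#zettelkasten', '#project:alpha']

# A '#' followed by whitespace is not a tag, so this finds nothing.
print(find_tags("# Just a heading"))  # []
```

Punctuation such as a comma or a full stop ends a tag, while a colon is allowed inside it. The pattern has no look-behind, so a `#` in the middle of a word would also start a match.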

The tricky part was adding it as an extension to the Python markdown parser so that it would not only transform the content but also store the tags for use in the main script. I followed an approach similar to the one used for the wikilinks:

First, define an InlineProcessor that runs after the code has been standardized. Fortunately, the extensions API is well documented, but I still had to read the source code to find out which priority to give it.
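A minimal sketch of such a processor, assuming Python-Markdown 3.x. Collecting the tags in a `tags` set on the Markdown instance and rendering each one as `<span class="tag">` are assumptions made for illustration, not necessarily what the real script does. The processor is registered directly on a Markdown instance here just to try it out.

```python
import xml.etree.ElementTree as etree

import markdown
from markdown.inlinepatterns import InlineProcessor
from markdown.util import AtomicString


class TagInlineProcessor(InlineProcessor):
    RE_TAGS = r"(#+([^#\s.,\/!$%\^&\*;{}\[\]'\"=`~()<>”\\]|:[a-zA-Z0-9])+)"

    def handleMatch(self, m, data):
        tag = m.group(1)
        # Assumption: tags are stored on the Markdown instance, without
        # the leading '#', so the main script can read them afterwards.
        self.md.tags.add(tag.lstrip("#"))
        # Assumption: the tag is rendered as a span with a CSS class.
        el = etree.Element("span")
        el.set("class", "tag")
        # AtomicString keeps later inline patterns from touching the text.
        el.text = AtomicString(tag)
        return el, m.start(0), m.end(0)


md = markdown.Markdown()
md.tags = set()  # hypothetical attribute used as the tag store
md.inlinePatterns.register(
    TagInlineProcessor(TagInlineProcessor.RE_TAGS, md), "tags", 65
)

html = md.convert("Reading about #zettelkasten and #python today.")
print(html)
print(sorted(md.tags))
```

After `convert` returns, the main script can read `md.tags` to build an index of notes per tag. The set is not cleared between calls, so it would need to be reset by hand before converting the next note.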

In my case, when I define the TagExtension, I use:

md.inlinePatterns.register(TagInlineProcessor(TagInlineProcessor.RE_TAGS, md), 'tags', 65)

Higher priorities run first, so 65 means it will run right after the SimpleTextInlineProcessor (priority 70) and just before the AsteriskProcessor (priority 60), but I am not sure this is the best place.
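One way to check where a priority lands without reading the source is to ask the registry for the position of each pattern. The sketch below assumes Python-Markdown 3.2 or later, where the two built-in patterns are registered under the names `not_strong` and `em_strong`. `PriorityDemoExtension` is a hypothetical name, and it registers a stand-in processor rather than a real tag processor.

```python
import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import SimpleTextInlineProcessor


class PriorityDemoExtension(Extension):
    """Hypothetical extension that only registers a pattern at priority 65."""

    def extendMarkdown(self, md):
        # Stand-in processor: it returns the matched text unchanged.
        md.inlinePatterns.register(
            SimpleTextInlineProcessor(r"(#\w+)"), "tags", 65
        )


md = markdown.Markdown(extensions=[PriorityDemoExtension()])

# A lower index means the pattern is tried earlier.
order = {
    name: md.inlinePatterns.get_index_for_name(name)
    for name in ("not_strong", "tags", "em_strong")
}
print(order)
```

If the printed index for `tags` sits between the other two, the pattern runs after `not_strong` and before `em_strong`.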

Aquiles Carattino

This note you are reading is part of my digital garden. Follow the links to learn more, and remember that these notes evolve over time. After all, this website is not a blog.