guest@acallard.net: $ ~ stat blog/2024-04-23.a-new-blog-update-org-mode-processing-in-pelican.html
guest@acallard.net: $ ~ cat blog/2024-04-23.a-new-blog-update-org-mode-processing-in-pelican.html

A new blog update: Org-mode processing in Pelican

This website just received its first big upgrade, and this post contains some technical explanations of what changed, how… along with the story behind it all! This update should, apart from some very minor changes to CSS, remain completely imperceptible to the end user1; however, on the writing side of the website (i.e. me), the processing pipeline changed quite a lot: while Pelican used to compile Markdown files into a website, I am very happy to say that I completely got rid of the Markdown format to transition to another one… Org-mode!

Some context

How did this happen?

This is all Léo’s fault2! 😂

To understand where this is coming from, you need to consider what happens on a regular basis in a computer science lab: some people definitely enjoy spending a lot of time dabbling and tinkering with their OS (paradoxically, most of these specimens appear in the theoretically-focused teams), and even though they are not a majority, they can sometimes be quite vocal about it. Their computer can have a somewhat obscure Linux distribution (Pascal, my PhD advisor, is the one who introduced me to Void Linux a few years ago, before I even began my PhD with him), or they can be obsessed with some unusual programming languages. For example, I may be a bit of an OCaml fanatic (though relative to the amount of code I produce, which isn’t much): I like using OCaml because it is a fantastic programming language, it contains almost everything I need, it is quite fast, and functional programming languages are a soft spot of mine (they’re just so cool! 🤩). I often like to joke3 that we should (for once) get ahead of MIT, decide by ourselves the programming languages we teach to bachelor’s students, and definitely remove Python, Java and co altogether from the students’ curriculum to replace them with OCaml. Wouldn’t they be thankful for enjoying the awesomeness of functional programming, elegant language designs, and every other benefit that comes with it? Realistically, though, I have to concede that my declarations of love for OCaml are often met with tired shared looks by my colleagues, along the lines of “oh, here he goes again…”. But because I know they expect me to say something when conversations about programming languages come up, I have to perform said conversation and not disappoint!

Anyway, these computer-fiddling people are to be expected in a computer science lab. It’s completely okay to be in the lab and not enjoy these topics for daily chattering, but one should probably expect some people there to be into these kinds of things. And to be honest, being quite fond of OCaml is really not the most exotic opinion you could find. In terms of originality, I may be the only one putting up with a Wayland setup on my machine, but many people could do it if they actually wanted to. Other opinions about computers usually include a variety of Linux distributions/pieces of software, one of which is of particular importance in our line of work: the choice of a text editor. Nowadays, the text editor war has been over for quite some time, and people now live in peace and harmony using Vim, Emacs or Codium depending on their personal opinions. And then, there’s Léo. If I had to mention one computer-related thing to know about him, I’d probably have to say the following: Léo really loves Emacs.

To be completely honest: he’s really good at it. When you know enough about Lisp (and Elisp) to add any functionality you want to Emacs, I am under the impression that you can pretty much do anything: the ability to access and edit Emacs internals seems incredibly powerful, and helps a lot in personalizing it according to your own tastes. It also makes the thing powerful enough that people in the Emacs community semi-seriously (or semi-humorously?) call it an OS in itself: emails, navigating the filesystem, editing files, git… if you want to stay within Emacs and its (quite efficient) buffer system without ever relying on another piece of software, you probably can.

Anyway, Emacs being this awesome and Léo being good at it meant two things. The first of them: when he learned that I was using Emacs for text-editing4 while being completely ignorant about how one is supposed to set it up, he helped me change my init.el configuration file so as to make Emacs more practical. Now, I have an awesome auto-complete system, a shell with utop integration inside my OCaml file editor, terrific documentation, and I didn’t know I needed any of these before I actually got them (and now, I wouldn’t give these up for anything). I am still awful at Lisp (half of the time, I have no idea what I am doing when I am editing configuration files), but at least I have a text editor and a file manager that I am very happy about. The other thing Léo did was that, when he learned that I was using Markdown files for my website, he looked at me with genuine surprise5 and asked: “but why do you use Markdown for text formatting when there is something better in every single aspect?”

This was my first introduction to Org-mode!

What is Org-mode?

According to the official website7, Org mode is a markup language for note taking (pretty much like Markdown) that (unlike Markdown) integrates many different features. To name a few:

  • the syntax for tables is very similar to Markdown, except that the editing system supports many different keybindings8 which make your life infinitely more practical: insert a column, align a column, move a column, automatic filling…
  • Org files can be exported to other formats, like TeX or HTML. I will talk a bit more about this below, but everything can be personalized for export. By plugging in your favorite LaTeX class/your own CSS, you can generate pretty much any document you want.
  • Org mode supports extensive linking functionalities: for each type of link (and, of course, you can declare your own types/syntaxes), you can program what happens when you click on it (what file/URL should be opened, with which program…), how you export it (what is the HTML wrapping around this link?), etc… This is particularly useful for my Pelican setup, so I will talk a bit more about this below.
  • One of the main uses of Org mode is also task management, making it easy to manage scattered tasks written while taking notes. A calendar system is also integrated in Org-mode, and agendas can be automatically generated from an .org file’s content.
  • Finally, since code in pretty much any language can be integrated directly into an Org file, to be either executed or just appear verbatim, you get some kind of coding notebook system (like Jupyter).

So, Org mode sounds a lot like Markdown, and as a plain text format there probably isn’t much more to it; but since it has been developed to be part of Emacs, its integration within Emacs leads to a ton of additional features that no other text editor can compete with. Since Emacs is increasingly likely to become my favorite text editor, I am slowly learning more about the Emacs ecosystem and all the features that come with it9; and even though the learning curve is a bit steep, the steps taken to become proficient in Emacs are actually worth it when you realize this knowledge can apply to pretty much anything you need, since you can do everything in Emacs if you are experienced enough. So, if I am to integrate all my usual activities within Emacs, why not integrate blogging into it as well?

Furthermore, the export system of Org-mode is marvelous to use, especially for managing the compilation pipeline of a website. Basically, Org mode relies on backends to export to other formats: an Org file is just a structured file (i.e. a tree) that contains blocks, and a backend is a long list of functions that tell how each block will be exported to a given format. What is amazing is that it is incredibly easy to override some specific functions of a given backend, so as to change its output; or just to copy a given backend and change it into a new one. Since the syntax of Org mode is already quite permissive, you get a very flexible system that easily adapts to one’s needs.
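To make the "tree plus list of functions" idea a bit more concrete, here is a toy model in Python. It is only an illustration: Org's real backends are written in Emacs Lisp, and every name below (Node, Backend, html_backend, export) is made up for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# A node of the document tree: (block type, text content, children).
Node = Tuple[str, str, List["Node"]]
# A backend maps each block type to a function taking the node's own text
# and the already-exported children, and returning the exported output.
Backend = Dict[str, Callable[[str, str], str]]

# A hypothetical HTML-like backend with one function per block type.
html_backend: Backend = {
    "document": lambda text, inner: inner,
    "paragraph": lambda text, inner: f"<p>{text}</p>",
    "quote": lambda text, inner: f"<blockquote>{inner}</blockquote>",
}


def export(node: Node, backend: Backend) -> str:
    """Export a tree by exporting the children first, then the node itself."""
    kind, text, children = node
    inner = "".join(export(child, backend) for child in children)
    return backend[kind](text, inner)


# Deriving a new backend: copy the old one and override a single function.
my_backend: Backend = {
    **html_backend,
    "quote": lambda text, inner: f'<div class="aside">{inner}</div>',
}

tree: Node = ("document", "", [("quote", "", [("paragraph", "Hello", [])])])
print(export(tree, html_backend))
print(export(tree, my_backend))
```

The second backend changes only the output of quote blocks; every other block type keeps its original export function, which is the kind of flexibility described above.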

It so happens that I had wanted for some time already to declare a new type of block in my blog. In Markdown, there is no easy way that I was aware of to pull this off, at least not without learning about a whole Markdown plugin machinery for Python-Markdown, whose inner workings are somewhat obscure to me. So what I did until now was to put plain HTML into my Markdown files: it worked, but this is very ugly and I did not like it. On the other hand, it is very easy to find that Org-mode’s code for HTML exporting is contained in the file ox-html.el.gz, which is basically a very long list of functions defining which HTML tags should be used to wrap the content of each block. Overriding these local functions/creating new ones is then extremely easy (you can just tell Emacs to replace each call to the original function by a call to the new one instead), so that I can choose whichever HTML tags and code will appear in the end. This is all a matter of Lisp coding, and in the end I get the “exact HTML I desire” (up to my own coding abilities, of course: I still know very little about Lisp, but I won’t let this slight annoyance stop me!).

My current Pelican pipeline

So, how do I integrate all this Org mode awesomeness into my website? First of all, as I already mentioned in this post, my website is powered by the static website generator Pelican. The way it works is quite straightforward:

  1. For each .md file, collect all the mandatory and optional metadata (title, date…) and call Python-Markdown to process the body of the file into an HTML export.
  2. Create a big Python dictionary containing all the articles and their metadata, and do some processing: automatically generate the index of all articles, replace links using Pelican’s internal linking syntax with the correct HTML links…
  3. Export the website into HTML files using Jinja, a templating system (so that one can choose the structure of an exported HTML file, call the right CSS/JavaScript files, etc…).

For this update, I only had to replace the first step entirely by:

  • For each .org file, collect all the mandatory and optional metadata (title, date…). This step was heavily inspired by the org_python_reader plugin from the Pelican plugin repository.
  • For each .org file, call Emacs and run the org-export-as function to export the body of each article into HTML code. This step was pretty much entirely copied from the org_reader plugin from the same repository.
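The two replacement steps above can be sketched roughly as follows. This is not the code of the org_python_reader or org_reader plugins: the function names are hypothetical, the metadata parser assumes that metadata lines of the form #+KEY: value sit at the very top of the file, and the Emacs command line is only one plausible way of calling org-export-as in batch mode with body-only output.

```python
import re
from typing import Dict, List, Tuple

# Matches Org keyword lines such as "#+TITLE: Some title".
KEYWORD = re.compile(r"^#\+(\w+):\s*(.*)$")


def split_org(source: str) -> Tuple[Dict[str, str], str]:
    """Separate the leading #+KEY: value lines from the body of an Org file."""
    metadata: Dict[str, str] = {}
    lines = source.splitlines()
    index = 0
    for index, line in enumerate(lines):
        match = KEYWORD.match(line)
        if match is None:
            break  # first non-keyword line: the body starts here
        metadata[match.group(1).lower()] = match.group(2).strip()
    else:
        index = len(lines)  # every line was a keyword line (or file is empty)
    return metadata, "\n".join(lines[index:]).strip()


def emacs_export_command(org_path: str) -> List[str]:
    """Build (but do not run) a batch Emacs call printing the HTML body."""
    # Assumption: visit the file, then export the current buffer body-only.
    elisp = "(progn (require 'ox-html) (princ (org-export-as 'html nil nil t)))"
    return ["emacs", "--batch", org_path, "--eval", elisp]


meta, body = split_org("#+TITLE: Hello\n#+DATE: 2024-04-23\n\nSome *body* text.\n")
print(meta, body)
```

The command list returned by emacs_export_command could then be handed to subprocess.run, with the captured standard output used as the article body.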

I was expecting everything to break apart at the first occasion, but somehow it worked really, really well: there is little to no difference between the original Markdown exports and their newer Org mode counterparts, and apart from two minor things (namely, Pelican’s linking system and footnotes), I only made minor adjustments to Org-mode’s HTML exporting system to get my new Org-based blog generator to be at least as functional as its Markdown ancestor.

Some details I want to share with fellow Pelican users

As mentioned above, while the system was mostly functional from the start, there were a few moments where I needed to rewrite some functions of the Org-mode HTML exporting backend (say hello to my new friend ox-html.el.gz! 👋) to suit my needs. I will list them here for the interested reader, but said hypothetical reader probably needs to know at least a little bit about Pelican for it to be either interesting or understandable.

The Pelican linking system (mandatory)

As mentioned in the official documentation, Pelican developed a very useful internal linking system. The idea is to track links from the source content to the generated content: when writing articles and pages on my computer, I edit .org files (or .md if you didn’t switch to Org mode) that form what is called the source content; after Pelican processes them into HTML and generates the final exported website, the organization and hierarchy of the website forms the generated content. Here is the point: there is no reason for the URLs of the final exported website to match the organization of the .org source files on my computer. And this is very useful: first, one can change the URL structure of a website (say, if I wanted to move my URLs from /blog/{article-name}.html to blog/{category}/{date}_{article-name}) without actually having to rename all of the source files by hand; second, I do not have to keep up with some internal file organization and I can instead put all of them into the same directory11, since the exporting system will take care of the organizing for me.

Concretely, assume that I want to create a link to another blog post in the new entry I am currently writing (which I did a few paragraphs ago when I mentioned this post). Instead of having to find what the final URL of the post is (and take the risk that, if I were to change my URL schemes, all links on the website would end up broken), I can instead link to the source file of the post thanks to Pelican’s linking syntax {filename}/path/to/file, and Pelican will sort things out during the processing and export! Since Pelican does this rewriting step (changing links using Pelican’s internal linking system to the final URL) on article bodies that have already been transformed into HTML, the only thing to do here when transitioning to .org compilation is to ensure that links to {filename}/path/to/file are correctly exported into HTML as <a href="{filename}/path/to/file">, and Pelican will take care of the rest.
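To illustrate the rewriting step described above, here is a small Python sketch of what replacing {filename} links in already-generated HTML could look like. This is not Pelican's actual implementation: the function name, the regular expression and the mapping from source paths to URLs are all assumptions made for the example.

```python
import re
from typing import Dict

# Matches href attributes that still use the {filename} source syntax.
LINK = re.compile(r'href="\{filename\}([^"]+)"')


def resolve_links(html: str, url_for_source: Dict[str, str]) -> str:
    """Replace {filename}/source/path hrefs with their generated URLs."""

    def replace(match: "re.Match[str]") -> str:
        source_path = match.group(1)
        url = url_for_source.get(source_path)
        # Unknown source files are left untouched rather than guessed.
        return match.group(0) if url is None else f'href="{url}"'

    return LINK.sub(replace, html)


# Hypothetical mapping from source files to final URLs.
urls = {"/blog/old-post.org": "/blog/2023/old-post.html"}
page = '<a href="{filename}/blog/old-post.org">this post</a>'
print(resolve_links(page, urls))
```

Changing the URL scheme of the website then only means changing the mapping; the source files and their links stay as they are.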

Unfortunately, Org mode is too smart for me. The markup syntax for links [[url-of-my-link][replacement-text]] is actually tied to a lot of very useful functionalities, one of which in particular is an automatic check on whether the given link is dead or not. This even works on files: the link [[/path/to/file.org][replacement-text]] checks whether the file file.org really exists or not on the designated path. If not, Org mode raises an error and refuses to export the link. The problem here is that {filename}/path/to/file.org is obviously not a valid path on my filesystem, so Emacs was not happy and refused to export articles containing such links, even though Pelican would have been very happy to replace the source link with the generated link in the final export. 😢

My solution to this predicament involved using Org mode linking functionalities to declare a new type of link, as mentioned in the documentation. Using org-link-set-parameters, as in (org-link-set-parameters "pelican" :export #'pelican/export-pelican-link), I could declare a new link syntax for Pelican, namely pelican:my-link, so that [[pelican:{filename}/path/to/file.org][replacement-text]] will correctly be exported to <a href="{filename}/path/to/file">replacement-text</a> by the function pelican/export-pelican-link, and later Pelican will rewrite the link to its corresponding URL accordingly during the final publishing. For now, this pelican: type of links is very basic, but I will (someday, when I understand Lisp a bit better) declare a :follow attribute to define what Emacs should do when trying to open links of this type. In particular, I could probably get the best of both worlds12: keep the checks for dead links, while working with Pelican’s internal linking syntax!
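The real pelican/export-pelican-link function is written in Elisp and is not reproduced here. As a rough model of the transformation it performs, the following Python sketch turns pelican: links into plain anchors; the function name and the regular expression are hypothetical, and the link target is kept exactly as written in the source.

```python
import html
import re

# Matches Org links of the form [[pelican:TARGET][DESCRIPTION]].
PELICAN_LINK = re.compile(r"\[\[pelican:([^\]]+)\]\[([^\]]+)\]\]")


def export_pelican_links(org_text: str) -> str:
    """Turn pelican: links into HTML anchors, without checking the target."""

    def to_anchor(match):
        target, description = match.group(1), match.group(2)
        # No existence check here: resolving the target is Pelican's job.
        return f'<a href="{html.escape(target, quote=True)}">{description}</a>'

    return PELICAN_LINK.sub(to_anchor, org_text)


print(export_pelican_links("See [[pelican:{filename}/path/to/file.org][replacement-text]]."))
```

The important point is what the function does not do: it never looks at the filesystem, so the {filename} prefix survives until Pelican rewrites it.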

Abbreviations

One functionality that doesn’t exist in Org mode is a way to manage abbreviations. In HTML, for example, there exists an <abbr> tag that can be used to denote whenever something is an abbreviation (and, in which case, what is being abbreviated). Since an example is worth a thousand words, here is a self-explanatory illustration: <abbr title="GNU is Not Unix">GNU</abbr>. On this website, a Python-Markdown extension used to handle the marking of abbreviations as such, but I found nothing for Org-mode, unfortunately.

However, Emacs is awesome, so this was the occasion to rely on another component of Emacs sorcery: hooks! Basically, a hook is a collection of functions that are called on specific occasions. The trick is that one can easily add new functions to some already defined hooks, which makes it possible to run additional functions at said specific moments. There are tons of hooks in Emacs (for every single moment, from pressing/releasing keys to loading new files, searching in a list…); I think my init.el configuration file mostly consists of setting global variables and adding functions to hooks. In the context of adding abbreviations to Org mode, the hook that interests us is 'org-export-before-parsing-hook (it is run, as the name nicely tells, whenever Org mode begins to export a file to another format).

The way abbreviations work in my setup is the following: whenever the hook 'org-export-before-parsing-hook is run, a bit of code looks at a list of abbreviations that were defined in the source file that is to be exported (the syntax for defining an abbreviation is #+ABBR: [GNU] GNU is Not Unix, which seems practical enough!), and each occurrence of the word GNU in the body of the file is replaced by its HTML counterpart <abbr title="GNU is Not Unix">GNU</abbr>. The export of the file then resumes as usual, and this is how I get abbreviations like OS to display nicely in a web browser!
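Léo's Elisp code is not shown here, but the logic described above can be modelled in a few lines of Python. This is only a sketch under assumptions: the function names are made up, the #+ABBR: lines are simply dropped from the body, and the HTML is inserted directly into the text (the actual hook may well wrap it differently so that Org passes it through unchanged).

```python
import html
import re
from typing import Dict, Tuple

# Matches definition lines such as "#+ABBR: [GNU] GNU is Not Unix".
ABBR_LINE = re.compile(r"^#\+ABBR:\s*\[([^\]]+)\]\s*(.+)$")


def collect_abbreviations(source: str) -> Tuple[Dict[str, str], str]:
    """Return the declared abbreviations and the source without those lines."""
    abbreviations: Dict[str, str] = {}
    body_lines = []
    for line in source.splitlines():
        match = ABBR_LINE.match(line)
        if match:
            abbreviations[match.group(1)] = match.group(2).strip()
        else:
            body_lines.append(line)
    return abbreviations, "\n".join(body_lines)


def mark_abbreviations(body: str, abbreviations: Dict[str, str]) -> str:
    """Wrap each whole-word occurrence of an abbreviation in an <abbr> tag."""
    if not abbreviations:
        return body
    # Longest keys first, so a longer abbreviation wins over its own prefix.
    keys = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

    def wrap(match):
        short = match.group(1)
        title = html.escape(abbreviations[short], quote=True)
        return f'<abbr title="{title}">{short}</abbr>'

    # A single pass: text inserted into title="..." is never scanned again.
    return pattern.sub(wrap, body)


abbrs, text = collect_abbreviations("#+ABBR: [GNU] GNU is Not Unix\nI run GNU tools.")
print(mark_abbreviations(text, abbrs))
```

Doing the replacement in a single pass matters for recursive acronyms: replacing abbreviations one after the other would also rewrite the GNU inside the freshly inserted title attribute.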

To be perfectly honest, though, I had no idea how to pull off such a feat on my own; so Léo, being the very kind Elisp magician that he is, just sent me a bunch of code doing it! 🧙

Minor modifications

There are a few other slight alterations that I added to Org mode exports, mostly for my own comfort13. The funny thing is that these minor modifications are probably the ones that took most of the time to implement, because I was trying to copy and modify a bunch of Org mode functions (written in Elisp, which I do not know), and I only succeeded through long trial and error!

  • Org mode supports syntax highlighting for code blocks in a very low-level way: since Org mode is used in Emacs, and Emacs supports syntax highlighting, Org mode just checks what color is currently used in the buffer that is displaying said code, and hardcodes this color in the HTML export. It gets the job done, but I wanted to do something different instead. Indeed, back when I was using Markdown, syntax highlighting was handled by Pygments, whose CSS files I still had (and was in fact hoping to reuse).
    So, as suggested here, I replaced the original Org mode code that handles the export of src-blocks (blocks that contain code) by a call to pygmentize. (Did I already mention how cool it is to be able to override calls to Emacs internals by your own functions?)
  • In case you didn’t notice, I really like footnotes. I use them everywhere, half of the time for insightful and apropos remarks, half of the time for silly jokes (mostly at my own expense!) or silly comments that completely break the reading flow of the blog entry14. However, I did not like how footnotes were exported by Org mode: it was not working with my previous CSS, and the HTML div organization was different. So I overrode org-html-footnotes-section and org-html-footnote-section to change them to whatever I preferred. Right now, it mostly looks like what I wanted it to; but one day, I will get proficient enough in Lisp to display footnotes at the end of each section, rather than grouping all of them at the end of the page. I already wanted to implement this idea when I began writing this blog, but never got around to it in Markdown. In Elisp this definitely seems achievable, but it seems to require extensive modifications to the current HTML rendering backend and I am not quite there yet!
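Regarding the first item of the list above (delegating code blocks to pygmentize), here is a hedged Python sketch of what such a call can look like from the outside. The actual override lives in Elisp and is not shown in this post; the function names below are hypothetical, and the sketch assumes that the pygmentize executable is installed and on the PATH.

```python
import subprocess
from typing import List


def pygmentize_command(language: str) -> List[str]:
    """Build the pygmentize call: -l picks the lexer, -f the output format."""
    return ["pygmentize", "-l", language, "-f", "html"]


def highlight(code: str, language: str) -> str:
    """Send the code to pygmentize on stdin and return the generated HTML."""
    result = subprocess.run(
        pygmentize_command(language),
        input=code,
        capture_output=True,
        text=True,
        check=True,  # raise if pygmentize fails (e.g. unknown lexer)
    )
    return result.stdout


print(pygmentize_command("ocaml"))
```

With the HTML formatter, Pygments marks tokens with CSS classes by default instead of hardcoding colors, which is what makes it possible to keep using an existing Pygments stylesheet.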

Final word

So, gather around people, the moment hath come: this website, powered by Pelican, is entirely written in Org mode! 🎉

Was it necessary? Definitely not. Was it useful? Most doubtfully. However, it was a nice holiday project that kept me busy for two days, and I enjoyed it very much! Plus, my blog gets an additional level in nerdiness! I could say that this will motivate me to write more often, but we all know that this would be completely bonkers (Markdown wasn’t slowing my writing down). Yet, transitioning to Org mode brought me closer than ever to relying on Emacs for everything, and I am very content with this!


  1. Provided that they do, at least, read the website in a somewhat functional web browser. If anyone is reading the website as a text file in pure HTML, they will immediately notice that every paragraph now has an id, whose name gives it away.

  2. First of all, Léo is the other PhD student of Pascal Vanier, and we study the same kinds of topics (subshifts). Second: I am sorry Léo, that was uncalled for! You have been really helpful since the beginning of my Emacs tribulations, and I am actually very grateful! ❤️

  3. Or am I? There is only one reason I make this kind of joke: they’re funny, because they’re true!

  4. Okay, you got me: for OCaml development! 😁 Tuareg and Merlin are two awesome tools for an OCaml setup, and there is nothing like them in any other text editor that I know of!

  5. There is something incredibly humbling in being able to manifest this kind of genuine surprise when learning that people do not use your obscure piece of software6. I have nothing but respect for Léo for managing it.

  6. Okay, it is highly possible that calling Org-mode an “obscure piece of software” is quite an exaggeration. It is in fact very well-known (I just had never heard of it before).

  7. If you get into Emacs, God help you, you will probably need to get used to reading the official website and the documentation. Half of the things I understood probably came from reading the documentation. The other half was Léo quoting parts of the documentation that I probably should have found by myself!

  8. This is the Emacs way: if you don’t end up with wrist pain at the end of the day, you are doing something very wrong!

  9. You have no idea how wild this rabbit hole is. None! 😂 Did you know that Emacs contains a few hundred lines of code to compute sunrise and sunset hours at any point on the globe for the next eight hundred years? It is sourced to a few astronomical almanacs from the 80s. The very same thing exists for the lunar phases. There are also functions to convert dates from one calendar to another, including (but not exhaustively): the Maya calendar, the Coptic calendar, the French Revolutionary calendar10

  10. I was so impressed with the latter that I considered, just for a few seconds, replacing the date program on my computer (and, of course, the display on my website) with this floral decimal calendar system. I was immediately discouraged by opponents of the revolution, taunting me with some comments along the lines of “it won’t be easy to schedule an event with friends”; but I did not entirely give up on the idea nevertheless.

  11. Keeping up with a consistent organization system is a bit of work, and I do not need that anymore since there are wonderful pieces of software (Emacs, fzf for zsh…) with feature-rich search functions that make it possible to manage files in very busy directories.

  12. Here is a completely unrelated fun fact: when I say something, I very often think of a song that contains the final part of the sentence I just said. When working on the fixpoint construction for my PhD (a subshift-related math topic), in which some macro-tiles are called “parents” and “siblings”, I had thousands of songs constantly popping in my head (like this one), and I think some people were very disappointed in me for singing them and not keeping it shut. (I am very sad about this: it takes a lot of effort to think of silly things to say and silly songs to sing!) Anyway, all this rambling to actually say: after the sentence that started this footnote, I have to sing this song too!

  13. But isn’t my own comfort the whole point of this update?

  14. Okay, I feel like this deserves a lot more attention: and what is more meta than a footnote about footnotes?!

    So, why do I add irrelevant footnotes all over the place, if they break the natural flow of reading? While they may be more or less unconnected with what they refer to in the main text, they may actually be relevant to what I was also thinking about when writing said main text. They give context about me, the author, while the main text hopefully pertains more to what I think. What do these footnotes tell about what happens in my mind, though? Well, whatever it is, it is probably all over the place; but I like providing bits of context, trivia or disconcerting thoughts I had while writing an otherwise hopefully structured speech.