The regular readers of the HTML view of this blog (if there are any) may have noticed some changes in the syntax highlighting of the source code in my posts. I've tweaked the CSS style sheet to use the RubyBlue Vim theme instead of the Blue Vim theme. This is a good occasion to explain how I format source code samples in HTML.

I'm a Vim addict, using it on Linux, AIX and Windows. Vim has a powerful and extensible syntax highlighting engine that can format almost any existing text file format. And most importantly, it has a plugin that can export the highlighted code as HTML.

There are many advantages in using Vim and storing statically highlighted code:

  • I can use the huge set of languages supported by Vim highlighting;
  • I can use the huge set of themes built for Vim and easily convert them to CSS for the web;
  • if a language is not supported, I can define the highlighting myself (I have already done so for 4 languages), and the work is done once for both Vim and my blog;
  • as I store only pure HTML as data in the blog engine (no special wiki code, no plugin), I am not dependent on the engine I'm currently using;
  • there is no load on the server (unlike PHP formatting engines such as GeSHi) or on the client (unlike syntaxhighlighter);
  • as we are using simple <pre> tags, there is no character or tag pollution: the reader can simply select the text and copy it to the clipboard;
  • last but not least, I can tweak the output to improve it and fix the highlighting bugs (some languages are very hard to parse).

In Vim you can invoke the conversion to HTML from the "Syntax" menu or with this command:

:runtime syntax/2html.vim

To get the best XHTML code, I'm using the following settings in $HOME/.vimrc:

syntax on
" Conversion HTML (:help 2html.vim)
let g:html_use_css = 1
let g:html_use_encoding = "utf8"
let g:use_xhtml = 1

For example, here is the HTML that 2html.vim generates from the code above, once I have removed everything around the <pre> tag:

<pre>
<span class="Comment">&quot; Conversion HTML (:help 2html.vim)</span>
<span class="Statement">let</span> g:html_use_css <span class="Operator">=</span> <span class="Constant">1</span>
<span class="Statement">let</span> g:html_use_encoding <span class="Operator">=</span> <span class="Constant">&quot;utf8&quot;</span>
<span class="Statement">let</span> g:use_xhtml <span class="Operator">=</span> <span class="Constant">1</span>
</pre>

I just have to add my own set of classes to enable highlighting:

<pre class="code vim vimft-vim">...</pre>

Here are the semantics associated with the classes:

  • code is my generic class for source code blocks;
  • vim is for source code formatted using the Vim classes for highlighting;
  • vimft-vim is the class for the specific kind of source code: "vimft-" followed by the value of Vim's filetype option, displayed with ":set ft?" (vimft-html for HTML, for example).
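The extraction step described above (keep only the <pre> element, then add the three classes) is mechanical enough to script. Here is a minimal JavaScript sketch of it. The function name toBlogPre is hypothetical, and the sketch assumes that the page generated by 2html.vim contains the highlighted code in its first <pre> element, which may not hold for every Vim version or setting.

```javascript
// Hypothetical helper, not part of Vim or 2html.vim.
// Assumes `page` is the full HTML text produced by 2html.vim and that
// the highlighted code sits in the first <pre> element of that page.
function toBlogPre(page, filetype) {
  // Keep the class name simple: lowercase letters, digits, "_" and "-".
  if (!/^[a-z0-9_-]+$/.test(filetype)) {
    throw new Error("unexpected filetype: " + filetype);
  }
  const match = /<pre\b[^>]*>([\s\S]*?)<\/pre>/i.exec(page);
  if (match === null) {
    throw new Error("no <pre> element found");
  }
  // Drop whatever attributes the generated <pre> had and use the blog classes.
  return `<pre class="code vim vimft-${filetype}">${match[1]}</pre>`;
}
```

The filetype argument is the value reported by ":set ft?", so a Vim script sample would be passed as "vim".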

For terminal output samples, I use my own highlighting based on semantic XHTML tags:

  • <pre class="terminal">, the enclosing tag, with an optional class:
    • unix for Unix/Linux samples;
    • cmd for Windows cmd.exe shell code.
  • <kbd> for what is typed in the terminal, with the following optional classes:
    • shell for any Unix shell samples;
    • bash or ksh (in addition to shell) for Unix shell samples that use features which are not in the standard POSIX shell;
    • cmd for Windows cmd.exe shell code.
  • <samp> for program output, with the following optional classes:
    • prompt for shell or interactive program prompts;
    • shell, bash, ksh or cmd for shell prompts (in addition to prompt);
    • sqlite, sqlite3 for SQLite client samples...
  • <var> for variable input/output. Everything except the <var> content should match exactly if you reproduce the sample yourself. The title attribute can indicate what the variable represents and which data it depends on. The tag is always a direct child of either <kbd> or <samp>.

Here is an example:

<pre class="terminal unix">
<samp class="prompt shell">$ </samp><kbd class="shell">echo Hello, world!</kbd>
<samp>Hello, world!</samp>
<samp class="prompt shell">$ </samp><kbd class="shell">date</kbd>
<samp><var>samedi 25 août 2007, 18:51:00 (UTC+0200)</var></samp>
</pre>

And the final result:

$ echo Hello, world!
Hello, world!
$ date
samedi 25 août 2007, 18:51:00 (UTC+0200)
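The markup of the example above is regular enough to be generated from a list of commands and outputs. The following JavaScript is only a sketch under a few assumptions: the names escapeHtml and renderTerminal and the shape of the steps objects are hypothetical, the prompt is hard-coded to "$ ", and only the unix and shell classes are handled.

```javascript
// Escape the characters that are special in HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build a <pre class="terminal unix"> block.
// Each step is { command, output, variable }:
//   command  - what is typed (goes into <kbd>)
//   output   - optional single line printed by the program (goes into <samp>)
//   variable - true when the output changes from run to run (wrapped in <var>)
function renderTerminal(steps) {
  const lines = ['<pre class="terminal unix">'];
  for (const step of steps) {
    lines.push(
      '<samp class="prompt shell">$ </samp>' +
        `<kbd class="shell">${escapeHtml(step.command)}</kbd>`
    );
    if (step.output !== undefined) {
      const output = escapeHtml(step.output);
      lines.push(
        step.variable ? `<samp><var>${output}</var></samp>` : `<samp>${output}</samp>`
      );
    }
  }
  lines.push("</pre>");
  return lines.join("\n");
}

// The two commands of the example above.
const sample = renderTerminal([
  { command: "echo Hello, world!", output: "Hello, world!" },
  { command: "date", output: "samedi 25 août 2007, 18:51:00 (UTC+0200)", variable: true },
]);
```

Escaping matters here because command lines often contain "<", ">" or "&" (redirections, background jobs), which would otherwise break the markup.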

These semantic tags will let me add features later using JavaScript code. I'm thinking of a button that would hide all <samp> tags and keep only the <kbd> tags, to make it easier to copy the commands into a terminal and run them.
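As a rough idea of what such a button could look like, here is a minimal JavaScript sketch. Nothing in it exists on the blog yet: the function names commandsOf and setOutputHidden and the button label are hypothetical, and the sketch assumes a browser that supports the hidden property and iteration over the result of querySelectorAll.

```javascript
// Hypothetical helpers; `pre` is expected to be a <pre class="terminal"> element.

// Return the typed commands, one per line, ready to paste into a terminal.
function commandsOf(pre) {
  return Array.from(pre.querySelectorAll("kbd"), (kbd) => kbd.textContent).join("\n");
}

// Hide or show every <samp> (prompts and program output) inside the block.
function setOutputHidden(pre, hidden) {
  for (const samp of pre.querySelectorAll("samp")) {
    samp.hidden = hidden;
  }
}

// Browser-only wiring: put a toggle button before each terminal block.
if (typeof document !== "undefined") {
  for (const pre of document.querySelectorAll("pre.terminal")) {
    const button = document.createElement("button");
    button.textContent = "Commands only";
    let hidden = false;
    button.addEventListener("click", () => {
      hidden = !hidden;
      setOutputHidden(pre, hidden);
    });
    pre.parentNode.insertBefore(button, pre);
  }
}
```

Hiding the <samp> tags leaves the line breaks of the <pre> block in place, so output lines show up as empty lines; commandsOf avoids that by collecting only the <kbd> text.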

With these tags in place, the CSS style sheet is quite short and simple. More importantly, it is easy to replace if I change the theme of the blog.

pre.code.vim,
pre.terminal { margin-left: 1pt; padding: 5pt; }

/* Text not embedded in samp or kbd will be in red, to easily detect errors */
pre.terminal { background: #000; color: #f00; }
pre.terminal samp.prompt { color: #888; }
pre.terminal samp { color: #eee; }
pre.terminal kbd { color: #fff; font-weight: bold; }
pre.terminal var { color: #55f; font-style: italic; }

/* colorscheme rubyblue */
pre.code.vim { color: #c7d4e2; background-color: #162433; }
pre.code.vim a[href] { color: #0f0; }
pre.code.vim .Constant { color: #0c0; }
pre.code.vim .Comment { color: #428bdd; }
pre.code.vim .Identifier { color: #fff; }
pre.code.vim .Label { color: #ff0; }
pre.code.vim .Operator { color: #ff0; font-weight: bold; }
pre.code.vim .PreProc { color: #f9bb00; }
pre.code.vim .Special { color: #0c0; }
pre.code.vim .Statement { color: #f9bb00; }
pre.code.vim .Title { color: #fff; font-weight: bold; }
pre.code.vim .Type { color: #fff; text-decoration: underline; }
pre.code.vim .Underlined { color: #208aff; text-decoration: underline; }
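Only the highlight classes listed in this style sheet get a color, so it can be useful to check which classes a converted sample actually uses. The JavaScript below is a small sketch of such a check. The names usedClasses and unstyledClasses are hypothetical, and it assumes the highlighting is written as <span class="..."> with double quotes, as in the generated HTML shown earlier.

```javascript
// Hypothetical helper: list the class names used on <span> tags in a
// converted sample, sorted and without duplicates.
function usedClasses(html) {
  const names = new Set();
  for (const match of html.matchAll(/<span class="([^"]+)">/g)) {
    for (const name of match[1].split(/\s+/)) {
      if (name !== "") {
        names.add(name);
      }
    }
  }
  return [...names].sort();
}

// Hypothetical helper: return the classes used in `html` that are not in
// `styledNames`, the list of classes the style sheet defines.
function unstyledClasses(html, styledNames) {
  const styled = new Set(styledNames);
  return usedClasses(html).filter((name) => !styled.has(name));
}

// Classes defined by the rubyblue rules above.
const rubyblueClasses = [
  "Constant", "Comment", "Identifier", "Label", "Operator", "PreProc",
  "Special", "Statement", "Title", "Type", "Underlined",
];
```

Calling unstyledClasses with the HTML of a sample and rubyblueClasses returns the names that would fall back to the default text color.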

I had to tweak the blog engine I use (DotClear) a bit to add this style sheet: modifying template.php in the theme directory is not enough, because the theme is used only on the public part of the blog. So I added an @import rule in ecrire/style/default.css to enable the CSS in the private area of the blog as well.

 

I would be glad to read about your experiences with source code formatting on your own blog or CMS.

Update 2007-09-06: added missing information about the usage of <var> tags.