However, the generated markup differs widely with regard to semantic correctness and user experience. In this article I compare different approaches and present a setup that suits both the semantics and the daily use of syntax highlighting.
State of the Art
The pre element serves as the starting point. It has been used since the
olden days for putting source code samples on the web (while the original
listing elements never gained much respect). In the simplest case the
content is stuffed inside a
pre, perhaps marked up with simple emphasis on
When automatic syntax highlighters come into play, they offer many useful features for showing off source code:
- Obviously, highlighting keywords and special operators with different text styles
- Semantic markup, i.e., using appropriate elements for a task
- Simple copy’n’pasting of the code sample
- Highlighting single lines and zebra striping of lines
- Line numbers
- Configurable line wrapping or scrolling
I’ll compare different markup approaches on how they cope with these feature requests.
A pre element with
spans for colorizing does well for the first four
categories, but it utterly fails for the last two. Line numbers, if displayed,
will land in copied text, rendering it unusable without post-processing. They
would also be part of the content of the
pre, which is not correct. Wrapping
is something that plain
pre elements do not do on their own. Their lines
never wrap, so long lines overflow or scroll unless CSS forces a different behaviour.
The huge advantages are the otherwise correct semantics and the ease of use.
<span class="kw">function</span> foo<span class="op">();</span>
Another approach is using
table elements for highlighted code. The line
numbers are put in the first cell of a row, while the code moves to the second
cell, either plain or fenced by a
This approach is brilliant at handling line issues. Zebra striping comes naturally, and highlighting lines can be done by background color or by a dedicated column.
However, copying the code suffers from the same issue as the plain pre
solution, and the semantics are debatable: Does source code really have a tabular
nature? Are the line numbers part of the information that the source is
The copying problem can be circumvented with the help of CSS: When the first
cells are hidden with
display: none, they won’t show up in the text from the
clipboard. The toggling of line numbers can be achieved using a tiny bit of
If the lines should wrap, the algorithm has to take care that each line still occupies only a single cell in its row, as otherwise the line numbering will get out of sync with the code. This is especially a problem in setups where all line numbers are in one cell while all the code is in a single cell next to it.
<td><span class="kw">function</span> foo<span class="op">();</span></td>
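As a rough sketch of the CSS trick for the table approach: the number cells get a class, and a second class on the table hides them. The class names code, num and no-numbers, the id sample and the toggle button are assumptions made for this illustration and are not taken from any particular highlighter.

```html
<style>
  /* hypothetical class names, chosen for this sketch only */
  table.code td.num { text-align: right; color: #888; }
  /* cells hidden with display: none are left out of copied text */
  table.code.no-numbers td.num { display: none; }
</style>

<table class="code" id="sample">
  <tr>
    <td class="num">1</td>
    <td><span class="kw">function</span> foo<span class="op">();</span></td>
  </tr>
  <tr>
    <td class="num">2</td>
    <td><span class="kw">function</span> bar<span class="op">();</span></td>
  </tr>
</table>

<!-- toggles the line numbers by switching the class on the table -->
<button type="button"
        onclick="document.getElementById('sample').classList.toggle('no-numbers')">
  Toggle line numbers
</button>
```

With the numbers hidden, selecting and copying the table yields only the code cells.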
A variant of the table approach uses floating
divs to achieve the same
effect. This is nothing more than a symptom of Divitis, but doesn’t add any
Using an Ordered List
Focusing on the line numbers again, another HTML element comes to mind that is
most natural for handling ordinal data sets:
ol. In this version, every line
of code is enclosed in an
li element that marks a single line.
This solution is elegant for several reasons. The most important is that the line numbers need not be handled in the markup itself. They are automatically generated by the browser.
The markup is semantic in that line-oriented content is put in an element where order matters.
Zebra striping and highlighting are as trivial as in the
table case, and
line wrapping is done automatically, together with correct adaptation of line
numbers. A really nice feature is that clicking the line number automatically
highlights the whole corresponding line.
What rules out this solution is, again, the problem of copy’n’pasting code.
Even if the
lis receive a
list-style: none via CSS, the browser still adds
automatic line numbers to copied text.
Also, the line numbers themselves cannot be styled independently from the rest
of the content. They will always take on the text color of the
ol and cannot get a
custom background color.
<li><span class="kw">function</span> foo<span class="op">();</span></li>
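To make the ordered-list variant concrete, here is a minimal sketch. The class names code and highlight and the colors are assumptions chosen for this example.

```html
<style>
  /* hypothetical class names and colors, for illustration only */
  ol.code { font-family: monospace; }
  /* keep indentation inside each line, but let long lines wrap */
  ol.code li { white-space: pre-wrap; }
  /* zebra striping */
  ol.code li:nth-child(even) { background: #f4f4f4; }
  /* a single highlighted line; declared last so it wins over the stripes */
  ol.code li.highlight { background: #ffc; }
</style>

<ol class="code">
  <li><span class="kw">function</span> foo<span class="op">();</span></li>
  <li class="highlight"><span class="kw">function</span> bar<span class="op">();</span></li>
  <li><span class="kw">function</span> baz<span class="op">();</span></li>
</ol>
```

The browser generates the numbers 1 to 3 itself, so the markup contains no line numbers at all.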
General Problems when Skipping pre
The table and
ol solutions all exist in variants with and without a
pre element. If a highlighter chooses to skip it, even with a
code element in place, several issues arise:
- White space will suffer from the usual collapsing if not replaced by nbsps or changed with CSS
- Automatic HTML compression (stripping unnecessary whitespace) will likely destroy semantics in the code (think Python)
- Bots, screen readers and older browsers might render the code incorrectly or only partially
- And forgetting to deliver an appropriate print stylesheet will ultimately have the same effect for all “normal” users
pre Element with CSS
The promised new approach to marking up syntax highlighting is in fact a simple
extension to the old
pre technique. We only add a single new
span to distinguish lines:
<span class="line"><span class="kw">function</span> foo<span class="op">();</span></span>
<span class="line"><span class="kw">function</span> bar<span class="op">();</span></span>
That doesn’t gain us anything by itself, but now we have all prerequisites in place for proper CSS formatting:
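A minimal sketch of such formatting, built on CSS counters and generated content, could look like this. The class name numbered, the counter name line and all widths and colors are assumptions for this example.

```html
<style>
  /* "numbered" is a hypothetical class that switches line numbers on */
  pre.numbered { counter-reset: line; }            /* start counting at 0 */
  pre.numbered .line { counter-increment: line; }  /* +1 for every line span */
  pre.numbered .line:before {
    content: counter(line);   /* generated content, not part of the markup */
    display: inline-block;
    width: 3em;
    margin-right: 1em;
    text-align: right;
    color: #888;
    background: #eee;
  }
</style>

<pre class="numbered"><span class="line"><span class="kw">function</span> foo<span class="op">();</span></span>
<span class="line"><span class="kw">function</span> bar<span class="op">();</span></span></pre>
```

Removing the numbered class from the pre element switches the numbers off again without touching the markup of the lines.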
Using CSS’s generated content and counter properties we can now simply build
line numbers that fall in no (but one) way behind the ones from the ol
solution. Plus they have the added benefit that they don’t get copied to the
clipboard. The display of line numbers can be controlled by classes on the
pre element, e. g. by adding line numbers only for
The .line:before pseudo-element can even be styled like any other element: We
can freely choose color, background, width and so on. Line wrapping can be
controlled with CSS, too:
overflow-x: auto;      /* show scrollbars if we’re not wrapping long lines */
white-space: pre;      /* the default: don’t wrap */
white-space: pre-wrap; /* wrap long lines, but keep multiple spaces and tabs intact */
The pre-wrap solution works in all modern browsers and in IE 8 and
newer. The non-wrapping
solution works down to IE 6.
With the rise of CSS 3 in current browsers zebra striping becomes as simple as
/* granted, the color combination is not the best ;-) */
Highlighting a line can be achieved with class names alone.
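Both ideas can be sketched in a few lines, assuming the CSS 3 :nth-child selector for the stripes and a hypothetical class name highlight for marked lines. The colors are placeholders.

```html
<style>
  /* zebra striping: every second line span gets a background */
  pre .line:nth-child(even) { background: #f4f4f4; }
  /* "highlight" is a hypothetical class name for a marked line;
     declared last so it overrides the stripe on even lines */
  pre .line.highlight { background: #ffc; }
</style>

<pre><span class="line"><span class="kw">function</span> foo<span class="op">();</span></span>
<span class="line highlight"><span class="kw">function</span> bar<span class="op">();</span></span>
<span class="line"><span class="kw">function</span> baz<span class="op">();</span></span></pre>
```

Because the line spans are inline elements, the background in this sketch only covers the text of each line. Stripes that run across the full width of the pre element need additional layout rules for the line spans.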
A drawback compared to the
ol solution is that clicking a line number
doesn’t automatically select the line, but this can also be re-built in
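One possible way to rebuild the click-to-select behaviour is a small script like the following sketch. It assumes that the line numbers are drawn by the .line:before pseudo-element and occupy roughly the first GUTTER_WIDTH pixels of each line. That constant is an assumption and has to match the actual style sheet.

```html
<script>
  // Assumption: the generated line number sits in the first GUTTER_WIDTH
  // pixels of each .line span; adjust this to the real number column width.
  var GUTTER_WIDTH = 64;

  document.addEventListener('click', function (event) {
    // find the line span that was clicked, if any
    var line = event.target.closest('pre .line');
    if (!line) { return; }

    // ignore clicks that landed in the code rather than on the number
    var left = line.getBoundingClientRect().left;
    if (event.clientX - left > GUTTER_WIDTH) { return; }

    // select the whole content of the clicked line
    var range = document.createRange();
    range.selectNodeContents(line);
    var selection = window.getSelection();
    selection.removeAllRanges();
    selection.addRange(range);
  });
</script>
```

The generated number itself is not part of the selection, so copying afterwards still yields only the code of that line.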
All in all, the simple
pre element, together with
spans for lines and a
little bit of CSS fairy dust, serves well for marking up syntax-highlighted text in
a meaningful way. And it does so in every browser on this side of IE 8 while
degrading gracefully in older ones.
Update: I just read Adam Prescott’s
article, one month
old, on the same topic. He concentrates on explaining the possibilities
and limits of using as little markup as possible. I recommend the article, since
the information given there completes and rounds up the “use only