Highlighted syntax of source code on websites is a common sight these days. Tools like GeSHi, Pygments and SyntaxHighlighter make it easy to embed this functionality on both server and client side.
However, the generated markup varies widely with regard to semantic correctness and user experience. In this article I compare different approaches and present a setup that suits both the semantics and the daily use of syntax highlighting.
State of the Art
The venerable pre element serves as the starting point. It has been used since the olden days for putting source code samples on the web (while the original xmp and listing elements never gained much respect). In the simplest case the content is stuffed inside a pre, perhaps marked up with simple emphasis on some keywords.
When automatic syntax highlighters come into play, they offer many useful features for showing off source code:
- Obviously, highlighting keywords and special operators with different text styles
- Semantic markup, i.e., using appropriate elements for a task
- Simple copy’n’pasting of the code sample
- Highlighting single lines and zebra striping of lines
- Line numbers
- Configurable line wrapping or scrolling
I’ll compare different markup approaches on how they cope with these feature requests.
Simple pre Element
The pre element with span elements for colorizing does well in the first four categories, but it fails utterly in the last two. Line numbers, if displayed, will land in copied text, rendering it unusable without post-processing. They would also be part of the content of the pre, which is not correct. Wrapping is simply not something plain pre elements do: long lines never wrap and will overflow or scroll, as long as CSS does not force a different behaviour.
The huge advantages are the otherwise correct semantics and the ease of use.
<pre>
<span class="kw">function</span> foo<span class="op">();</span>
</pre>
Tables
Another approach is using table elements for highlighted code. The line numbers are put in the first cell of a row, while the code moves to the second cell, either plain or fenced by a pre or code element.
This approach is brilliant at handling line issues. Zebra striping comes naturally, and highlighting lines can be done by background color or by a dedicated column.
However, copying the code suffers from the same issue as the plain pre solution, and the semantics are debatable: Does source code really have a tabular nature? Are the line numbers part of the information that the source is carrying?
The copying problem can be circumvented with the help of CSS: when the first cells are hidden with display: none, they won’t show up in the text copied to the clipboard. Toggling the line numbers can be achieved with a tiny bit of JavaScript.
If lines are allowed to wrap, the algorithm has to take care that each line still occupies only a single cell in its row, as otherwise the line numbering will get out of step. This is especially a problem in setups where all line numbers are in one cell, while all the code is in a single cell next to it.
<table>
<tr>
<td>1</td>
<td><span class="kw">function</span> foo<span class="op">();</span></td>
</tr>
</table>
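To illustrate the one-row-per-line rule, here is a minimal sketch in JavaScript. The function name renderCodeTable and the class names ln, code and source are made up for this example, and it assumes that every entry of the input array is one line of already highlighted code.

```javascript
// Hypothetical helper: builds one table row per source line.
// Assumes each array entry is a single, already highlighted line of code.
function renderCodeTable(highlightedLines) {
  const rows = highlightedLines.map(
    (lineHtml, index) =>
      // First cell: the line number, second cell: the code of that line.
      `<tr><td class="ln">${index + 1}</td><td class="code">${lineHtml}</td></tr>`
  );
  return `<table class="source">\n${rows.join("\n")}\n</table>`;
}
```

Because every line gets its own row, a wrapped line only makes its own row taller and the numbers stay next to the code they belong to.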
A variant of the table approach uses floating div elements to achieve the same effect. This is nothing more than a symptom of Divitis and doesn’t add any new insight.
Using an Ordered List
Focusing on the line numbers again, another HTML element comes to mind that is the most natural fit for ordinal data sets: ol. In this version, every line of code is enclosed in its own li element.
This solution is elegant for several reasons. The most important is that the line numbers need not be handled in the markup itself. They are generated automatically by the browser.
The markup is semantic in that line-oriented content is put in an element where the order matters.
Zebra striping and highlighting are as trivial as in the table case, and line wrapping is done automatically, with the line numbers adapting correctly. A really nice feature is that clicking a line number automatically highlights the whole corresponding line.
What rules this solution out is, again, the problem of copy’n’pasting code. Even if the li elements receive a list-style: none via CSS, the browser still adds automatic line numbers to copied text.
Also, the line numbers themselves cannot be styled independently from the rest of the content. They will always take on the text color of the ol and cannot get a custom background color.
<ol>
<li><span class="kw">function</span> foo<span class="op">();</span></li>
</ol>
General Problems when Skipping pre
The div, table and ol solutions all exist in variants with and without an embedded pre element. If a highlighter chooses to skip it, even when using the code element in its place, several issues arise immediately.
- White space will suffer from the usual collapsing if it is not replaced by non-breaking spaces or changed with CSS
- Automatic HTML compression (stripping unnecessary whitespace) will likely destroy semantics in the code (think Python)
- Bots, screen readers and older browsers might display the code wrongly or only partially
- Forgetting to deliver an appropriate print stylesheet will end in the same effect for all “normal” users
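The first point in the list can be sketched like this. preserveWhitespace is a hypothetical helper, not part of any of the highlighters mentioned, and the tab width of 4 is an arbitrary default. It assumes that it receives a single line of HTML-escaped text before any span elements are added, as it would otherwise also replace the spaces inside the tags.

```javascript
// Hypothetical helper for markup that does not use <pre>.
// Assumes `escapedLine` is HTML-escaped plain text without any tags.
function preserveWhitespace(escapedLine, tabWidth = 4) {
  return escapedLine
    .replace(/\t/g, " ".repeat(tabWidth)) // expand tabs to spaces first
    .replace(/ /g, "&nbsp;"); // then make every space non-collapsing
}
```

Note that a line treated this way can no longer wrap at its spaces.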
| | pre | table | ol |
|---|---|---|---|
| Semantic | ✓ | ✗ | ✓ |
| Line wrapping | ✗ | ✓ | ✓ |
| Line numbers | ✗ | ✓ | ✓ |
| Copy’n’paste | ✓ | ✗ | ✗ |
| Degrading | ✓ | ✗ | ✗ |
Reviving the pre Element with CSS
The promised new approach to marking up syntax highlighting is in fact a simple extension of the old pre technique. We only add a single new span element to distinguish lines:
<pre>
<span class="line"><span class="kw">function</span> foo<span class="op">();</span></span>
<span class="line"><span class="kw">function</span> bar<span class="op">();</span></span>
</pre>
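How the line spans get into the markup is up to the highlighter. As a rough sketch, a post-processing step could look like the following. wrapLines is a hypothetical function, and it assumes that no highlighting span crosses a line break (a multi-line comment, for example, would have to be closed and reopened on every line).

```javascript
// Hypothetical post-processing step for highlighter output.
// Assumes that no <span> opened on one line is closed on a later one.
function wrapLines(highlightedHtml) {
  return highlightedHtml
    .replace(/\n$/, "") // drop a single trailing newline, if present
    .split("\n")
    .map((line) => `<span class="line">${line}</span>`)
    .join("\n");
}
```

The result is meant to be placed inside the pre element, as in the markup shown above.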
The extra span doesn’t gain us anything by itself, but now we have all prerequisites in place for proper CSS formatting:
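pre {
    counter-reset: code;      /* start a fresh counter for every code block */
    padding-left: 30px;       /* make room for the line numbers */
}
.line {
    display: block;
    counter-increment: code;  /* one step per line */
}
.line:before {
    content: counter(code);   /* the generated line number */
    float: left;
    margin-left: -30px;       /* pull the number into the padding of the pre */
    width: 25px;
    text-align: right;
}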
Using CSS’s generated content and counter properties we can now simply build line numbers that fall short of the ones from the ol solution in only one respect. Plus they have the added benefit that they don’t get copied to the clipboard. The display of line numbers can be controlled by classes on the pre element, e.g. by adding line numbers only for pre.with_numbers .line:before.
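Switching the numbers on and off then boils down to toggling that class. A minimal sketch, assuming a browser with classList support and the with_numbers class mentioned above; toggleLineNumbers is a made-up name:

```javascript
// Hypothetical helper: flips the class that the CSS uses to show line numbers.
// Returns true if the numbers are visible after the call, false otherwise.
function toggleLineNumbers(preElement) {
  return preElement.classList.toggle("with_numbers");
}

// In a page this could be wired up like so (not executed here):
// document.querySelectorAll("pre").forEach((pre) => toggleLineNumbers(pre));
```

Since the numbers are generated content, toggling them changes nothing about what ends up in the clipboard.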
The .line:before pseudo-element can even be styled like any other element: we can freely choose color, background, width and so on. Line wrapping can be controlled with CSS, too:
pre {
    overflow-x: auto;      /* show scrollbars, if we’re not wrapping long lines */
}
.line {
    /* use one of the following two declarations */
    white-space: pre;      /* the default: don’t wrap */
    white-space: pre-wrap; /* wrap long lines, but keep multiple spaces and tabs intact */
}
The pre-wrap solution works in all modern browsers and in IE 8 and newer. The non-wrapping solution works down to IE 6.
The approach is both semantic and usable, and it works in all recent browsers and in IE from version 8 up. For IE 7 and below, a simple JavaScript solution that adds the line numbers dynamically is imaginable. This will, however, affect the clipboard, since numbers inserted that way are copied along with the code.
With the rise of CSS 3 in current browsers zebra striping becomes as simple as
.line:nth-child(2n) {
    background: green;
}
.line:nth-child(2n+1) {
    background: red;
}
/* granted, the color combination is not the best ;-) */
Highlighting a line can be achieved with class names alone.
.highlighted.line {
    background: yellow;
}
A drawback compared to the ol solution is that clicking a line number doesn’t automatically select the line, but this can be rebuilt in JavaScript if the feature seems necessary.
All in all, the simple pre element, together with span elements for lines and a little bit of CSS fairy dust, serves well for marking up syntax-highlighted text in a meaningful way. It does so in every browser this side of IE 8 while degrading gracefully in older ones.
Update: I just read Adam Prescott’s one-month-old article on the same topic. He concentrates there on explaining the possibilities and limits of using as little markup as possible. I recommend the article, since the information given there completes and rounds off the “use only pre” solution.