matt ryall’s weblog

Death defying feats since 2002.

HTML 5, headings and sections

7 October 2008

Tonight there was a presentation at the Web Standards Group by Lachlan Hunt about some of the new facilities provided by new and upcoming web standards: HTML 5 and CSS 3. One point that proved interesting was his coverage of the sectioning feature of HTML 5.

Whereas HTML 4 had just six levels of headings for the entire document, the working draft for HTML 5 stipulates that each section has its own heading hierarchy. An h1 element that appears at the top level in a document is considered to “rank higher than” an h1 element found in a section or article within the document.

For example, rather than using <h1>, <h2> and <h3> elements for the headings in the sample shown below, you can use three nested <section> tags, each with its own <h1>.

[Figure: HTML 5 section and heading example (diagram showing HTML 5 markup for sections and headings)]
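
The diagram is an image, so as a rough stand-in here is a small JavaScript sketch that renders one outline in both styles. The outline titles and the function names `renderNumbered` and `renderSectioned` are made up for illustration; they do not come from the diagram or from the draft spec.

```javascript
// Hypothetical outline: the titles are placeholders, not the diagram's content.
const outline = {
  title: 'Animals',
  children: [
    { title: 'Mammals', children: [{ title: 'Cats', children: [] }] },
  ],
};

// HTML 4 style: the heading number tracks the depth (h1, h2, h3, ...).
function renderNumbered(node, depth = 1) {
  const own = `<h${depth}>${node.title}</h${depth}>`;
  const rest = node.children.map((child) => renderNumbered(child, depth + 1));
  return [own, ...rest].join('\n');
}

// HTML 5 draft style: every heading is an h1, and the nesting of
// <section> elements carries the depth instead.
function renderSectioned(node, depth = 0) {
  const indent = '  '.repeat(depth);
  const inner = node.children.map((child) => renderSectioned(child, depth + 1));
  return [
    `${indent}<section>`,
    `${indent}  <h1>${node.title}</h1>`,
    ...inner,
    `${indent}</section>`,
  ].join('\n');
}

console.log(renderNumbered(outline));
console.log(renderSectioned(outline));
```

Both functions walk the same tree; the only difference is whether the depth ends up in the tag name or in the nesting.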

This might not seem much simpler in this basic example. In fact, to me it seems decidedly less simple. In the case where each section has its own distinct hierarchy of headings, the situation becomes even more confusing. However, I think the change makes a bit more sense if you consider it in light of two things.

First, the spec recommends keeping the heading hierarchy sane by either using <h1> tags throughout the document or keeping the numbered headings in sync with the levels of sectioning. The latter is similar to current practice, which simply lacks the <section> tags.

Sections may contain headers of any rank, but authors are strongly encouraged to either use only h1 elements, or to use elements of the appropriate rank for the section’s nesting level.

Second, one of the main reasons why sections and other elements are allowed to contain their own heading hierarchy is to handle parts of a document included from elsewhere. There are many examples of this on the web: blogs, where a few articles appear, each with its own heading structure; news sites, made up of sections, each of which comes from its own page with its own headings; and search engines, which display excerpts from other sites.

The point of the improvement is so that these sites that include other content don’t have to do any special processing to embed an external article or section with its own heading structure. The levels are automatically adjusted by the browser to account for the fact that these headings are relevant only within one subsection of the page.
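
To make the adjustment concrete, here is a minimal sketch of how a heading's effective rank could be derived from its section depth when only h1 elements are used. It is a deliberate simplification and not the draft's actual outline algorithm; the page model (plain objects with a `type` field) and the name `effectiveRanks` are assumptions for illustration.

```javascript
// Simplified page model (an assumption): a node is either
// { type: 'heading', text } or { type: 'section', children: [...] }.
// 'section' stands in for any sectioning element, such as <article>.
const page = [
  { type: 'heading', text: 'My weblog' },
  {
    type: 'section', // an embedded article with its own heading structure
    children: [
      { type: 'heading', text: 'Embedded article' },
      {
        type: 'section',
        children: [{ type: 'heading', text: 'Article subsection' }],
      },
    ],
  },
];

// Every heading is treated as an h1, so its effective rank is simply
// one more than the number of sections that enclose it.
function effectiveRanks(nodes, depth = 1, out = []) {
  for (const node of nodes) {
    if (node.type === 'heading') {
      out.push({ text: node.text, rank: depth });
    } else if (node.type === 'section') {
      effectiveRanks(node.children, depth + 1, out);
    }
  }
  return out;
}

console.log(effectiveRanks(page));
```

The embedded article never needs to know where it ends up: its headings are ranked relative to wherever the including page places the section.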

So given these considerations, is it still worth the extra complexity of allowing six heading levels in every section within a document? I’m not sure. It does add a lot of complexity. In just a few minutes at the WSG meeting, we came up with a number of significant problems:

  • Search engine optimisation, or SEO, relies on extracting the heading information from the page. Rather than simply matching <h1>...</h1>, search engines now need to follow a fairly complicated process to determine the ranking of headings within the document.

  • Styling headings with CSS, particularly providing default styles, becomes much more verbose. Rather than using h1, h2, h3 { ... }, with HTML 5 you would need to define h1, section h1, section section h1 { ... }. This would probably be in addition to the old rules, if you’re including content with nested headings.

  • Automatically determining a table of contents is a lot more complex. As noted above, you need to follow a fairly tricky algorithm to determine the heading structure of a document.

  • With current DOM APIs, you can easily find all headings at a particular level in the document with document.getElementsByTagName('h2'). You’d need to use a query selector to do this with the new-style use-a-h1-for-everything structure. Without an efficient query selector, this is much trickier. Even with a query selector, it’s probably going to be a fair bit slower, which is a problem if you’re doing it often.
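
The styling and lookup problems can be sketched in a few lines of JavaScript. This assumes the h1-only style and the same simplified plain-object page model as a stand-in for the DOM; `selectorForRank`, `selectorList` and `headingsAtRank` are illustrative names, not existing APIs.

```javascript
// Build the selector for a heading at a given nesting rank:
// rank 1 -> "h1", rank 2 -> "section h1", and so on.
function selectorForRank(rank) {
  return 'section '.repeat(rank - 1) + 'h1';
}

// Selector list covering ranks 1..maxRank, e.g. for shared default styles.
function selectorList(maxRank) {
  const selectors = [];
  for (let rank = 1; rank <= maxRank; rank++) {
    selectors.push(selectorForRank(rank));
  }
  return selectors.join(', ');
}

// Tree-walk stand-in for getElementsByTagName('h2'): collect the headings
// whose section depth gives them exactly the requested rank.
function headingsAtRank(nodes, rank, depth = 1, out = []) {
  for (const node of nodes) {
    if (node.type === 'heading' && depth === rank) {
      out.push(node.text);
    } else if (node.type === 'section') {
      headingsAtRank(node.children, rank, depth + 1, out);
    }
  }
  return out;
}

// Hypothetical page: one top-level heading and two sections.
const page = [
  { type: 'heading', text: 'Site title' },
  { type: 'section', children: [{ type: 'heading', text: 'First post' }] },
  {
    type: 'section',
    children: [
      { type: 'heading', text: 'Second post' },
      { type: 'section', children: [{ type: 'heading', text: 'Aside' }] },
    ],
  },
];

console.log(selectorList(3)); // h1, section h1, section section h1
console.log(headingsAtRank(page, 2));
```

One thing the sketch makes visible: a plain descendant selector such as `section h1` also matches headings nested more deeply. That is fine for cascading styles, but it is no good for finding headings at exactly one level, hence the explicit walk.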

Given these issues, I don’t consider it beneficial to make headings relative to the section that contains them. What authors gain by not having to adapt included content on the server side so it uses appropriate heading levels, they end up potentially losing due to the increased complexity in determining the outline of a document and styling headings consistently in different sections.

Perhaps if there is some benefit beyond the included-content case I’ve mentioned above, a compromise solution might be to have better HTML APIs which allow access to sections and headings in a more meaningful way than existing DOM methods like getElementsByTagName. I could imagine methods like HTMLElement.getSections() and HTMLElement.getHeadings() proving useful in addressing some of the concerns above.
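
As a rough idea of what such methods might offer, here is a sketch written as standalone functions over plain objects. To be clear, `getSections` and `getHeadings` are purely hypothetical: they are not real DOM APIs, and the node shape (`type`, `text`, `children`) is an assumption made for the example.

```javascript
// Hypothetical helpers named after the methods imagined above.
// They are NOT real DOM APIs; nodes here are plain objects.

// Direct child sections of a node.
function getSections(node) {
  return node.children.filter((child) => child.type === 'section');
}

// Text of the headings that belong directly to a node.
function getHeadings(node) {
  return node.children
    .filter((child) => child.type === 'heading')
    .map((child) => child.text);
}

// Build a nested table of contents using only the two helpers.
function tableOfContents(node) {
  return {
    headings: getHeadings(node),
    sections: getSections(node).map(tableOfContents),
  };
}

// Hypothetical document root with one nested section.
const root = {
  type: 'section',
  children: [
    { type: 'heading', text: 'Guide' },
    {
      type: 'section',
      children: [{ type: 'heading', text: 'Getting started' }],
    },
  ],
};

console.log(JSON.stringify(tableOfContents(root), null, 2));
```

With helpers along these lines, building a table of contents becomes a short recursion rather than a reimplementation of the outline rules in every script that needs one.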

 
Posted by Joe Clark at 2008-10-08 23:19:00
A more rational approach would be the one used in tagged PDF, where H1 through H6 exist but so does H, which can only be used inside a certain set of block-level elements. H is effectively auto-numbered when used.
 
Posted by Matt Ryall at 2008-10-09 06:35:29
Joe, I agree. Including a designated heading element whose rank is purely determined by its level of nesting in sections makes a lot of sense. According to the current draft, you have to scatter H1s through your document to achieve this, which seems very different to the situation in HTML 4.

If we did introduce the H element, the spec could keep the recommendation that where headings with numbers are used they should reflect the level of nesting, and drop the bit about using H1s throughout the document.
 
Posted by SSSSS at 2008-10-10 09:48:37
I can’t see what I’m typing, it’s black on black. What is the background colour on this element? Operating system default…. Foreground colour? Black…
 
Posted by Histrionic at 2008-10-10 21:05:42
Although I’m entirely behind the idea of making it easier to include chunks of material together — that’s almost a panacea for technical writing on the Web — multiple heading levels in multiple sections just makes my head hurt. I have a hard enough time explaining includes to people who need to help with documenting things on the Web; I think this would just make most people’s heads explode.

I really like Joe’s suggestion of an H element. It would be similar to how LI is handled in traditional HTML — effectively autonumbered when you place it within an OL block.

I think I can envision situations where you might want to specify an Hx element for a page, but if you’re including that same content in another page, have it automatically changed to an H element. Hm, going to have to think about this some more …
 
Posted by Matt Ryall at 2008-10-11 09:27:32
@SSSSS, thanks for pointing that out. It should be fixed now.
 
