HTML and CSS, two of our favorite acronyms, are normally associated with web pages. And deservedly so: HTML is the dominant document format on the web and CSS is used to style most HTML pages. But are they suitable for off-screen use? Can CSS be used for serious print jobs? To find out, we decided to take the ultimate challenge: to produce the next edition of our book directly from HTML and CSS files. In this article we sketch our solution and quote from the style sheet used. Towards the end we describe the book microformat (boom!) we developed in the process.
The studious reader may want to fetch a sample HTML file, a sample style sheet, and the PDF file generated by Prince. The PDF file is similar to the one we sent to the printer. We encourage you to base your own book on the sample file and tell us how it goes.
Print vs. pixel
A printed book has many features not seen on screens. There are page numbers, headers and footers, a table of contents, and an index. The content must be split into pages of fixed size, and cross-references within the book (for example, “see definition on page 35”) must be resolved. Finally, the content must be converted to PDF, which is sent to the printer.
Web browsers are good at dealing with pixels on a screen, but not very good at printing. To print a full book we turned to Prince, a dedicated batch processor that converts XML to PDF by way of CSS. Prince supports the print-specific features of CSS2, as well as functionality proposed for CSS3.
CSS2
CSS2 has a notion of paged media (think sheets of paper), as opposed to continuous media (think scrollbars). Style sheets can set the size of pages and their margins. Page templates can be given names and elements can state which named page they want to be printed on. Also, elements in the source document can force page breaks. Here is a snippet from the style sheet we used:
@page { size: 7in 9.25in; margin: 27mm 16mm 27mm 16mm; }
Having a US-based publisher, we were given the page size in inches. We, being Europeans, continued with metric measurements. CSS accepts both.
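Because CSS accepts both unit systems, the same page size could equally be written in millimetres. A minimal sketch of the conversion, using 1in = 25.4mm (this rule is an illustration, not a quote from the book's style sheet):

```css
/* 7in    x 25.4 = 177.8mm
   9.25in x 25.4 = 234.95mm
   Same page box as the rule above, written in metric units only. */
@page {
  size: 177.8mm 234.95mm;
  margin: 27mm 16mm 27mm 16mm;
}
```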
After setting up the page size and margins, we needed to make sure that page breaks fall in the right places. The following excerpt shows how page breaks are generated after chapters and appendices:
div.chapter, div.appendix { page-break-after: always; }
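Forcing breaks is only half of the job; a style sheet can also discourage breaks where they would hurt. The sketch below uses the other CSS2 page-break properties. These rules are illustrative assumptions, not quoted from the book's style sheet; the div.figure class is the one described later in this article.

```css
/* Keep a heading on the same page as the text that follows it. */
h1, h2, h3 { page-break-after: avoid; }

/* Try not to split a figure and its caption across two pages. */
div.figure { page-break-inside: avoid; }

/* Assumption: chapters should open on a right-hand page. */
div.chapter { page-break-before: right; }
```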
Also, we used CSS2 to declare named pages:
div.titlepage { page: blank; }
That is, the title page is to be printed on pages with the name “blank.” CSS2 described the concept of named pages, but their value only becomes apparent when headers and footers are available. For this we have to turn to CSS3.
CSS3
The CSS Working Group has published a CSS3 Module for Paged Media. It describes additional functionality required for printing. We will start by looking at running headers and footers.
HEADERS AND FOOTERS
Here is an example:
@page :left { @top-left { content: "Cascading Style Sheets"; } }
The example above puts a string (“Cascading Style Sheets”) in the top left corner of all left-hand side pages of the book. All pages? Not quite. A subsequent rule removes the header from pages named “blank”:
@page blank :left { @top-left { content: normal; } }
Recall from earlier that all <div class="titlepage"> elements are to be printed on “blank” pages. Given the style sheet above, “blank” left-hand side pages will be printed without a header.
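Putting the pieces together, a minimal sketch of the header rules might look like this. The first three rules are the ones quoted above; the last rule, which also clears a right-hand header on “blank” pages, is an assumption added for symmetry.

```css
/* Title pages use the named page "blank". */
div.titlepage { page: blank; }

/* Book title in the top left corner of left-hand pages. */
@page :left {
  @top-left { content: "Cascading Style Sheets"; }
}

/* No header on left-hand "blank" pages. */
@page blank :left {
  @top-left { content: normal; }
}

/* Assumption: right-hand "blank" pages should be bare as well. */
@page blank :right {
  @top-right { content: normal; }
}
```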
STEALING STRINGS
Our book consists of many chapters and the title of each chapter is displayed in a header on right-hand side pages. To achieve this, the title string must be copied from an element with the string-set property:
h1 { string-set: header content(); }
Just like there were named pages in the previous section, CSS3 also has named strings. In the example above, the string named “header” is assigned the chapter headings. Each time a chapter heading is encountered, the chapter title is copied into this string. The string can be referred to in other parts of the style sheet:
@page :right { @top-right { content: string(header, first); } }
In the example above, the right-hand side header is set to be the value of the “header” string. The keyword “first” indicates that we want the first value of “header” in case there are several assignments on that page.
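The two rules work as a pair: one captures the string, the other displays it. A sketch that also styles the header text follows; the font declarations are illustrative assumptions, not values from the book's style sheet.

```css
/* Capture: every h1 copies its text into the named string "header". */
h1 { string-set: header content(); }

/* Display: right-hand pages show the first value assigned on that page. */
@page :right {
  @top-right {
    content: string(header, first);
    font-style: italic;  /* assumed styling */
    font-size: 9pt;      /* assumed styling */
  }
}
```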
PAGE NUMBERS
Like headers, page numbers are a navigational aid in books. Setting the page numbers is easy:
@page :left { @bottom-left { content: counter(page); } }
One requirement from our publisher was to use roman numerals in the first part of the book. This part is referred to as “front-matter”. Here is the style sheet for roman page numbers in the front-matter:
@page front-matter :left { @bottom-left { content: counter(page, lower-roman); } }
The numbering systems are the same as for the list-style-type property, and lower-roman is one of them. The counter called “page” is predefined in CSS.
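The examples above only cover left-hand pages, so right-hand pages need rules of their own. A sketch of the counterparts, assuming the numbers should sit in the outer bottom corner of each page (the placement is an assumption, not quoted from the book's style sheet):

```css
/* Arabic page numbers on right-hand pages. */
@page :right {
  @bottom-right { content: counter(page); }
}

/* Roman page numbers on right-hand pages of the front matter. */
@page front-matter :right {
  @bottom-right { content: counter(page, lower-roman); }
}
```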
CROSS-REFERENCES
The web is a huge collection of cross-references: all hyperlinks are cross-references. Cross-references in books are similar in nature, but presented differently. Instead of the blue underlined text we know from our screens, books contain text such as “see the figure on page 35.” The number “35” is unknown to the authors of the book: one can only find the page number by formatting the content. Therefore, the number “35” cannot be typed into the manuscript but must be inserted by the formatter. To do so, the formatter needs a pointer to the figure. In HTML, this is done with an A element:
<a class="pageref" href="#figure">see the figure</a>
The corresponding style sheet looks like this:
a.pageref::after { content: " on page " target-counter(attr(href), page) }
The example above needs some explanation. The selector refers to a generated pseudo-element (::after) which comes after the content of the A element. The first part of that pseudo-element is the string “ on page ”. After that comes the most interesting part, the target-counter function, which fetches the value of the “page” counter at the location pointed to by the “href” attribute. The result is that the string “ on page ” is concatenated with the number “35”.
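For the lookup to work, the link target must exist in the document. The sketch below restates the rule together with the markup it assumes; the id on the figure and the link-styling rule are assumptions added for illustration.

```css
/* Markup assumed (the id is what href="#figure" points to):
     <div class="figure" id="figure">...</div>
     <a class="pageref" href="#figure">see the figure</a>            */

/* Assumption: on paper, links should look like ordinary text. */
a.pageref {
  color: inherit;
  text-decoration: none;
}

/* Append " on page N", where N is the page the target lands on. */
a.pageref::after {
  content: " on page " target-counter(attr(href), page);
}
```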
TABLE OF CONTENTS
Similar magic is invoked to generate a table of contents (TOC). Given a bunch of hyperlinks pointing to chapters, sections and other TOC entries, the style sheet describes how to present the hyperlinks as a TOC. Here is a sample TOC entry:
<ul class="toc"> <li><a href="#intro">Introduction</a> <li><a href="#html"><abbr title="HyperText Markup Language"> HTML</abbr></a> </ul>
The style sheet for the TOC uses the same target-counter function to fetch a page number:
ul.toc a::after { content: leader('.') target-counter(attr(href), page); }
Also, a new function, leader, is used to generate “leaders.” In typography, a “leader” is a line that guides the eye from the textual entry to the page number. In our example, a set of dots is added between the text and the page number:
Introduction....................1
HTML............................3
Note that this functionality is experimental; no Working Draft for leaders has been published yet.
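A fuller TOC style sheet also has to deal with the list itself. The following is a sketch under stated assumptions: the nested-list rule and the “frontmatter” class are hypothetical, and the optional third argument of target-counter (a counter style) is assumed to be supported by the formatter.

```css
/* Remove bullets and default indentation from the TOC list. */
ul.toc {
  list-style: none;
  margin: 0;
  padding: 0;
}

/* Assumption: sub-entries are nested lists, indented one step. */
ul.toc ul {
  list-style: none;
  margin-left: 1.5em;
  padding: 0;
}

/* Dotted leader followed by the page number of the link target. */
ul.toc a::after {
  content: leader('.') target-counter(attr(href), page);
}

/* Hypothetical class: entries that point into the front matter
   show their page number in roman numerals. */
ul.toc li.frontmatter a::after {
  content: leader('.') target-counter(attr(href), page, lower-roman);
}
```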
The book microformat—boom!
As you probably have guessed by now, we succeeded in producing our book using HTML and CSS. In doing so, we also developed a set of conventions for marking up a book in HTML. HTML has the wonderful class attribute which lets anyone extend the semantics of HTML documents while building on HTML’s universally known semantics. So, in our book, we used a rich set of HTML elements and added a bunch of class names.
Since then, the concept of “microformats” has entered the web and we are happy to discover that we actually developed (at least the beginnings of) a microformat for books. We think other authors will be able to use the boom! microformat and improve upon it in the process.
SECTIONS OF A BOOK
The chapters in the first part of the book, such as preface, foreword, and table of contents, are enclosed in a DIV with a corresponding class name. The chapters in the main body are DIVs with a class of “chapter” and the appendices are DIVs with class “appendix.” In the style sheet, the class names are primarily used to select the correct named page with the correct headers and footers.
Although HTML has six levels of headings (H1, H2, etc.) to distinguish chapter headings, section headings, and subsection headings, it is convenient to enclose sections in an element, if only to be able to style the end of a section. We used a DIV with class “section”.
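A sketch of how such class names can drive the style sheet follows. The class names “preface” and “foreword” and the spacing value are assumptions for illustration; the sample style sheet may differ.

```css
/* Assumption: front-matter DIVs carry class names such as these
   and are printed on the named page "front-matter". */
div.preface, div.foreword { page: front-matter; }

/* Chapters and appendices each end with a page break. */
div.chapter, div.appendix { page-break-after: always; }

/* Styling the end of a section: here, simply some extra space. */
div.section { margin-bottom: 1.5em; }
```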
TABLES AND FIGURES
HTML doesn’t have a dedicated element for figures with captions, but it is easy to create one by specializing a DIV:
<div class="figure"> <p class="caption">...</p> <p class="art"><img src="..." alt="..."/></p> </div>
The TABLE element has a CAPTION element, but support is spotty. We therefore used a similar strategy for marking up tables:
<div class="table"> <p class="caption">...</p> <table class="lined"> ... </table> </div>
We used a variety of figure styles (normal, wide, on the side, etc.) and table styles (normal, wide, lined, top-floating, etc.) in our book. An element can be given several class names, so that, say, a table can be both “lined” and “wide.” We have cut down on the number of alternatives in the sample document for the sake of simplicity.
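Because class names combine, each style can be written as an independent rule. A minimal sketch, assuming markup such as <table class="lined wide">; the declarations are illustrative guesses at what “lined” and “wide” might mean, not the book's actual rules.

```css
/* Captions for both figures and tables. */
p.caption {
  font-style: italic;
  font-size: 0.9em;
}

/* Center the artwork inside a figure. */
p.art { text-align: center; }

/* "lined": a thin rule under every row. */
table.lined { border-collapse: collapse; }
table.lined th,
table.lined td {
  border-bottom: 0.5pt solid black;
  padding: 0.2em 0.5em;
}

/* "wide": use the full width of the text column. */
table.wide { width: 100%; }
```

A table with both class names matches both sets of rules, so no separate “lined and wide” style is needed.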
SIDE NOTES AND SIDE BARS
A DIV with class “sidenote” is used for side remarks, related to the (following) text in the main body but not necessarily shown in-line. A typical way to show them is to put them in the margin.
A “sidebar” is longer than a “sidenote.” The latter is typically only one paragraph, maybe two; the former is several paragraphs or includes lists or other material. In the sample document there is one sidebar that floats to the top, uses the full width of the page, and is given a gray background.
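One way to express these two styles is sketched below. All measurements are assumptions, and float: top is not part of CSS2; it is assumed here to be available as an extension in the formatter.

```css
/* Assumption: the main text is indented, leaving a 35mm
   column free on the left for side notes. */
div.chapter { margin-left: 35mm; }

/* Side note: a 30mm wide float pulled into the free column. */
div.sidenote {
  float: left;
  clear: left;
  width: 30mm;
  margin-left: -35mm;
  font-size: 0.85em;
}

/* Sidebar: floats to the top of the page on a gray background.
   The negative margin lets it span the side column too. */
div.sidebar {
  float: top;            /* formatter extension, not CSS2 */
  margin-left: -35mm;
  background: #ddd;
  padding: 0.5em 1em;
}
```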
Summing up
The Prince formatter has opened up the processing pipeline from HTML and CSS to PDF. It is now possible, even feasible, to use HTML as the document format for books. This makes it easier to cross-publish content on the web and in print.
That said, authors who attempt to use the techniques described in this article will face some technical issues along the way. For example, we have not discussed how to generate the TOC structures and how to display wide tables. We have also left some room for improvement in the boom! microformat. However, compared to the headaches of actually writing a book, formatting is now a joy!