oXygen XML Editor

Docbook Markup

1. Accessible docbook
2. Cross Referencing using xref
3. How do I make a cross reference to another part of my document?
4. How do I link to someone else's website?
5. Can I do conditional text in DocBook?
6. How do I do conditional titles when only one title is allowed in DocBook elements?
7. Is there any way to generate change bars from a DocBook document?
8. Changing language in a document, how to markup
9. xref to figure, or numbering figures.
10. How to include C source code in docbook
11. How to mark up emacs keyboard codes
12. Mathml in docbook
13. remap DocBook elements to other element
14. Markup for daemon name
15. Add a title inside a list item
16. Setting column widths
17. How to mark up a translator of a document
18. How to cross reference to tables and images
19. Markup for synonyms in glossary
20. Markup for notes
21. Marking up protocols
22. user name and groupname markup
23. Whitespace problem in indexterm
24. Default encoding
25. HTML to docbook
26. MathML and docbook
27. Inserting external code into docbook
28. Lists and white space
29. Executive Summary
30. Docbook 4.2 Image semantics
31. Image markup
32. biblioentry and biblioentry and bibliomixed
33. Table problems in fo output
34. website problems, setting directories
35. Resume
36. Content re-use in docbook
37. Text direction and language
38. Produce a back cover
39. Table markup
40. Link to biblioentry
41. Reference to biblioentry
42. How to set the language?
43. Including parts of a book using entities
44. Index markup
45. Book Title, why are there two?
46. How to use callouts
47. beginpage element
48. Nesting Sections in Simplified docbook
49. Image size problems
50. Including parts of my docbook
51. alt text or d link on images?
52. Indexing
53. Why simpara tag?
54. Are there any good uses for pi's
55. Image file extension selection
56. How to not number figures
57. abbrev tag needed in output
58. table titles
59. Create an index
60. What is simpara for?
61. Reference to a glossary entry
62. Reference a bibliography
63. Equations
64. Repeated reference to screenshot
65. How to set the language
66. Entity problems
67. Entity sets
68. Shared entity definitions
69. Entity reference to include external code listings
70. Adding another mediatype
71. Confusing entities
72. markup for generated content
73. How to mark up a persons middle initial?
74. Multiple index terms
75. Modular docbook books?
76. Including XML in programlisting tags
77. Index markup


Accessible docbook

Bob Stayton

> What I was really
> looking for is for the HTML (rather than the XHTML), what
> things to meet accessibility requirements do Norm's
> stylesheets already provide? 

OK, the short answer: The DocBook XSL stylesheets do a pretty good job of making it possible to create accessible HTML. The main features are:

- Use of class attributes in <div> and <span> elements to permit control of formatting with CSS stylesheets.
- A parameter to turn off inline 'style' attributes (css.decoration).
- Parameters to convert @role attributes on emphasis, para, and phrase into class attributes that can be styled with CSS.
- Parameters to turn off the use of tables for page layout, used for some formatting features (variablelist.as.table, segmentedlist.as.table, callout.list.table).
- Support for alt attributes on images: add <textobject><phrase>blah</phrase></textobject> to each mediaobject.
- Parameter to turn on long descriptions for images.
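As an illustration of the alt text support, a mediaobject might be marked up like this. It is a minimal sketch: the file name and text are invented, and the exact HTML you get (the alt attribute, plus a long description link when that feature is turned on) depends on your stylesheet version and parameter settings.

```xml
<mediaobject>
  <imageobject>
    <!-- hypothetical image file -->
    <imagedata fileref="network-diagram.png" format="PNG"/>
  </imageobject>
  <!-- short text: a phrase in a textobject supplies the alt text in HTML -->
  <textobject>
    <phrase>Diagram of the office network</phrase>
  </textobject>
  <!-- longer text: candidate for the long description, if that feature is enabled -->
  <textobject>
    <para>The diagram shows three subnets joined by a single router.</para>
  </textobject>
</mediaobject>
```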

There are still some hard coded HTML formats that are not subject to control with CSS, but those are falling in number over time.


Cross Referencing using xref

Bob Stayton

> According to Norm Walsh's TDG, a reasonable rendering for an unadorned xref
> should include a referenced chapter's title.  In other words, if my second
> chapter is tagged:

>      <chapter id="stuff">
>           <title>Stuff and Nonsense</title>

> then an xref like so:

>      <xref linkend="stuff"/>

> should be rendered:

>      Chapter 2, "Stuff and Nonsense"

> I have not found this to be the case with either the 1.59 DSSSL or the 1.41
> XSL. All I get is

>      Chapter 2

> I found Jirka's archived explanation of how to add the title in with the
> DSSSLs (which worked, incidentally). Does anyone know how to accomplish the
> same via the XSLs?

The XSL stylesheets use another kind of template system to form xref text and other generated text strings (not to be confused with XSL apply-templates). It uses templates of text strings, sort of like the strings used in printf statements, where some of the text is fixed and some is variable to be substituted at runtime.

The string templates are localized, and may be highly customized. In your 1.41 distribution, you should find common/en.xml. That contains all the generated text strings for processing with the default lang="en".

Search in en.xml for <context name="xref">. That element contains <template> elements for each kind of element that can have automatically generated xref text. The default one for chapter says:

<template name="chapter" text="Chapter %n"/>

The %n represents the item number (chapter number in this case). You can use %t to represent the title. So you could change it to:

<template name="chapter" text='Most Worthy Chapter %n, "%t"'/>

Note that I changed the enclosing attribute quote characters to single quotes so that I could use double quotes around the title.

BTW, don't be put off by the message at the top of the file "Do not edit this file by hand!". That only applies if you are building the template files from the CVS source. Since the stylesheet distro doesn't come with the source or Makefile, you can ignore that comment and customize it to suit your needs.

However, you can also do such customization without touching the original distro files. It takes a few more steps:

1. In 'common' directory, copy en.xml to a new name like custom-en.xml and make your changes there.

2. Also in common, copy l10n.xml to a new name like custom-l10n.xml. Edit custom-l10n.xml to change

<!ENTITY en SYSTEM "en.xml">

to

<!ENTITY en SYSTEM "custom-en.xml">

so that it references your customized file.

3. In your customization layer for your stylesheet, add another parameter:

<xsl:param name="l10n.xml"

This parameter sets the parameter named "l10n.xml" (yes, it sure looks like a filename) to your custom filename, which pulls in your custom english template file. This parameter should probably be listed in param.xsl, but it isn't.
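Put together, a customization layer for step 3 might look like the sketch below. The import path and the location of custom-l10n.xml are assumptions about how your files are laid out, so adjust both. Because a relative URI in document() is resolved against the stylesheet file that contains the call, the path is relative to your customization file, not to the DocBook distribution.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Import the standard HTML stylesheet (path is an assumption) -->
  <xsl:import href="docbook-xsl/html/docbook.xsl"/>

  <!-- Point l10n.xml at the customized localization file from step 2.
       This declaration wins over the imported one because the
       importing stylesheet has higher import precedence. -->
  <xsl:param name="l10n.xml"
             select="document('docbook-xsl/common/custom-l10n.xml')"/>

</xsl:stylesheet>
```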

Of course, if you were hoping for just a simple parameter that would turn on chapter titles, well, it isn't there. The template system gives you a great deal of control, but it is a bit more complex.

I'm updating the doc on XSL customization to include stuff like this.


How do I make a cross reference to another part of my document?

You make a cross reference by identifying where you want to go, and then making a link to it. To identify where you want to go, you add an id attribute to the element you are targeting. An id attribute is a simple text string without spaces that uniquely identifies that element in your document. Each id attribute has to be unique within your document, or your document won't validate. For example, if you want to cross reference to a table of sales figures, you add the id attribute to the table start tag like this:

<table id="MyJulySales">

To form a link to that table, you have two DocBook elements to choose from: xref and link. In both cases you add a linkend attribute whose value is the id of the element you are pointing to.

The xref element generates the cross reference text for you, so you enter it as an empty element like this: <xref linkend="MyJulySales"/>. When this element is formatted, the stylesheet looks up the target element and forms the cross reference text, such as "Table 3: July Sales". You might also want to see the FAQ answer on altering the generated cross reference text.

The link element doesn't generate any text, so you have to supply the cross reference text as the content of the element. For example: <link linkend="MyJulySales">July sales figures</link>.

Why not just use xref every time since it generates the text for you? Using xref minimizes maintenance, since the generated text gets updated every time the document is processed. But there are times when it is not appropriate. For example, sometimes you don't want the formal title as the cross reference text. Also, there are many valid target elements that don't have readily generated titles, such as a paragraph or a blockquote.



How do I link to someone else's website?

If you need to link to someone else's web site, then use ulink instead of xref or link. The "U" stands for URL, which is the web address you put in its url attribute. For example:

For more information visit <ulink url="http://www.acme.com">the Acme website</ulink>.

If you omit the text inside the element, then the url string is also used as the hot spot text. You can also pop the link up into a new window if you are processing to HTML. To get a new window, set the stylesheet parameter ulink.target to the name of the new window. Note that this parameter applies to every ulink in the document, not to individual links.
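Because ulink.target is a stylesheet parameter, you set it once in a customization layer, for example as in the sketch below. The import path is an assumption, and the window name "external" is arbitrary; every ulink then opens in that same named window.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Import the standard HTML stylesheet (path is an assumption) -->
  <xsl:import href="docbook-xsl/html/docbook.xsl"/>

  <!-- Used as the target attribute of the HTML anchor generated for each ulink -->
  <xsl:param name="ulink.target" select="'external'"/>

</xsl:stylesheet>
```

With the standard HTML window name _blank instead, each link opens in a fresh window.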


Can I do conditional text in DocBook?

Conditional text means you have some text in your document that you only want to output when certain conditions are met. For example, maybe you need to write documentation that covers an application running on two different operating systems. Rather than write two separate documents that are mostly the same, you can write one document and mark as conditional the text elements that are specific to each operating system. This is typically done by assigning different values to attributes such as role, os, userlevel, or arch. So you might write two similar para elements, one with attribute os="linux" and the other with os="windows". Such a document is said to contain profiles, each profile targeted for a different audience.
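For instance, a section covering both systems might look like this. The os values "linux" and "windows" are just the ones used in this answer, and the content is invented:

```xml
<section>
  <title>Linking to a file</title>
  <!-- no conditional attribute: kept in every profile -->
  <para>You can make a file reachable from a second folder.</para>
  <!-- kept only when the Linux profile is selected -->
  <para os="linux">Use <command>ln -s</command> to create a symbolic link.</para>
  <!-- kept only when the Windows profile is selected -->
  <para os="windows">Right-click the file and choose Create Shortcut.</para>
</section>
```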

Once you have your document marked up this way, you can generate different versions by using the profiling stylesheet profile.xsl written by Jirka Kosek. That stylesheet accepts parameters set from the command line that establish the conditions to select text. It generates a complete version of your document that meets the conditions. All the marked text that doesn't meet the conditions is stripped out from the generated version. Then you process the profiled version using the standard DocBook stylesheets.

The profiling stylesheet is included in the DocBook XSL distribution in tools/profile.xsl. Its doc is also included in the distribution in doc/tools/profiling.html, or on the web at http://docbook.sourceforge.net/projects/xsl/doc/tools/profiling.html.


How do I do conditional titles when only one title is allowed in DocBook elements?

If you are doing conditional text, you may run into the need to have conditional titles as well. But DocBook elements such as chapter and section only permit one title per element. How can you have two different titles on the same element for different profiles?

The solution is to keep one title element and put two phrase elements in it. Put the conditional attributes on the different phrase elements. For example:

<title>
<phrase os="linux">Creating symbolic links</phrase>
<phrase os="windows">Creating shortcuts</phrase>
</title>

The phrase element is also useful for marking conditional text that is smaller than an element, such as a single sentence in a paragraph.


Is there any way to generate change bars from a DocBook document?

Norm Walsh

Yes, it can be done either by hand or semi-automatically.

You can use the revisionflag attribute to track changes. If you have two versions of the document, you can use diffmk to automatically add the revisionflags.
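Marked by hand, the flags might look like this. The content is invented; revisionflag takes the values changed, added, deleted and off.

```xml
<section revisionflag="changed">
  <title>Installation</title>
  <!-- new paragraph in this revision -->
  <para revisionflag="added">The installer now also runs on Linux.</para>
  <!-- paragraph removed in this revision, kept so it can be flagged -->
  <para revisionflag="deleted">Only Windows is supported.</para>
  <!-- a change smaller than a paragraph -->
  <para>Run the installer, then
    <phrase revisionflag="changed">restart the computer</phrase>.</para>
</section>
```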

Then process the document with changebars.xsl and you'll get something like change bars; for an example, see http://www.w3.org/TR/2000/REC-xml-20001006-review.html.

P.S. I have a java version of diffmk that is in some ways better than the perl version. I'm working on getting it released.


Changing language in a document, how to markup

Norm Walsh

>I'm writing a document in Spanish, but I need to include some English
> words to explain some things (like RISC or byte). Which tag can I use to
> quote this, so it appears in italic, preferably?

<foreignphrase>RISC</foreignphrase> seems like a good choice.
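If you also want to record which language the phrase is in, the lang attribute, which DocBook elements generally accept, can carry it. A short sketch with an invented sentence; the italic rendering comes from the stylesheet, so check that yours formats foreignphrase the way you want:

```xml
<!-- English: "The RISC architecture is simple." -->
<para lang="es">La arquitectura <foreignphrase lang="en">RISC</foreignphrase> es sencilla.</para>
```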


xref to figure, or numbering figures.

Norm Walsh

> I do this in a document:
> <mediaobject id="image_segments">
>   [etc]
> </mediaobject>

> And then, from inside the text, I do: <xref linkend="image_segments"/>,
> but it does not work.

> I would like DocBook to number figures, and allow me to
> make references to them from inside the text, so it appears like: See
> Figure 2.1. (for example)

Then you want:

  <figure id="image_segments">
    <title>Some Title</title>
    <mediaobject>
      [etc]
    </mediaobject>
  </figure>


How to include C source code in docbook

Mark B. Wroth

An approach is the "literate programming" one of having the DocBook source write the C source when processed. This approach interests me enough that I wrote the code to do it -- it's available at http://www.west-point.org/users/usma1978/36200/LitProg/SGMLWEB/index.htm.
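If you only need to show the C code rather than extract it, a simpler alternative is to paste it into a programlisting inside a CDATA section, so that characters such as < and & need no escaping. A minimal sketch with an invented program; note that the listing must not itself contain the sequence that closes a CDATA section:

```xml
<!-- Everything inside the CDATA section is taken as literal text -->
<programlisting><![CDATA[#include <stdio.h>

int main(void)
{
    int a = 1, b = 2;
    if (a < b && b > 0)
        printf("a is smaller\n");
    return 0;
}]]></programlisting>
```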

Rafael R. Sevilla adds

I have another approach that is a little more general:


How to mark up emacs keyboard codes

Norm Walsh

For the C-h C-f example, try this:

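    <keycombo action="seq">
      <keycombo action="simul"><keycap>Ctrl</keycap><keycap>h</keycap></keycombo>
      <keycombo action="simul"><keycap>Ctrl</keycap><keycap>f</keycap></keycombo>
    </keycombo>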


Mathml in docbook

Jirka Kosek

There is a MathML customization for DocBook (see the OASIS site). With it you can use MathML inside equation and inlineequation. However, typing MathML without any supporting tools is very tedious.
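As a rough sketch of what that looks like, here is an inline equation. It assumes a document whose DOCTYPE points at the MathML-enabled DocBook DTD and that uses the mml: prefix for the MathML namespace; check the module's documentation for the exact prefix and DOCTYPE it expects.

```xml
<para>The area of a circle is
  <inlineequation>
    <!-- MathML for pi r squared, in the MathML namespace -->
    <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
      <mml:mi>&#x3C0;</mml:mi>
      <mml:msup>
        <mml:mi>r</mml:mi>
        <mml:mn>2</mml:mn>
      </mml:msup>
    </mml:math>
  </inlineequation>.</para>
```

Even this small expression takes a dozen lines, which is the point about typing MathML by hand.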

Peter Ring adds

The 'free' comment was meant for something else: the WebEQ Math Viewer, the TeXaide equation editor and the IceSoft browser.

But now that we are at it, do you know if there are any other good free (GPL, BSD, whatever) tools for authoring and rendering MathML besides Amaya, PassiveTeX and a few others?

There's a somewhat lacking page at W3C, and a list with links at the same place.

Mozilla will also be able to display (some) MathML; there's a project developing MathML support in Gecko (the rendering engine) that will certainly 'spill over' to other projects (see the Mozilla site). To check if your current incarnation of Mozilla is built with MathML, try texvsmml.xml. You may also try with Amaya, but you'll be disappointed, because the page doesn't validate.

There's an interesting MathML+Mozilla site at pear.math.pitt.edu.


remap DocBook elements to other element

Michael Smith

Nik Clayton mentioned a way to remap DocBook elements to other element names -- that is, to take something like this:

  <helpproject status="draft" remap="article">
    <topic revisionflag="changed" remap="section">

and turn it into this:

  <article status="draft">
    <section revisionflag="changed">

Here's a simple stylesheet that I think will do it.

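  <?xml version='1.0'?>
  <xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- ********************************************************
       Use the "attributename" parameter to specify the attribute
       that holds the names of the elements you want to rename to.
       ******************************************************** -->

    <xsl:param name="attributename">remap</xsl:param>

  <!-- ********************************************************
       Look for any elements with an attribute named $attributename and
       replace the name of the element with the value of that attribute.
       Everything else is copied through unchanged.
       ******************************************************** -->

    <xsl:template match="node()|@*">
      <xsl:choose>
        <xsl:when test="@*[local-name()=$attributename]">
          <xsl:element name="{@*[local-name()=$attributename]}">
            <!-- copy the other attributes and the children, but drop the
                 $attributename attribute itself so it doesn't reach the output -->
            <xsl:apply-templates select="@*[local-name()!=$attributename]|node()"/>
          </xsl:element>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

  </xsl:stylesheet>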


To make the best use of it, you'd also want to take a couple of steps to add support for it in your DTD customization layer:

1. Decide on an attribute to use throughout the customization layer for storing the element names you want to convert/rename to. Off the shelf, the stylesheet defaults to looking for "remap".
2. Add the attribute name to the ATTLIST for each new element in your customization layer, as an NMTOKEN with a #FIXED value.

If you're creating a DocBook customization layer, you'd end up with an ATTLIST that looked something like this:

    <!ATTLIST  helpproject
                       remap    NMTOKEN      #FIXED     "article"
                       appid    CDATA        #IMPLIED
    >

If you do that, you don't have to include "remap" attributes in your document instances. But your XSLT engine will need access to your customization layer DTD; you won't be able to correctly process a document if it doesn't include a DOCTYPE declaration that references your customization-layer DTD where the #FIXED attributes are declared.

This might not seem like a proper use of "remap", but I think Eve Maler and Jeanne El Andaloussi's "Developing SGML DTDs: From Text to Model to Markup" provides some support for using it that way.

They discuss a "remap" attribute in that book (Conversion Markup section (8.4.2) of the Markup Model Design and Implementation chapter) and outline a couple of different things it might be used for, including "transforming SGML documents to conform to a different DTD (or to the same DTD but with different or augmented contents)."

Bob Stayton adds

Clearly the remap attribute is intended to capture the former element name for a transformed element so that something special can be done with it. Your concern about proper use of remap comes from the phrase "previous markup scheme" in the description of the attribute in the Definitive Guide. You are using remap essentially for the "next markup scheme" instead of previous, right?

Well, consider modifying your transformation to make it round trip. Change your stylesheet to add a remap attribute to your generated Docbook elements to indicate which help element they came from. Then you can write the reverse stylesheet that transforms a Docbook document back to the help customization. It reads the remap value of a Docbook element to decide which help element to create, and adds a remap value to indicate which Docbook element it came from. That makes the Docbook element name "previous". Even if you don't actually perform the reverse transformation, you have defined it and therefore can use remap as you have described, in a virtual sense. The documents are isomorphic, remapping to each other.

Anyway, I think it would be useful for your transformation to capture which help element it came from in case some special handling is desired. Who knows, there may be cases where you need to do the reverse transformation. Consider the initial conversion of an existing DocBook document to a help document, for example.
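Capturing the source element could be as small a change as adding an xsl:attribute inside the xsl:element of the remapping stylesheet. A sketch, assuming the attributename parameter from that stylesheet and that the recorded attribute should be named remap:

```xml
<xsl:element name="{@*[local-name()=$attributename]}">
  <!-- record the name of the element we are converting from -->
  <xsl:attribute name="remap">
    <xsl:value-of select="local-name()"/>
  </xsl:attribute>
  <!-- copy the remaining attributes and all children; the attribute
       holding the new element name is left out -->
  <xsl:apply-templates select="@*[local-name()!=$attributename]|node()"/>
</xsl:element>
```

With the default attributename of remap, <topic revisionflag="changed" remap="section"> would come out as a section element carrying remap="topic" and revisionflag="changed", which is the form a reverse stylesheet could read.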


Markup for daemon name

Norm Walsh

| Is there something more specific than <literal> for marking up names of
| daemon names?

<systemitem role="daemon"> or <command role="daemon"> depending on the
context ?


Add a title inside a list item

Norm Walsh

| I want to add a title inside <listitem>, but there is no such child
| element inside listitem. So, how can I proceed?

Use <formalpara> inside the list item.
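A minimal sketch, with invented list content:

```xml
<itemizedlist>
  <listitem>
    <!-- formalpara supplies the title that listitem itself doesn't allow -->
    <formalpara>
      <title>Backups</title>
      <para>Copy the data directory to external storage every night.</para>
    </formalpara>
  </listitem>
</itemizedlist>
```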


Setting column widths

Norm Walsh

| How can I set the first column to 20% of the page width,
| the second column to 30% of the page width, and
| the third column to the rest?

By using the colwidth attribute, for example

<colspec colnum="1" colwidth="2*"/>
<colspec colnum="2" colwidth="3*"/>
<colspec colnum="3" colwidth="5*"/>

See the CALS table spec for a complete description of relative width specs in CALS tables.
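In context, a complete three-column table might look like the sketch below (the table content is invented). The star values are proportional, so 2*, 3* and 5* split the width 20/30/50; strictly speaking that is a share of the table's width, which matches the page width only when the table spans the full text width.

```xml
<informaltable>
  <tgroup cols="3">
    <!-- proportional widths: 2 + 3 + 5 = 10 parts, i.e. 20%, 30% and 50% -->
    <colspec colnum="1" colname="c1" colwidth="2*"/>
    <colspec colnum="2" colname="c2" colwidth="3*"/>
    <colspec colnum="3" colname="c3" colwidth="5*"/>
    <tbody>
      <row>
        <entry>ls</entry>
        <entry>command</entry>
        <entry>Lists the files in a directory.</entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
```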


How to mark up a translator of a document

Mike Smith

> I wanted to know whether there is markup for a <translator> element in
> authorgroup, since there is even the <editor> tag and since many
> documents are translated into several languages.

I think this is the kind of thing <othercredit> ("A person or entity, other than an author or editor, credited in a document") is intended for. You can put a role attribute on <othercredit> to qualify it: <othercredit role="translator">.
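In an authorgroup that might look like this; the names are placeholders:

```xml
<authorgroup>
  <author>
    <firstname>Jane</firstname>
    <surname>Doe</surname>
  </author>
  <!-- role says what kind of credit this is -->
  <othercredit role="translator">
    <firstname>John</firstname>
    <surname>Smith</surname>
  </othercredit>
</authorgroup>
```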


How to cross reference to tables and images

Rory H

> btw. the next problems are already arising, as I also need to figure out 
> references to tables, to images, glossary, ...

You do this with <xref>s. An example is easiest.

First, give the element you want to reference an id, e.g.

<figure id="my_figure">

then anywhere you want to refer to it, use <xref linkend="my_figure" />, which gets converted to "Figure x", where x is the figure number. With other elements, the appropriate noun is inserted in place of "Figure".

Incidentally, you probably want to use element ids that describe what they label, e.g. <figure id="fig-image_description">, <section id="section-section_about_topic_x">. It's not compulsory in the slightest, but I'd argue that it makes the xrefs more legible when you're writing the document.
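The same pattern works for tables. A minimal sketch with invented content; the exact generated text depends on your stylesheet and its generated-text templates:

```xml
<table id="table-july-sales">
  <title>July Sales</title>
  <tgroup cols="2">
    <tbody>
      <row><entry>Widgets</entry><entry>120</entry></row>
    </tbody>
  </tgroup>
</table>

<!-- elsewhere: the stylesheet generates the reference text -->
<para>The totals are in <xref linkend="table-july-sales"/>.</para>
```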


Markup for synonyms in glossary

Norm Walsh

| what is the correct markup for
| synonyms (another word
| used for the GlossTerm)?

For example:

<glossentry id="loop.infinite">
<glossterm>Loop, Infinite</glossterm>
<glossdef>
<para>A loop that never terminates.</para>
<glossseealso>Infinite Loop</glossseealso>
</glossdef>
</glossentry>

<glossentry id="infinite.loop">
<glossterm>Infinite Loop</glossterm>
<glosssee>Loop, Infinite</glosssee>
</glossentry>


Markup for notes

Peter Eisentraut

>I'm working on a publication using docbook.  We have a lot of additional
>notes that I'd like to reference from within the book, but am unsure as
>to what the best way to mark them up is.

>Here's a fragment from the book to show what I mean:

    <para> 1. The paths shown here are circular and simply connected.
    What happens if the paths are not circular? In note 144 I deal with
    the non circular paths using line integrals and the Residue theorem.
    However I can leave my paths circular without losing generality
    for several reasons. </para>

>What's the best way to mark up note 144?

If the note is shown elsewhere in the document, like a Figure, Example, or Theorem, I would use xref.

If the note is published externally then a bibliography would be appropriate.
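For the in-document case, a sketch might give the note an id and point at it from the text. The note wording and the id are invented. As with the earlier cross-reference answer, xref generates its own text from the target, while link lets you keep wording such as "note 144":

```xml
<!-- where the note lives, e.g. in a notes appendix -->
<note id="note-144">
  <title>Non-circular paths</title>
  <para>Non-circular paths are handled with line integrals and the Residue theorem.</para>
</note>

<!-- in the running text: generated reference text -->
<para>What happens if the paths are not circular? See <xref linkend="note-144"/>.</para>

<!-- in the running text: your own reference text -->
<para>In <link linkend="note-144">note 144</link> I deal with the non-circular paths.</para>
```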


Marking up protocols

Matt Gruenke

> > How are protocol names (such as "HTTPS") supposed to be
> > marked up?
>Don't know what others might use but my preference is to use
>         <token>HTTPS</token>

I definitely wouldn't use 'token'. A token is supposed to be a logically atomic unit of information. Examples of this are reserved words, operators, variable names, and '{' braces, in the C programming language. By "atomic", I mean that a token is logically indivisible in a given context (e.g. you can break the token 'return' into smaller groups of letters, but if you're trying to parse a C program, it isn't meaningful to do so).

I would use 'systemitem'. Or, failing that, 'phrase'. In either case, the 'role' attribute should probably be used. For example:

<systemitem role="protocol">HTTPS</systemitem>
<phrase role="protocol">HTTPS</phrase>

Now, perhaps you're thinking "gee, this is pretty darn verbose". If so, don't feel ashamed of yourself: verbosity has real disadvantages. The more work you make it for someone to say or do something, the less likely it is that they will, and this leads to inconsistent markup. Furthermore, the 'role' attribute is an open-ended hook. Unless you customize your stylesheets to allow only certain values (but if you're doing this, why not just add a protocol element?), or check for non-sanctioned values in your processing application (i.e. stylesheets), there exists the possibility that someone may misspell "protocol", which will cause semantic loss of that instance.

Another issue to consider is that perhaps a 'protocol' element one day gets added to the DocBook vocabulary, or you use a customization layer with it. It seems troublesome to go back and change all the documents that were previously written, to maximize the advantages of having this element (which may prevent you from ever getting around to adding it).

"What's to be done about all this?", you may ask. I have a solution that I devised in a markup language with much more ad hoc semantics (dare I say it...? LaTeX): use indirection. Indirection allows you to centralize the translation, which enhances manageability and consistency. Specifically, I recommend that you define a general entity, and reference it for every case in which you want to mention HTTPS:

    <!ENTITY Https '<systemitem role="protocol">HTTPS</systemitem>'>

Then, in your document, you can use it as:

At the expense of host-side stream encryption and per-transaction key generation, as well as client-side decryption (requiring a compatible browser), &Https; enables secure web transactions to be performed over insecure networks.

Now, not only is it easy to type, but if you make a spelling error (aside from leaving off the "s", perhaps), the parser will flag it as an error. Furthermore, you've centralized this definition, so that if the name of the standard, or the tags you want to use, ever change, you only have to make an edit in one place. Putting this definition in an external parameter entity can allow you to share this definition between multiple documents.

I usually group multiple such entity definitions, by subject matter, into external parameter entities. That way, I benefit from reuse, centralized management, and consistency across multiple documents and authors.

One thing to watch out for is naming conflicts (especially since the first definition of an entity will silently override any subsequent ones). Therefore, I break down the problem by prefixing each entity name with a prefix that is common to the file in which it's defined. Each file then has a unique prefix. (You may recognize this technique from C or other programming languages that have no formal mechanism for precluding namespace collisions.)
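A sketch of that arrangement, with a hypothetical file name net.ent and the prefix "net". The shared file holds only declarations:

```xml
<!-- net.ent: hypothetical shared entity file; every name starts with "net" -->
<!ENTITY netHttps '<systemitem role="protocol">HTTPS</systemitem>'>
<!ENTITY netFtp '<systemitem role="protocol">FTP</systemitem>'>
```

Each document then pulls it in through a parameter entity in its internal subset, shown here with the DocBook XML V4.2 DTD:

```xml
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
  <!-- declare a parameter entity for the shared file, then reference it -->
  <!ENTITY % netEntities SYSTEM "net.ent">
  %netEntities;
]>
<article>
  <title>Secure transfers</title>
  <para>Use &netHttps; rather than &netFtp; for sensitive data.</para>
</article>
```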


user name and groupname markup

Norm Walsh

| I need to mark up a username or group name in my document. Currently I've
| got them tagged with token, which I now realise is incorrect. Some of the
| other things I have as token are either systemitem or database but neither
| of those seem to be appropriate for username or group name. (These are not
| UNIX or operating system names, by the way.)

The next version of DocBook will support 'username' and 'groupname' as explicit class values for systemitem. I suggest that you use role in the short term.
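For example, with invented user and group names, the short-term markup and its later equivalent might look like this. Check that the DTD version you use actually lists username and groupname as class values before switching:

```xml
<!-- short term: qualify systemitem with role -->
<para>Log in as <systemitem role="username">backupadmin</systemitem>, a member of
  <systemitem role="groupname">operators</systemitem>.</para>

<!-- once the DTD supports the class values -->
<para>Log in as <systemitem class="username">backupadmin</systemitem>, a member of
  <systemitem class="groupname">operators</systemitem>.</para>
```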


Whitespace problem in indexterm

Norm Walsh

| Processing expectations for <indexterm> include:
|     IndexTerms are suppressed in the primary text flow, although they
|     contribute to the population of an index and serve as anchors for
|     cross references. Under no circumstances is the actual content of
|     IndexTerm rendered in the primary flow.
| The DVI/PostScript file 
| has one erroneous space for each indexterm in the source.  Which tool
| (or me?) is responsible for these erroneous spaces and how do we fix
| it?

Consider your source:

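<para>Why does adding indexterms cause spaces to appear here:
     <indexterm>
     <primary>data-parallel operation</primary>
     </indexterm>
     <indexterm>
     <primary>data-parallel operation</primary>
     </indexterm>
ending here?</para>

For clarity, let's replace spaces and line breaks by '+' characters (abbreviating the contents of each indexterm as ...):

<para>Why+does+adding+indexterms+cause+spaces+to+appear+here:++++++<indexterm>...</indexterm>++++++<indexterm>...</indexterm>+ending+here?</para>

Now, unless you've taken special care, multiple adjacent spaces are generally treated as a single space, so we can reduce this to:

<para>Why+does+adding+indexterms+cause+spaces+to+appear+here:+<indexterm>...</indexterm>+<indexterm>...</indexterm>+ending+here?</para>

Now, remove the index terms and what's left?

<para>Why+does+adding+indexterms+cause+spaces+to+appear+here:+++ending+here?</para>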
Those spaces are not "adjacent" unfortunately, so each one is produced in the output. That's the source of the extra spaces.

Unfortunately, the only way to avoid this problem is either to put the indexterms between paragraphs (which is logically wrong) or to make sure that you don't introduce extra spaces with your index terms:

<para>Why does adding indexterms cause spaces to appear here:<indexterm>
     <primary>data-parallel operation</primary>
     </indexterm><indexterm>
     <primary>data-parallel operation</primary>
    </indexterm> ending here?</para>


Default encoding

Tony Graham

 > The parser obviously is not aware that you have chosen ISO 8859-1.  That is 
 > the expected error message if an 8859-1 document contains any high bytes 
 > (128+) and the parser is trying to parse it as UTF-8.
 > 1) Do all of your entities (i.e., files) have encoding declarations?  What 
 > are they?  Remember that UTF-8 is the default unless you explicitly specify 
 > a different encoding (or use a byte-order mark, in which case UTF-16 is the 
 > default).

Strictly speaking, it's "or use UTF-16 with a byte-order mark", since you can have a byte-order mark with UTF-8.

UTF-16 without a byte-order mark (BOM) can be mistaken for a number of other encodings, hence you need the BOM if you're omitting the encoding declaration. Both UTF-16 without the BOM and the 'number of other encodings' all need to have the encoding declaration so the XML processor can determine the encoding. UTF-16 with both the BOM and an encoding declaration is okay, too.

8-bit text without an encoding declaration is expected to be UTF-8. Hence, if the text isn't UTF-8, you need the encoding declaration. UTF-8 text with the BOM (EF BB BF) and without an encoding declaration should be recognised as UTF-8. However, using the BOM with UTF-8 wasn't mentioned in the Unicode Standard, Version 2.0 (which was current when XML 1.0 was published), so some early XML processors weren't designed to recognise the UTF-8 BOM. The UTF-8 BOM was not mentioned in Appendix F of XML 1.0, but is mentioned in Appendix F of XML 1.0 Second Edition (and was mentioned in the version of ISO/IEC 10646 current when XML 1.0 was published).
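So a file saved as ISO 8859-1 needs to say so on its first line. A minimal sketch; the same kind of declaration (formally a text declaration, in which the encoding part is required) belongs at the top of each external entity file that isn't UTF-8 or UTF-16:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- If this file is saved as ISO 8859-1, the e-acute below is a single
     byte above 127; without the declaration a parser would read it as
     (invalid) UTF-8 and report an error. -->
<para>Café</para>
```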


HTML to docbook

Jeff Iezzi

Command Prompt (commandprompt.com) makes a rather good and quite cheap product. It is also quite easy to use.


MathML and docbook

Jirka Kosek

> Is there a MathML way of correctly inserting a mathematical constant 
> into DocBook documents?
> I believe MathML is at minimum a switch that can be turned on with the db
> distribution? Is that correct?

If you need MathML, you can use DocBook with the MathML module, available on the OASIS site.


Inserting external code into docbook

Bob Stayton

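You can use the following construct to include external code. You only need the inlinemediaobject wrapper if you are using 4.1.2:

<programlisting><inlinemediaobject>
  <textobject>
    <textdata fileref="yourfile.txt"/>
  </textobject>
</inlinemediaobject></programlisting>

With the 4.2 DTD, you can use textobject directly in programlisting:

<programlisting><textobject>
  <textdata fileref="yourfile.txt"/>
</textobject></programlisting>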

Also, the text insertion process is an extension function, and is not available in xsltproc. You can use saxon, but you must include the saxon extensions jar file in your CLASSPATH, and you must set these two parameters to nonzero: use.extensions and textinsert.extension.
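A customization layer that turns both parameters on might look like this. The import path is an assumption, and the parameters only have an effect when the stylesheet runs under a processor with the DocBook extensions available, as described above:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Import the standard HTML stylesheet (path is an assumption) -->
  <xsl:import href="docbook-xsl/html/docbook.xsl"/>

  <!-- Enable extension functions in general, and text insertion in particular -->
  <xsl:param name="use.extensions" select="1"/>
  <xsl:param name="textinsert.extension" select="1"/>

</xsl:stylesheet>
```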


Lists and white space

Bob Stayton

> I've encountered a listitem indentation problem discussed in TDG
> (html/entry.html) and the following threads:

> http://sources.redhat.com/ml/docbook-apps/2001-q3/msg00858.html
> http://sources.redhat.com/ml/docbook-apps/2001-q4/msg00041.html

> I understand that any block element should immediately follow listitem,
> otherwise the parser would treat the listitem contents as inline.

> I use para within listitem. What I can't understand is, why white space
> in para is significant within listitem, and not significant otherwise.
> In other words, why the following construction produces leading space
> on the first line:

> <listitem><para>
>   Text.
> </para></listitem>

> and the following do not:

> <listitem><para>
> Text.
> </para></listitem>

> <para>
>   Text.
> </para>

The content model for <para> includes #PCDATA, which means any white space is significant. The stylesheet should pass any white space through to the output.

What happens to the white space then depends on the viewing application. How are you processing your content? HTML browsers have their own idea of how to display whitespace. If you are generating HTML, can you tell if the whitespace is getting through in both cases 1 and 3 above? I don't see either of those spaces in my browser when I process your example but the white space is in the HTML.


Executive Summary

Norm Walsh

| I'm writing an engineering report and I'd like to have an executive
| summary appear before the table of contents in the printed output of
| docbook.
| How would I go about doing that?


<abstract><title>Executive Summary</title>
  <para>...</para>
</abstract>

Put this in the *info wrapper (for example, articleinfo or bookinfo) for starters.


Docbook 4.2 Image semantics

Norm Walsh

I spent some time today (May 2002) working on new code to map DocBook V4.2 image semantics (a superset of previous semantics) to HTML. A number of compromises were required along the way.

I probably won't be able to post the new code until I get back home, but here are the notes I wrote as I went. Comments, etc., most welcome.

The HTML img element only supports the notion of content-area scaling; it doesn't support the distinction between a content-area and a viewport-area, so we have to make some compromises.

1. If only the content-area is specified, everything is fine. (If you ask for a three inch image, that's what you'll get.)

2. If only the viewport-area is provided:

- If scalefit=1, treat it as both the content-area and the viewport-area. (If you ask for an image in a five inch area scaled to fit, we'll make the image five inches to fill that area.)

- If scalefit=0, ignore it.

Note: this is not quite the right semantic and has the additional problem that it can result in anamorphic scaling, which scalefit should never cause.

3. If both the content-area and the viewport-area are specified on a graphic element, ignore the viewport-area. (If you ask for a three inch image in a five inch area, we'll assume it's better to give you a three inch image in an unspecified area than a five inch image in a five inch area.)
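Expressed as imagedata markup, the three cases above look roughly like this. These are illustrative fragments (each would sit inside an imageobject), figure.png is a placeholder file name, and the comments only restate the HTML behavior described in the notes.

```xml
<!-- Case 1: content-area only. You get a three inch image. -->
<imagedata fileref="figure.png" contentwidth="3in"/>

<!-- Case 2: viewport-area only, scalefit="1". The image is made
     five inches wide to fill the area. -->
<imagedata fileref="figure.png" width="5in" scalefit="1"/>

<!-- Case 2: viewport-area only, scalefit="0". The width is ignored. -->
<imagedata fileref="figure.png" width="5in" scalefit="0"/>

<!-- Case 3: both areas given. The viewport-area is ignored, so you
     get a three inch image in an unspecified area. -->
<imagedata fileref="figure.png" width="5in" contentwidth="3in"/>
```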

Relative units also cause problems. As a general rule, the stylesheets are operating too early and too loosely coupled with the rendering engine to know things like the current font size or the actual dimensions of an image. Therefore:

1. We use a fixed size for pixels, $pixels.per.inch.

2. We use a fixed size for "em"s, $points.per.em.
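If the fixed values don't suit your target, you can presumably override them like any other stylesheet parameter. A minimal sketch, assuming both are ordinary top-level parameters (as the $ notation suggests) and that html/docbook.xsl is the right import path for your installation; the values shown are only examples:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:import href="html/docbook.xsl"/>
  <!-- Assumed screen resolution used to convert pixel sizes -->
  <xsl:param name="pixels.per.inch" select="96"/>
  <!-- Assumed font size used to convert em sizes -->
  <xsl:param name="points.per.em" select="10"/>
</xsl:stylesheet>
```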

Percentages are problematic. In the following discussion, we speak of width and contentwidth, but the same issues apply to depth and contentdepth.

1. A width of 50% means "half of the available space for the image." That's fine. But note that in HTML, this is a dynamic property and the image size will vary if the browser window is resized.

2. A contentwidth of 50% means "half of the actual image width". But the stylesheets have no way to assess the image's actual size. Treating this as a width of 50% is one possibility, but it produces behavior (dynamic scaling) that seems entirely out of character with the meaning.

Instead, the stylesheets define a $nominal.image.width.in.points and convert percentages to actual values based on that nominal size.

Scale can be problematic. Scale applies to the contentwidth, so a scale of 50 when a contentwidth is not specified is analogous to a width of 50%. (If a contentwidth is specified, the scaling factor can be applied to that value and no problem exists.)

If scale is specified but contentwidth is not supplied, the nominal.image.width.in.points is used to calculate a base size for scaling.

Warning: as a consequence of these decisions, unless the aspect ratio of your image happens to be exactly the same as (nominal width / nominal height), specifying contentwidth="50%" and contentdepth="50%" is NOT going to scale the way you expect (or really, the way it should).

Don't do that. In fact, a percentage value is not recommended for content size at all. Use scale instead.
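To make the advice concrete, here is a sketch contrasting the approaches (figure.png is a placeholder). A single scale factor shrinks both dimensions by the same amount, so the aspect ratio is preserved; as noted above, if you also give a contentwidth, the scale is applied to that known value instead of the nominal width.

```xml
<!-- Avoid: each percentage is taken of the nominal image size, not the
     real one, so the aspect ratio can come out wrong -->
<imagedata fileref="figure.png" contentwidth="50%" contentdepth="50%"/>

<!-- Better: one scale factor for both dimensions -->
<imagedata fileref="figure.png" scale="50"/>

<!-- Most predictable for HTML: scale a known content width (to 2in) -->
<imagedata fileref="figure.png" contentwidth="4in" scale="50"/>
```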

Finally, align and valign are troublesome. Horizontal alignment is now supported by wrapping the image in a <div align="{@align}"> (in block contexts!). I can't think of anything (practical) to do about vertical alignment.


Image markup

Steffen Maier

> <!ENTITY figure '<mediaobject><imageobject><imagedata fileref="image-file" 
> width="75%"/></imageobject></mediaobject>'>

> cause I used % in width attrib; how can I solve this?

Interesting problem. I've never come across this one, although I have to admit that sooner or later it was bound to happen.

As you already noticed, the percent sign is not allowed literally in entity values, because it starts parameter entity references.

    [9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
                      | "'" ([^%&'] | PEReference | Reference)* "'"

Following [http://www.w3.org/TR/2000/REC-xml-20001006#entproc], I tried using a character reference (&#x25;) in place of the percent sign, and it worked with Saxon 6.5.2.

The following example document...

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "/home/maiersn/share/sgml/docbkx412/docbookx.dtd" [
<!ENTITY figure '<mediaobject><imageobject>
 <imagedata fileref="image-file"
  width="75&#x25;"/></imageobject></mediaobject>'>
]>
<article class="techreport">
  &figure;
</article>

...produces the following HTML output, which again includes the percent sign in the value of the img element's width attribute...

    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <meta name="generator" content="DocBook XSL Stylesheets V1.50.0">
  <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
    <div class="article">
      <div class="titlepage">
      <div class="mediaobject">
        <img src="image-file" width="75%">


biblioentry and bibliomixed

Jirka Kosek

> I infer that a
> single <bibliomixed> is roughly equivalent to a <biblioentry>.

The difference between biblioentry and bibliomixed is very simple. In bibliomixed you must put the punctuation between elements in manually; with biblioentry, the punctuation is added automatically by the stylesheet (and, of course, it will probably be added in a different way than you want or expect).
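A side-by-side sketch may help. The entry (the XML 1.0 Recommendation) and the ids are just examples. In the bibliomixed version the periods between elements are typed by the author, while in the biblioentry version the stylesheet decides what goes between the fields.

```xml
<!-- biblioentry: elements only; the stylesheet supplies punctuation -->
<biblioentry id="xml10">
  <abbrev>XML</abbrev>
  <corpauthor>W3C</corpauthor>
  <title>Extensible Markup Language (XML) 1.0 (Second Edition)</title>
  <pubdate>2000</pubdate>
</biblioentry>

<!-- bibliomixed: the punctuation between elements is yours -->
<bibliomixed id="xml10-mixed">
  <abbrev>XML</abbrev>
  <corpauthor>W3C</corpauthor>.
  <title>Extensible Markup Language (XML) 1.0 (Second Edition)</title>.
  <pubdate>2000</pubdate>.
</bibliomixed>
```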


Table problems in fo output

Norm Walsh

| 1) If I declare the size of the column in the colspec-tag like
| <colspec colwidth="4cm" /> I got a column with that width. So far so
| good. But I'd rather not specify the column width and let the
| renderer (fo or tex?) choose the appropriate column-width in a
| HTML-Table-like manner. Is this possible? If yes: does someone know
| how?

If you don't specify any widths, implementations are free to choose widths in the manner that you describe. Some do, some don't. Some do better than others.

Rather than using explicit units of measure, you can use relative widths. In a two-column table,

  <colspec colwidth="3*"/>
  <colspec colwidth="2*"/>

the implementation is free to choose the width, but the ratio of the width of the first column to the second will be 3/2.

| 2) Borders: Following table should (in my opinion) turn up grid-like
| with a frame around it:
|       <table colsep="1" frame="all" rowsep="1">
|         <title>Umgebungsvariablen</title>
|         <tgroup cols="3">
|           <colspec colwidth="4cm" />
|           <colspec colwidth="8cm" />
|           <colspec colwidth="5cm" />
|           <thead>
|             <row>
|               <entry>Col1Head</entry>
|               <entry>Col2Head</entry>
|               <entry>Col3Head</entry>
|             </row>
|           </thead>
|           <tbody>
|             <row>
|               <entry>Col1</entry>
|               <entry>Col2</entry>
|               <entry>Col3</entry>
|             </row>
|           </tbody>
|         </tgroup>
|       </table>
| This table turns up with inner borders but without a frame around it.
| I wonder what's wrong. Does someone know how to do it? Or is it a bug,
| somewhere?

It's a bug. It will be fixed in 1.50.1, when it's released. I believe that it's fixed in 1.50.1-EXP2. But that's an experimental release.

| 3) I tried to change the padding of the columns in a driver-file the
| following way:
| <xsl:attribute-set name="table.cell.padding">
|   <xsl:attribute name="padding-left">12pt</xsl:attribute>
|   <xsl:attribute name="padding-right">12pt</xsl:attribute>
|   <xsl:attribute name="padding-top">12pt</xsl:attribute>
|   <xsl:attribute name="padding-bottom">12pt</xsl:attribute>
| </xsl:attribute-set>
| nor
| <xsl:param name="table.entry.padding" select="'12pt'"/>
| worked. The enlarged padding (default is 2pt) never showed up. The
| parameter all show up in the fo/param.xsl file. Is it possible, that
| they  aren't used at all? Has someone an idea what's wrong?

Look in the FO file. If they show up there, you're seeing a formatter bug. If they don't, it's a stylesheet bug. I believe it's the former, but I've been wrong before :-)

[Ednote:] That's a minority view!


website problems, setting directories

Norm Walsh

| the layout file should be something like this?
| <layout>
|   <toc page="subdir.xml" dir="subdir" file="index.html"
|     <tocentry page="subdir\foo.xml" file="foo.html"/>

Try "subdir/foo.xml" (use "/", not "\").


Resume

Jason Diamond

> I just updated my resume for the umpteenth time.  I keep the master in
> plain text, then made a HTML version, and then a pdf version.  I've never
> used docbook, but from what I've read about it so far, it seems like
> storing it in docbook might be the way to go.
> I only use a few formatting elements:  3 different font sizes, 2 font
> sets, bold face, and one section uses 2 columns.
> Would docbook be the right way to go?

Try XMLResume at sourceforge.net. It's a vocabulary specifically designed for resumes and includes stylesheets to generate text, HTML, and PDF.

Mark Derricutt adds:

Take a look at HR-XML (hr-xml.org); there's an industry standard for resume XML layout.

HR-XSL: sourceforge

This is an open-source project that uses HR-XML to generate resumes from an XML master, so I think it answers the original question. Plus, HR-XSL uses DocBook XML as an intermediate representation, so it's definitely relevant to the DocBook project.


Content re-use in docbook

Jirka Kosek

> <para>This sentence is shared and generic to all
> versions. <phrase condition="internal">This
> sentence only gets output for internal documentation.</phrase>
> <phrase condition="external">And this one only gets
> output for external documentation.</phrase></para>

> Then two passes with XSLT generates two different versions
> of the content -- or it gets included in documents that
> are "internal" or "external" only.

Just a small technical note. The current version of the DocBook XSL stylesheets can do profiling on the fly, together with the conversion to HTML or FO, in a single transformation. This makes the whole process more user-friendly, as the user doesn't need to perform any additional steps.
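As a rough illustration, assuming a stylesheet release that ships the profiling variant html/profile-docbook.xsl and the profile.condition parameter (check the documentation for your version), the internal build could be driven by a customization layer like this:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Profiling variant of the HTML stylesheet; adjust the path -->
  <xsl:import href="html/profile-docbook.xsl"/>
  <!-- Keep condition="internal" content (and anything with no condition);
       drop content marked with other condition values -->
  <xsl:param name="profile.condition" select="'internal'"/>
</xsl:stylesheet>
```

For the external version you would set the parameter to 'external' instead, still in a single pass.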

For more info look at: sourceforge.net


Text direction and language

Bob Stayton

> 1) What is the DocBook equivalent, if any, of the HTML attribute DIR
> for specifying in which direction, i.e. left-to-right or right-to-left,
> text should be rendered within the containing element?

I don't know of one. Wouldn't the direction be determined by the lang?

> 2) What is the correct way to specify the language used within the
> content of a certain element in DocBook *XML*: a) with the LANG
> attribute (as in HTML), b) with the LANG and XML:LANG attributes
> (as in XHTML 1.0), or c) with the XML:LANG attribute (as in XHTML
> 1.1)?

Either the 'lang' or the 'xml:lang' attribute would be correct (note that both are lowercase). The 'lang' attribute is declared in the DocBook DTD for just about every element. The 'xml:lang' attribute is outside the DocBook DTD, of course, but is defined in the xml namespace specifically for that purpose. The DocBook XSL stylesheets support both.


Produce a back cover


> - how can i produce back cover for the book?

You need to decide which elements in the markup describe or contain the material for the back cover. Then produce some DSSSL to handle that markup and lay it out as you want (borrow heavily from the code that does the front cover).


Table markup

Norm Walsh

| How would I go about marking up a table in docbook with partial frame
| lines?
| Here's the content:
|      a b c * d
|      e f g * h
|      i j k * l
|      * * * * *
|      m n o * p
| What I'd like to do is have the *'s replaced with nice unbroken
| separators.  Is there any way of doing this?

Try it like this:

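    <informaltable frame="none">
      <tgroup cols="4">
        <colspec colnum="3" colsep="1"/>
        <tbody>
          <row><entry>a</entry><entry>b</entry><entry>c</entry><entry>d</entry></row>
          <row><entry>e</entry><entry>f</entry><entry>g</entry><entry>h</entry></row>
          <row rowsep="1"><entry>i</entry><entry>j</entry><entry>k</entry><entry>l</entry></row>
          <row><entry>m</entry><entry>n</entry><entry>o</entry><entry>p</entry></row>
        </tbody></tgroup></informaltable>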

That should work for FO and for HTML with CSS.


Link to biblioentry

Norm Walsh

| I understand that a biblio set can be generated like this:
| <biblioentry id="walsh97">
| How can I reference this from my text, like "see [Walsh97]", creating a
| link (at least in the HTML version of the styles) to the corresponding
| biblioentry?

see <xref linkend="walsh97"/>
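In context, that might look like the sketch below (the entry details are taken from the question). When the first child of the biblioentry is an abbrev, the XSL stylesheets should use it for the generated [Walsh97] label; otherwise they typically fall back to the id.

```xml
<para>For background, see <xref linkend="walsh97"/>.</para>

<bibliography>
  <title>References</title>
  <biblioentry id="walsh97">
    <!-- The abbrev supplies the [Walsh97] label -->
    <abbrev>Walsh97</abbrev>
    <author><firstname>Norman</firstname><surname>Walsh</surname></author>
    <title>A Guide to XML</title>
    <pubdate>1997</pubdate>
  </biblioentry>
</bibliography>
```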


Reference to biblioentry

Stephan Wiesner

>[Walsh97] XML: Principles, Tools, and Techniques. & Associates, Inc.. 
>1085-2301. Dan Connolly. "A Guide to XML". Norman Walsh. Copyright C 
>1997 ArborText, Inc.. 97-108.

>I have a need to write biblioentries that do not 
>include the [abbrev] text. However, the XSLT stylesheets just stick 
>in an empty []. Is there any chance to get this fixed so that the 
>brackets are only included if the abbrev is present? According to 
>the DTD, abbrev is optional.

For the HTML stylesheets (V1.52), replace:

    <xsl:when test="local-name($node/child::*[1]) = 'abbrev'">
      <xsl:apply-templates select="$node/abbrev[1]"/>
      <xsl:value-of select="$node/@id"/>
  <xsl:text>] </xsl:text>

with:

    <xsl:when test="local-name($node/child::*[1]) = 'abbrev'">
      <xsl:if test="$node/@id"><xsl:value-of


How to set the language?

Jirka Kosek

> I try all properties and it work fine except language. I make a
> french documentation.

The correct way to set the language is to use the lang attribute in your document instance, e.g.:

<?xml ...?>
<!DOCTYPE ...>
<book lang="fr">


Including parts of a book using entities

Bob Stayton

>  Should I be
> allowed to have book entities in a set, and then more entities in books for
> it's own chapters as follows ...

> --- set
> <!DOCTYPE set PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
> <!ENTITY book1 SYSTEM "/u/fred/fred/docs/book1.sgml">
> ]>
> <set>
> <title>My Site</title>
> <setinfo>...
> </setinfo>
> &book1
> </set>

> And then in book1.sgml have ...

> --- book
> <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
> <!ENTITY chap1 SYSTEM "chap1.sgml">
> <!ENTITY chap2 SYSTEM "chap2.sgml">
> <!ENTITY chap3 SYSTEM "chap3.sgml">
> ]>
> <book>
> <title>my book</title>
> <bookinfo>...
> </bookinfo>
> &chap1;
> &chap2;
> &chap3;
> </book>

> Seems if I throw all doctype and entities in my set page, then it works, but
> by having doctype w/ entities separated doesn't seem to work. 

I'm afraid a system entity cannot have a DOCTYPE declaration. It's a "feature" of the SGML and XML specs that I've never quite understood. I can understand not wanting to mix different doctypes, but if they all use the same doctype, why not permit it? The parser could certainly ignore any extra doctype declarations when it read the system entities.

So you end up with system entity files without a DOCTYPE, which means they are not valid files and cannot be validated. It breaks the concept of modular documentation. Some editor programs like emacs and ArborText can dynamically add a DOCTYPE declaration to a system entity based on a processing instruction.
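In practice that means declaring every entity in the one file that carries the DOCTYPE, and removing the DOCTYPE from the included files. A sketch using the names from the question; "set.sgml" is a hypothetical name for the master file, and both files are shown in one listing, separated by comments:

```xml
<!-- set.sgml: the only file with a DOCTYPE; it declares all entities -->
<!DOCTYPE set PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
<!ENTITY book1 SYSTEM "/u/fred/fred/docs/book1.sgml">
<!ENTITY chap1 SYSTEM "chap1.sgml">
<!ENTITY chap2 SYSTEM "chap2.sgml">
<!ENTITY chap3 SYSTEM "chap3.sgml">
]>
<set>
  <title>My Site</title>
  &book1;
</set>

<!-- book1.sgml: no DOCTYPE, just the book and its chapter references -->
<book>
  <title>my book</title>
  &chap1;
  &chap2;
  &chap3;
</book>
```

Relative system identifiers such as "chap1.sgml" are usually resolved relative to the file that declares them (here the set file), so adjust the paths if the chapters live in a different directory.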

If you care to switch to XML, I've developed a system of modular DocBook using XIncludes and olinks, where each module is maintained as a valid file. It is described at: my website


Index markup

Norm Walsh

|   What is the reasoning behind using all the index, indexdiv, primary,
| secondary, indexentry, etc. terms? (Is there a link to a webpage that

index starts an index. indexdiv divides an index into sections. indexentry is an entry in the index.

I think those are pretty clear.

Within an indexentry, the primary, secondary, and tertiary elements are used in order to make sure that the index accurately reflects the common understanding of an index.

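The problem with a generic design, where, say, a term bar nested inside a term foo is a second-level entry purely because of how deeply it is nested, is two-fold.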

First, did the indexer really mean that? Did they really mean to have bar as a second-level term, or did they intend it to be a third-level term and accidentally delete one?

Second, suppose you're trying to fix an index. You've printed it out and gone through and found inconsistencies. It's a lot easier to find the inconsistencies if they have distinct names rather than having to count start tags.
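For contrast, here is the kind of generic nesting being argued against, next to DocBook's named levels. The ix element is purely hypothetical (it is not DocBook); indexterm with primary and secondary is the usual DocBook way to mark an index entry in the text.

```xml
<!-- Hypothetical generic design (NOT DocBook): the level of a term
     is implied only by how many ix elements enclose it -->
<ix>foo
  <ix>bar</ix>
</ix>

<!-- DocBook: each level has its own element name, so a missing
     level is easy to spot -->
<indexterm>
  <primary>foo</primary>
  <secondary>bar</secondary>
</indexterm>
```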


Book Title, why are there two?

Bob Stayton

> >   as I peruse all of the elements, I was curious about the fact that
> > a <book> can directly contain a <title> element, but a <book> can also
> > contain a <bookinfo> element, which itself can contain a <title>.
> >
> >   this kind of duplication seems to be fairly rare -- a <book> can't
> > directly contain, for example, an <authorgroup>; that has to be
> > within the <bookinfo> element.  any reason why title, subtitle
> > and titleabbrev have this extra flexibility?

> I think this comes from the distinction of meta data (title inside *info,
> e.g.  bookinfo) and non-meta data (e.g. title inside book) where the
> former is usually not represented in any form in printed output (therefore
> meta data) and the latter makes up material to be printed in the output
> format.

Well, much of the metadata in bookinfo is printed, on the title page. That's where the author information goes, for example. In general, metadata is information *about* the book, rather than part of the book's subject content.

I think title came first, and bookinfo was added later as a way to collect the little bits of metadata that were building up as DocBook evolved. It is possible to have a title in both places, but the stylesheet 'policy' in XSL is to use bookinfo/title if that is available, otherwise use title.
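A small sketch of that situation: both titles present, with the bookinfo one preferred by the XSL stylesheets under the policy just described. The titles and the author are placeholders.

```xml
<book>
  <!-- Used only if bookinfo has no title -->
  <title>Fallback Title</title>
  <bookinfo>
    <!-- The XSL stylesheets use this one when it is present -->
    <title>Preferred Title</title>
    <author><firstname>Jane</firstname><surname>Doe</surname></author>
  </bookinfo>
  <chapter>
    <title>First Chapter</title>
    <para>Chapter text.</para>
  </chapter>
</book>
```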


How to use callouts

Bob Stayton and Dave Pawson

> >Since programlisting is a verbatim block, can't you
> >just line  up your annotations manually as you write them? 
> >Maybe I'm not understanding what you are trying to do.
> >If you are doing long comments that wrap, perhaps you could
> >use callouts?

Callouts are a good option. In html they do take a bit of tweaking, but look good.

It took me some time to imitate Norm, but this is what works for me now. I can put the callouts at any position on the line.

    <example id="bl.ch">
         <title>Chapter page sequence</title>
         <programlisting format="linespecific">
           format="1">                              <co id='pnf'/>
      &lt;fo:block text-align="outside">               <co id='hd'/>
        Chapter    &lt;fo:retrieve-marker 
           retrieve-class-name="chapNum"/>          <co id='cn'/>
     &lt;fo:leader leader-pattern="space" />
           retrieve-class-name="chap"/>             <co id='ttl'/>   
     &lt;fo:leader leader-pattern="space" />
      Page &lt;fo:page-number font-style="normal" />   <co id='pna'/>
      of &lt;fo:page-number-citation ref-id='end'/>    <co id='lp'/>
         </programlisting>
         <calloutlist>
           <callout arearefs="pnf">
             <para>The page number is formatted in Roman</para>
           </callout>
           <callout arearefs="hd">
             <para>This block forms the header on these pages</para>
           </callout>
           <callout arearefs="cn">
             <para>The chapter number is retrieved as a marker</para>
           </callout>
           <callout arearefs="ttl">
             <para>The chapter title is retrieved as a marker.</para>
           </callout>
           <callout arearefs="pna">
             <para>The page number is added</para>
           </callout>
           <callout arearefs="lp">
             <para>Last page number of the document</para>
           </callout>
         </calloutlist>
    </example>

One additional refinement to Dave's example: you can make the links bidirectional by adding an id to the <callout> and a linkends attribute to the <co>:

<co id='pnf' linkends="pnf-callout" />
<callout arearefs="pnf" id="pnf-callout">

Then you can click on the callout bug in the code to go to its explanatory text, and click on the callout label bug to go back to the code location. At least, that works with the XSL stylesheets.
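Putting both halves together, a minimal self-contained example with bidirectional links might look like this (the ids and the FO line are illustrative only):

```xml
<example id="co.demo">
  <title>Bidirectional callouts</title>
  <programlisting>Page &lt;fo:page-number/&gt;     <co id="pn" linkends="pn-callout"/>
</programlisting>
  <calloutlist>
    <!-- arearefs points at the co; the id lets the co link back here -->
    <callout arearefs="pn" id="pn-callout">
      <para>Inserts the current page number.</para>
    </callout>
  </calloutlist>
</example>
```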


beginpage element

Bob Stayton

> I've converted a print book into XML using the DocBook DTD. I've inserted a 
> beginpage to indicate each new page in the print book. Now I want to chunk 
> the XML into pages based on the <beginpage> tag, so that each page in the 
> print book corresponds to an HTML file in the XSLT output.

> How should I do that with the DocBook XSLT stylesheets.

You can't, unless you are willing to do some pretty heavy customizing of the chunking stylesheet.

This element is not designed to produce chunked output (although you are not alone in thinking that it is). See "DocBook: The Definitive Guide" (docbook.org) for a description of the beginpage element. The DocBook XSL stylesheets don't respond to <beginpage/> unless you customize them.

There is no facility in the chunking stylesheets for arbitrary chunks. A DocBook file is chunked at chapter and section breaks. You have some control over which sections cause a break using stylesheet parameters. See sourceforge for a description of the HTML stylesheet parameters.
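For instance, a chunking customization along these lines asks for more (but still section-based) chunks. As far as I know, both parameters exist in recent DocBook XSL releases, but check the parameter reference for yours, and adjust the import path:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:import href="html/chunk.xsl"/>
  <!-- Start new chunks for sect1 and sect2 (default is sect1 only) -->
  <xsl:param name="chunk.section.depth" select="2"/>
  <!-- Give the first section of each chapter its own chunk too -->
  <xsl:param name="chunk.first.sections" select="1"/>
</xsl:stylesheet>
```

Neither parameter looks at beginpage, so page breaks in the source still won't produce chunks.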


Nesting Sections in Simplified docbook

Norm Walsh

| It appears that if I create a handful of sections to map out the (somewhat
| known) structure of a document in progress, then create some content in one
| of the sections (as it is provided to me), if I attempt to nest a section
| within that section that now has content, it is not possible - the section
| element is not offered in the list of allowed elements.

All the sections have to come "last". In other words, if you have this structure:

<section>
  <title>Some Title</title>

  <!-- point A -->

  <para>Some material</para>

  <!-- point B -->
</section>

You can add a new section at "point B" but not at "point A" (because that would put the paragraph after the section and that's not allowed).
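For example, after adding a subsection at point B, the structure would look like this (titles and text are placeholders):

```xml
<section>
  <title>Some Title</title>
  <para>Some material</para>
  <!-- point B: nested sections follow all the block content -->
  <section>
    <title>New Subsection</title>
    <para>More material.</para>
  </section>
</section>
```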


Image size problems

Dennis Grace

>I am using in my documents the same image format for both PDF and HTML
>outputs (PNG). Though I want to specify the width for PDF output but not
>for HTML. An option is to define two different imageobject but I find
>much easier to change the XSL template for imageobject and ignore the

>Can someone tell me the easier way to do so?

I've been working on solutions for the same problem. I, too, found the two-different-imageobjects solution less than satisfactory (that whole dual-source issue).

So far, my best solution has been external to DocBook. In GIMP or Photoshop, I can alter the resolution to find the optimum pixels/inch (pixels/mm) for a given size. An image that looks good in the browser at, say, 640 pixels wide will, at a resolution of 72 pixels/inch (2.8346 pixels/mm), be too wide (8.9 in [225.8 mm]) for standard page outputs. Resetting the resolution to 110 pixels/inch (4.33 pixels/mm) reduces the print version to a printable width of 5.8 in (147.8 mm) while leaving the total pixel width unchanged for HTML output.
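The arithmetic behind those numbers is simply printed width = pixel width / resolution, with 1 in = 25.4 mm. The pixel count stays at 640 in both cases, which is why the HTML rendering does not change:

```latex
\begin{aligned}
\frac{640\ \text{px}}{72\ \text{px/in}} &\approx 8.89\ \text{in} \approx 225.8\ \text{mm}
  &\quad& (72\ \text{px/in} \approx 2.835\ \text{px/mm}) \\
\frac{640\ \text{px}}{110\ \text{px/in}} &\approx 5.82\ \text{in} \approx 147.8\ \text{mm}
  &\quad& (110\ \text{px/in} \approx 4.331\ \text{px/mm})
\end{aligned}
```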


Including parts of my docbook

Yann Dirson

> I want to create blocks of text, for example, <glossentry/>s and 
> 'include' them in my text at different places.  One strategy I have 
> come up with is to create a separate file for each entry and then add an 
> <!ENTITY term SYSTEM "term.xml"> for each term.  This starts to get 
> rather ugly if I put them all in the '['...']' of the DocType specifier.

> First of all, does anybody understand what I'm attempting? If so, is 
> there a more elegant way of achieving this?

This is usually done by putting all your entity declarations in a single file (say "entities.ent") and referencing just that one file from your master document.
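A sketch of that layout, with hypothetical file and entity names. Each term-*.xml file would hold a single glossentry; the master document pulls in all the declarations at once through a parameter entity (both files are shown in one listing, separated by comments):

```xml
<!-- File entities.ent: declarations only -->
<!ENTITY term.docbook SYSTEM "term-docbook.xml">
<!ENTITY term.xslt    SYSTEM "term-xslt.xml">

<!-- File glossary.xml: the master document -->
<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
<!ENTITY % my.terms SYSTEM "entities.ent">
%my.terms;
]>
<glossary>
  <title>Glossary</title>
  &term.docbook;
  &term.xslt;
</glossary>
```

Any other document that references entities.ent the same way can reuse the same terms.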


alt text or d link on images?

Jirka Kosek

> <mediaobject>
>       <imageobject>
>             <imagedata fileref= "picture.png">
>       </imageobject>
>       <textobject>
>             <para>Description of picture<para>
>       </textobject>
> </mediaobject>

> the resulting HTML page contains the image, followed by a link labeled
> '[D]', which connects to a separate HTML page containing the phrase
> "Description of the picture."

Enclose the description in phrase, not para. Text in a <phrase> goes into the ALT attribute; otherwise the enclosed text is stored on a separate HTML page as a long description.

       <phrase> This produces alt text </phrase>
       <para> This produces a separate, D-linked HTML file </para>
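Applied to the markup in the question, a version with both kinds of description might look like this (file name and texts are placeholders). The phrase should end up as the img alt text, and the para as the separate long-description page behind the [D] link.

```xml
<mediaobject>
  <imageobject>
    <imagedata fileref="picture.png"/>
  </imageobject>
  <!-- Short text: becomes the alt attribute -->
  <textobject>
    <phrase>Description of picture</phrase>
  </textobject>
  <!-- Block text: goes to a separate long-description page -->
  <textobject>
    <para>A longer description of the picture.</para>
  </textobject>
</mediaobject>
```

If you only want alt text, keep just the first textobject.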


Indexing

Bob Stayton

> I just tried to automatically generate an index at the end of a book using
> the DocBook 1.57.0 stylesheets and XEP.  I found that the generate.index
> parameter did nothing unless I had an empty <index/> tag at the end of the
> book, and that the index then contained the index for the entire set.
> I searched for the string "generate.index" in the entire fo/ directory of
> the XSL distribution, and couldn't find it.  Is this a parameter that is not
> yet supported?  Regardless of that, shouldn't an empty <index/> at the end
> of a book only generate an index for that book instead of all 26 books in
> the set?

It looks like the FO indexing machinery needs a bit of work. The way I read the stylesheets, if you want an index in print output, you add an <index> element. The $generate.index parameter is not consulted. It should be. You'll still need to include an empty <index> element to tell the stylesheet where you want it, though.
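In other words, something like the following at the end of each book. Titles and terms are placeholders; the empty index element marks where the generated index should go.

```xml
<book>
  <title>Book One</title>
  <chapter>
    <title>Widgets</title>
    <para>Widgets<indexterm><primary>widgets</primary></indexterm>
      are small.</para>
  </chapter>
  <!-- Empty index: tells the stylesheet where to put the index -->
  <index/>
</book>
```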

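Also, the 'generate-index' template selects index terms with "//indexterm[...]", which matches indexterms anywhere in the document rather than only in the current book; that is why your book's index covers the whole set. And of course, make sure your indexterms are actually present.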


Why simpara tag?

Norm Walsh

| What's the rationale for the <simpara> element.
| since it seems to be a restricted version of <para>, does it
| have any benefits?  i guess, other than that it can be 
| transformed differently based on a stylesheet.

Some users want to prevent paragraphs from containing "block" elements (as HTML does). The simpara element gives them an alternative to para that has the semantics they want. And they can make a customization layer that's a proper subset of DocBook simply by removing the 'para' element from the DTD.
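The difference shows up in what each element may contain. In this sketch (placeholder text), para holds a list, which simpara would not allow:

```xml
<!-- para may contain block elements such as a list -->
<para>Before you start:
  <itemizedlist>
    <listitem><simpara>Install the stylesheets.</simpara></listitem>
  </itemizedlist>
</para>

<!-- simpara accepts inline markup only -->
<simpara>Plain text with <emphasis>inline</emphasis> markup only.</simpara>
```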


Are there any good uses for pi's

Norm Walsh

Absolutely. I am totally exasperated by the folks that want to remove PIs from XML. "Here's your Swiss Army Knife, Norm, oh, but we broke off the small blade (the internal subset) and we've removed the tweezers (PIs), because you don't really need those. And for good measure we welded the corkscrew open (thou shalt always put elements in a namespace). Is there anything else we can do to help you?"

| Does anyone have examples of
| *inappropriate* uses? I have the impression that some people think
| that any use of PIs is inappropriate, but I'm not sure what the
| argument is?

Inappropriate? Hmm. I'd say that it was inappropriate to put proper information content in there:

  <p>This is a paragraph</p>
  <?p This is a special paragraph that is really important but I jammed it in a PI?>

That'd be bad.

| Also, are there any guidelines about naming the targets of PIs? The
| ones that I've seen tend to use a naming scheme "prefix-localPart"
| where the prefix is used to indicate the authority or application
| involved and the local part is used to specify the name of the
| instruction itself. Is that fairly standard?

More or less. Maybe PI targets should have been allowed to be QNames; I don't know. Clearly they have to be named with care, since they all share a single flat namespace. I've used "dbhtml" and "dbfo" in the DocBook stylesheets, which may be a bit too broad.

What can you do with the PIs in the DocBook stylesheets?

- Provide hints for line numbering of verbatim environments
- Provide "table-summary" text for HTML tables. This was a bad use and is now deprecated since DocBook added a proper element to allow this, but the PI was a workable solution before the schema was revised.
- Provide hints for the cellspacing and cellpadding of rendered HTML tables
- Control the style of function synopses (K&R or ANSI)
- Toggle ToCs for Q&A sets
- Provide some suggested widths in description list environments
- Specify the filename to use for "chunked" documents
- Control presentation of some lists as blocks or tables
- Specify rotation for table cells
- Specify background color for table cells
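To give the flavor, here are two of those uses in markup, written in the pseudo-attribute style. The PI names and pseudo-attributes are recalled from the DocBook XSL documentation and may differ between releases, so check them against yours; the file name and color are placeholders.

```xml
<chapter>
  <!-- Name of the HTML chunk file for this chapter -->
  <?dbhtml filename="intro.html"?>
  <title>Introduction</title>
  <informaltable>
    <tgroup cols="1">
      <tbody>
        <row>
          <!-- Background color for this cell, HTML and FO output -->
          <entry><?dbhtml bgcolor="#EEEEEE"?><?dbfo bgcolor="#EEEEEE"?>Shaded cell</entry>
        </row>
      </tbody>
    </tgroup>
  </informaltable>
</chapter>
```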

| Despite the fact that the XML Rec. suggests using notations to specify
| the legal names of PIs, I haven't actually seen that in use. Anyone
| think that it's good practice?

Notations, alas, are a good idea that never really took off. And since the WXS idea of notations and the DTD idea are pretty different, they're probably dead.

| What about PI data? Most PIs that I've seen use pseudo-attributes, I
| guess for extensibility. Is that standard practice?

I think so. Certainly I've seen other styles, but as a stylesheet author, it's very convenient. The alternative is either some other tokenizing strategy or a different PI target name for each possible parameter.


Image file extension selection

Bob Stayton and Adam DiCarlo

> Jeff Beal <jeff.beal@ansys.com> writes:

> > As for the general concept of having the stylesheets 
> > add an extension based
> > on the format, it seems that specifying multiple alternate 
> > mediaobjects and
> > selecting the right one for the output is a better approach.
> > 
> > <mediaobject>
> >   <imageobject>
> >     <imagedata fileref="myImageFile.eps" format="EPS"/>
> >   </imageobject>
> >   <imageobject>
> >     <imagedata fileref="myImageFile.pdf" format="PDF"/>
> >   </imageobject>
> ...

> I don't know that I really agree this is optimal.  What I would like
> (and what I use) is something more like:

>  <mediaobject>
>    <imageobject>
>      <imagedata fileref="myImageFile.&img.fmt.suffix;" 
>            format="&img.fmt.name;"/>
>    </imageobject>
>  </mediaobject>

> This way, it's the job of the build system to define the
> img.fmt.suffix and the img.fmt.name and also convert the images as
> needed (I use makefile dependencies for this).  This takes work off of
> the author, where it shouldn't be, and onto the build system itself
> (where it should be).  This also eliminates all the redundant tagging
> and therefore makes the document itself more maintainable.

That's a great method of selecting a graphic format if your build system is set up for it. I'm not sure why you are objecting, though, as you can do this now, without any change to the stylesheets, right?

FYI, starting with the 1.59 XSL stylesheets, there is another way of selecting a graphic format at runtime. If the 'use.role.for.mediaobject' parameter is nonzero, then a role="html" on an imageobject will select that imageobject (and its imagedata child) when processed by the HTML stylesheet. Likewise for a value of "fo". So you can set up a mediaobject like this:

  <mediaobject>
    <imageobject role="html">
      <imagedata fileref="myImageFile.png" format="PNG"/>
    </imageobject>
    <imageobject role="fo">
      <imagedata fileref="myImageFile.pdf" format="PDF"/>
    </imageobject>
  </mediaobject>

When you process this with the html stylesheet, you get the PNG graphic, and when you process with the fo stylesheet, you get the PDF graphic. For the xhtml stylesheet, you can add an object with role="xhtml" if you want a different one, otherwise the stylesheet falls back to selecting role="html".

If you want finer control, such as the situation you describe here with PDF for PDF output and EPS for Postscript output, then you can use any role values you want. Then you pass the selected role value in a command line parameter 'preferred.mediaobject.role'.

  <mediaobject>
    <imageobject role="html">
      <imagedata fileref="myImageFile.png" format="PNG"/>
    </imageobject>
    <imageobject role="eps">
      <imagedata fileref="myImageFile.eps" format="EPS"/>
    </imageobject>
    <imageobject role="fo">
      <imagedata fileref="myImageFile.pdf" format="PDF"/>
    </imageobject>
  </mediaobject>

To select the EPS format, set the stylesheet parameter preferred.mediaobject.role="eps" on the command line.
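If you'd rather not type the parameters each time, a customization layer can fix them. A minimal sketch, with the import path assumed; note the single quotes around 'eps', for the same reason as in the graphic.default.extension example below.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:import href="fo/docbook.xsl"/>
  <!-- Let the role attribute choose among imageobjects -->
  <xsl:param name="use.role.for.mediaobject" select="1"/>
  <!-- Prefer the imageobject with role="eps"; quoted as a string -->
  <xsl:param name="preferred.mediaobject.role" select="'eps'"/>
</xsl:stylesheet>
```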

This method is admittedly more verbose than yours, but it is more flexible per object. It gives the author the opportunity to add graphics attributes for individual output formats to optimize each one, for example. It also fulfills the original design goal of the mediaobject wrapper: to allow the author to specify several potential objects, one of which is selected at processing time.

Closely related: Bob Stayton tells us

> I want to use different graphic formats for print and html output, 
> and was hoping that I could deal with this in the relevant stylesheets 
> using the default.graphic.extension parameter, as in the following example:

> <xsl:param name="graphic.default.extension" select="gif"/>

You need to put the gif in single quotes within the double quotes:

<xsl:param name="graphic.default.extension" select="'gif'"/>

Otherwise the stylesheet thinks you are trying to select the element named gif instead of a string. It's a common mistake.
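Another way to sidestep the quoting trap is to give the value as the element content of xsl:param rather than in a select attribute, because element content is taken as literal text instead of an XPath expression. A minimal sketch for an HTML customization layer (the import path is hypothetical):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Hypothetical path to the stock HTML stylesheet -->
  <xsl:import href="docbook-xsl/html/docbook.xsl"/>

  <!-- Element content is literal text, so no inner quotes are needed -->
  <xsl:param name="graphic.default.extension">gif</xsl:param>
</xsl:stylesheet>
```

Strictly speaking, this form creates a result tree fragment, which XSLT 1.0 converts to the string "gif" wherever a string is needed.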


How to not number figures

Bob Stayton

> Images are numbered "Figure 1. Title ..", "Figure 2.." and so on.

>  How can I not number it, or remove the title completely.

> Any pointers would be highly appreciated.

If you don't want to put a title on a figure, then use 'informalfigure' instead of 'figure'. informalfigure doesn't take a title.
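For example, an image that should appear without a caption or number could be marked up like this (the file name and alt text are placeholders):

```xml
<informalfigure>
  <mediaobject>
    <imageobject>
      <imagedata fileref="images/example.png" format="PNG"/>
    </imageobject>
    <!-- Alternate text, since there is no title to describe the image -->
    <textobject>
      <phrase>Example screenshot</phrase>
    </textobject>
  </mediaobject>
</informalfigure>
```

Keep in mind that without a title, an xref to an informalfigure has no text to generate.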

If you want to change how the title is presented, i.e., without the "Figure 1", then you need to customize the generated text for figure. See sagehill.net for how to do that.
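As a rough sketch of that kind of customization, the generated-text template for figure titles can be overridden through the local.l10n.xml parameter. This assumes English output and the HTML stylesheet at a hypothetical path; in the template text, "%t" stands for the title, so the label and number are dropped:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0"
                version="1.0">
  <!-- Hypothetical path to the stock HTML stylesheet -->
  <xsl:import href="docbook-xsl/html/docbook.xsl"/>

  <!-- Read extra gentext definitions from this stylesheet file itself -->
  <xsl:param name="local.l10n.xml" select="document('')"/>

  <l:i18n>
    <l:l10n language="en">
      <l:context name="title">
        <!-- Title only: no "Figure N." prefix -->
        <l:template name="figure" text="%t"/>
      </l:context>
    </l:l10n>
  </l:i18n>
</xsl:stylesheet>
```

Cross references and the list of figures draw on other gentext contexts, so they may still show the number.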


abbrev tag needed in output

Bob Stayton

> Is there a piece of DocBook markup that gets transformed into an HTML 
> abbreviation tag? To create the following output, for example:

> <abbr title="Web Accessibility Initiative">WAI</abbr>

There is no DocBook XSL template that outputs the HTML <abbr> tag. And there is no attribute on acronym that is for a title. There is some discussion in the DocBook Technical Committee about annotations like that, but nothing in the DTD yet.

If you wanted to stuff your title in the xreflabel attribute, you could customize the stylesheet by adding something like this to your stylesheet customization layer:

<xsl:template match="acronym">
  <abbr title="{@xreflabel}">
    <xsl:call-template name="inline.charseq"/>
  </abbr>
</xsl:template>

The xreflabel isn't quite the right attribute to use, but there isn't a better one at this point.


table titles

Bob Stayton

> How do I turn off table titles (like "Table 2.1"). I want to use section
> titles instead, and don't need the table numbering or title labels.

If you don't want table titles, then you should use <informaltable> instead of <table>. The only difference is that <informaltable> has no <title> element.
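For example, a small two-column table that sits under a section title could be written like this (the contents are made up for illustration):

```xml
<!-- Same tgroup content as table, but no title and therefore no number -->
<informaltable frame="all">
  <tgroup cols="2">
    <thead>
      <row>
        <entry>Option</entry>
        <entry>Meaning</entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>-v</entry>
        <entry>Verbose output</entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
```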


Create an index

Jirka Kosek

> How can i automatically construct an index using '<indexterm>'

Put an empty <index/> element where the index should appear.
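A minimal sketch of the whole arrangement, assuming a book with indexterm elements scattered through its chapters; the stylesheets collect them into the index wherever the empty index element is placed (here, at the end). Titles and text are placeholders:

```xml
<book>
  <title>User Guide</title>
  <chapter>
    <title>Installing</title>
    <para>Unpack the archive<indexterm><primary>installation</primary></indexterm>
      into any directory.</para>
  </chapter>
  <!-- Empty index: filled from the indexterms when the book is processed -->
  <index/>
</book>
```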


What is simpara for?

Norm Walsh

| I'd be
| curious to hear why simpara was added to DocBook in the first place.

So that customizers could limit users to a para element that didn't allow block content.
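The difference shows up at validation time. In this sketch the first fragment is valid, because para may contain block elements such as lists, while a validating parser should reject the second, because simpara allows only inline content:

```xml
<!-- Valid: para can contain a block-level itemizedlist -->
<para>Supported formats:
  <itemizedlist>
    <listitem><simpara>PNG</simpara></listitem>
    <listitem><simpara>PDF</simpara></listitem>
  </itemizedlist>
</para>

<!-- Invalid: simpara has no block content, so the list is not allowed here -->
<simpara>Supported formats:
  <itemizedlist>
    <listitem><simpara>PNG</simpara></listitem>
  </itemizedlist>
</simpara>
```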


Reference to a glossary entry

Jirka Kosek

> How to point to a glossary entry, or to mark-up the text as to say: 
> you can find me in the glossary?  

You can use the glossterm element for this purpose.
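For example, a glossterm in the running text can point with linkend to the id of a glossentry, so the processed output links the term to its definition (the id value and definition are illustrative):

```xml
<para>Each <glossterm linkend="gloss-xslt">XSLT</glossterm> stylesheet
  is applied to the whole document.</para>

<glossary>
  <title>Glossary</title>
  <glossentry id="gloss-xslt">
    <glossterm>XSLT</glossterm>
    <glossdef>
      <para>A language for transforming XML documents.</para>
    </glossdef>
  </glossentry>
</glossary>
```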


Reference a bibliography

Bob Stayton

> How does one reference and enumerate a bibliography?  I have a 
> reference to a document and I would like to "link" it to the reference in 
> the text.  

This citation:

<xref linkend="BrodyArticle"/>

to this bibliography entry:

<biblioentry id="BrodyArticle">

will generate this citation text:


At least with the XSL stylesheets.
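For context, here is a fuller sketch of both ends of the link. The bibliographic details are placeholders, not a real reference:

```xml
<para>This result was first reported in <xref linkend="BrodyArticle"/>.</para>

<bibliography>
  <title>References</title>
  <!-- The id is what the xref's linkend points to -->
  <biblioentry id="BrodyArticle">
    <author><firstname>A.</firstname><surname>Brody</surname></author>
    <title>Article Title</title>
    <pubdate>2003</pubdate>
  </biblioentry>
</bibliography>
```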



Bob Stayton

> There are several equation tags and there are also an adequate 
> number of symbols available from ISOtech.ent and ISOamsb.ent.   If I want to 
> write an equation which is simply made of the ISOtech and ISOamsb symbols 
> and I don't want it inline in the text, how do I do it?  

By figure tag, I presume you mean a title? There is the equation tag, but that requires a title. There is also informalequation, which does not require a title. You can enter math text this way:



Repeated reference to screenshot

Bob Stayton

> As a part of a user manual, I've got several screenshots which need to be
> referenced again and again.

> I'm in a fix as to how to refer to an image, through an <xref linkend=...>

> My initial effort at:

> <screenshot>
>  <screeninfo>640x480x250</screeninfo>
>   <graphic id="Logon" fileref="images/1.4a.gif">
> </screenshot>

> does not seem to work. Any suggestions on this score?

An xref generates the link text from the object being pointed to. A graphic has no title, so in this case there is no text to generate. You could put the screenshot inside a figure, give the figure a title, and put the id on the figure element. Then the xref has some text it can generate.
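A sketch of that arrangement, reusing the poster's screenshot markup. The title text is made up, and the id moves from the graphic to the figure:

```xml
<figure id="Logon">
  <title>The logon screen</title>
  <screenshot>
    <screeninfo>640x480x250</screeninfo>
    <graphic fileref="images/1.4a.gif"/>
  </screenshot>
</figure>

<para>Log on as shown in <xref linkend="Logon"/>.</para>
```

Every xref to "Logon" then picks up the figure's label and title as its link text, so the screenshot can be referred to as often as needed.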


How to set the language

Bob Stayton

> what's the proper way to set 'language' attribute with DocBook XSL stylesheets
> for XSL FO? I've tried but failed for German (I assume the code is 'de').

> I need that for hyphenation to work.

You can put a lang="de" attribute in an element that starts a page-sequence (chapter, etc.) to add a language="de" property for that page-sequence. If you want it for the whole document, put lang="de" in the document's root element, or set the command line parameter l10n.gentext.language="de". Either one should put the language="de" attribute in fo:root.
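For the whole-document case, a minimal sketch (the title and chapter text are placeholders):

```xml
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<!-- lang on the root element applies to the whole document -->
<book lang="de">
  <title>Handbuch</title>
  <chapter>
    <title>Einleitung</title>
    <para>Silbentrennung funktioniert nur, wenn die Sprache bekannt ist.</para>
  </chapter>
</book>
```

The equivalent customization-layer setting would be <xsl:param name="l10n.gentext.language" select="'de'"/>, again with the value in inner single quotes.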


Entity problems

Norm Walsh

|> &euro; works, it is declared as an entity to its unicode character at the
|> docbookx.dtd:
|> <!ENTITY euro "&#x20AC;">
| Oh, wait, I misunderstood--I thought docbookx.dtd was just for XML.
| But this is what's tripping me up:
| <![%sgml.features;[
| <!ENTITY euro "[euro  ]"><!-- euro sign, U+20AC NEW -->
| ]]>
| <![%xml.features;[
| <!ENTITY euro "&#x20AC;"><!-- euro sign, U+20AC NEW -->
| ]]>
| Why is &#x20AC; restricted to XML documents?

Because SGML traditionally has used SDATA entities. There's a bug there, however, in that if %sgml.features; is true, the entity declaration should be:

 <!ENTITY euro SDATA "[euro  ]"><!-- euro sign, U+20AC NEW -->

If adding SDATA fixes the problem, please let me know. If not, try adding your own declaration for &euro; pointing to the Unicode code point. That should work, if your SGML processor understands Unicode.

(SGML has all sorts of magic in the SGML Declaration to handle multiple character sets.)


Entity sets

David Carlisle

There is now a collection of entity definitions hosted at the W3C, at http://www.w3.org/2003/entities/.

This is a mixture of an update to the existing MathML entity data at the W3C, with most of the text rephrased to be less MathML-specific, together with a new draft text of an update to ISO/IEC TR 9573 to include Unicode (ISO 10646) definitions of the entities rather than SGML SDATA entities.

All the data and scripts to produce the site are also available, linked from the overview page.

Currently the definitions are identical to the definitions in the forthcoming MathML 2 2nd edition PR draft.

The draft ISO/IEC DTR 9573 contains tables detailing differences between these current definitions and the definitions used by DocBook, HTML, and the STIX Consortium.

It is hoped that this _draft_ set of definitions might form the basis of a shared, compatible set of definitions between different XML languages, so that the current situation, where <mo>&asymp;</mo> changes meaning if it is copied from a DocBook+MathML document to an XHTML+MathML document, might be avoided.


Shared entity definitions

Jeff Beal

> I want to share entity definitions across multiple Docbook documents.

> Is there a way to do this? For example, I want to have:

>   <!ENTITY mystring "My String">

> in one place, and be able to reference it by &mystring; from many 
> documents.

You can create a single file containing all of your shared entity declarations and then include it in each of your files.

For example, you could have a file named "global.ent" that would look something like:

  <!ENTITY mystring "My String">
  <!ENTITY another_string "Another String">

Then, the DOCTYPE declaration for each file would be:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www...." [
 <!ENTITY % global_entities SYSTEM "global.ent">
 %global_entities;
]>

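(Note the use of the percent sign (%), both when declaring such "parameter entities", which hold DTD declarations, and when referencing them, as in %global_entities;. Declaring the parameter entity alone does not pull the file in.)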
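A complete (if tiny) document using the shared file might look like this. It assumes global.ent sits next to the document, as in the example above, and uses the standard DocBook XML 4.2 identifiers:

```xml
<?xml version="1.0"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
  <!-- Declare the parameter entity, then reference it to read global.ent -->
  <!ENTITY % global_entities SYSTEM "global.ent">
  %global_entities;
]>
<chapter>
  <title>About &mystring;</title>
  <para>&mystring; and &another_string; are defined once, in global.ent.</para>
</chapter>
```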


Entity reference to include external code listings

Bob Stayton

> I am currently writing a technical spec using DocBook XML 4.2 and
> publishing using Cocoon 2.0.4 with Saxon 6.5.2 and docbook-xsl-1.61.2.

> I have been experiencing problems including external DTD files (and xml
> documents) within an appendix of the document and have finally succeeded by
> using a textdata element within a textobject nested in a programlisting
> element. However I seem to have encountered two problems specifically with
> this:

> I have been forced to use the fileref attribute of the textdata object and
> hard-code the path to the external document. I would ideally have liked to
> us an entityref and then define all my external entities at the top of the
> document. This however seem to give me blank output for both html and pdf
> output. The following was my original document snippet

> <?xml version="1.0"?>
> <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
> "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
> <!ENTITY generic-request.dtd SYSTEM
> "/home/r_exley/tmp/code/xml/generic-request.dtd">
> ]>
> .
> .
> .
> <section>
>   <title>generic-request.dtd</title>
>   <programlisting>
>     <textobject>
>       <textdata entityref="generic-request.dtd" />
>     </textobject>
>   </programlisting>
> </section>
> .
> .
> .

> I have since replaced the textdata element in the above with

>       <textdata fileref="/home/r_exley/tmp/code/xml/generic-request.dtd" />

> and this gives me the expected output. My assumption for using the
> entityref approach was that I could extend this to make advantage of
> catalogs.

The entity pointed to by an entityref must be an unparsed entity, that is, one declared in the DTD with an NDATA notation. DocBook declares the linespecific notation for this, so change your entity declaration to:

<!ENTITY generic-request.dtd SYSTEM
"/home/r_exley/tmp/code/xml/generic-request.dtd" NDATA linespecific>

Then the entityref should work. The validation process should have pointed out this problem.

> Also in the pdf output resulting from this, the first line for each
> included text file is indented, this is somewhat strange and throws the
> formatting out (and looks ugly). This only occurs in the pdf output, the
> html output is fine.

Inside a programlisting element, all white space is preserved, including that before and after your textobject. So change it to:

<programlisting><textobject><textdata entityref="generic-request.dtd"/></textobject></programlisting>

The whitespace inside textobject should be ignored, though.
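Putting both fixes together, the appendix from the question might look like this sketch. The file path is the poster's own; the book and appendix titles are placeholders:

```xml
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
  <!-- Unparsed entity; DocBook declares the linespecific notation -->
  <!ENTITY generic-request.dtd SYSTEM
    "/home/r_exley/tmp/code/xml/generic-request.dtd" NDATA linespecific>
]>
<book>
  <title>Generic Request Specification</title>
  <appendix>
    <title>DTDs</title>
    <section>
      <title>generic-request.dtd</title>
      <!-- No whitespace between programlisting and textobject -->
      <programlisting><textobject><textdata entityref="generic-request.dtd"/></textobject></programlisting>
    </section>
  </appendix>
</book>
```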


Adding another mediatype

Bob Stayton

> The swf file type does not exist as a valid file type in either
> the imagedata or videodata elements.
> What is the work around to this so documents will validate with this
> extension? Must be someone else out there wanting to embed Flash movies
> as well.

For validation, you need to add to the list of notations supported by the DTD. You can extend the list yourself in the internal subset of the DTD. In your docbook file:

<!DOCTYPE book PUBLIC etc. [
<!ENTITY % local.notation.class "| SWF">
]>

In your imagedata or videodata element, add a format="SWF" attribute.
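For a Flash movie, the markup might look like this. The file name and alt text are hypothetical, and the SWF format value only validates once the notation class has been extended as shown above:

```xml
<mediaobject>
  <videoobject>
    <!-- format must match a value in the (extended) notation list -->
    <videodata fileref="media/demo.swf" format="SWF"/>
  </videoobject>
  <!-- Fallback text for outputs that cannot show the movie -->
  <textobject>
    <phrase>Demonstration movie</phrase>
  </textobject>
</mediaobject>
```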


Then you need to customize the stylesheet to update the template named 'is.graphic.format' to accept your format string:

<xsl:template name="is.graphic.format">
  <xsl:param name="format"></xsl:param>
  <xsl:if test="$format = 'SVG'
                or $format = 'PNG'
                or $format = 'JPG'
                or $format = 'JPEG'
                or $format = 'linespecific'
                or $format = 'GIF'
                or $format = 'GIF87a'
                or $format = 'GIF89a'
                or $format = 'BMP'
                or $format = 'SWF'">1</xsl:if>
</xsl:template>

If you use videoobject, then the HTML output will be <embed>. If you use imageobject, then the HTML output will be <img>.

I'm not sure what happens after that. 8^)


Confusing entities

Paul Grosso

>> I assume that I found a small bug in the stylesheets. Maybe the bug 
>> isn't in the stylesheets but in the DTD. I don't know.
>> If I use the entity &rsquor; within my documents, I get always the wrong 
>> character. Since I don't know the proper english (and german) name for 
>> it, I put an example online:
>> http://www.xshare.com/fehler.html

Shows &#8216; (left single quotation mark) - ed

>> It would be nice, if someone could have a  look at it.
>I think this is a bug in the DTD.  The &rsquor; entity
>should map to &#x2019; not &#x2018; as it currently does
>in the DTD.  The same with &rdquor; which should map to
>&#x201D; but instead maps to &#x201C; in the DTD. 
>I have to admit that the ISO entity descriptions aren't
>all that clear.  What exactly does the "rising" in
>"rising single quote, right (high)" mean?
>The "high" gives the position, so does "rising"
>refer to whether it is a 6 or 9 shape?

One problem is that the entity names, e.g., rsquor, aren't always easily matched to Unicode names.

At [1], I see:
but it seems clear that there is something wrong here.

At [2], one reads:
 2018 left single quotation mark--looks like a 6
 2019 right single quotation mark--looks like a 9
 201C left double quotation mark--looks like a 66
 201D right double quotation mark--looks like a 99
 201B single high reversed-9 quotation mark--looks like a reversed 9
 201F double high reversed-9 quotation mark--looks like a reversed 99
(where "reversed" means mirror image or reflected about the vertical axis)
 201A single low-9 quotation mark--looks like 9 down at the baseline
    (so it pretty much looks like a comma)
 201E double low-9 quotation mark--looks like 99 down at the baseline
    (so it almost looks like two commas close together)

The latest entities at [3] and the draft revision of ISO 9573 at [4] agree with the above, but that's not too surprising since David, Norm, and I worked together on this, and I don't remember revisiting these quotes, so we might have missed something.

I wonder where "rising" comes from in the iso-pub file? I wonder if the "r" at the end of these really means "reversed" in the case of the right quotes? I wonder what the 'r' means at the end of the left quotes, since there are neither reversed nor rising left quotes and, in fact, the iso-pub comment says they are "rising, low-9" quotes. I wonder if the answers to such questions will ever really be known or lost in time (somewhere around 1985 when this stuff was originally done)?

My best guess is that the correct mappings are as follows:

 ldquo  ISOnum 0x201C # LEFT DOUBLE QUOTATION MARK  [correct now]
 ldquor ISOpub 0x201E # DOUBLE LOW-9 QUOTATION MARK  [correct now]
 lsquo  ISOnum 0x2018 # LEFT SINGLE QUOTATION MARK  [correct now]
 lsquor ISOpub 0x201A # SINGLE LOW-9 QUOTATION MARK  [correct now]
 rdquo  ISOnum 0x201D # RIGHT DOUBLE QUOTATION MARK  [correct now]
 rdquor ISOpub 0x201F # double high reversed-9 quotation mark  [needs fixing]
 rsquo  ISOnum 0x2019 # RIGHT SINGLE QUOTATION MARK  [correct now]
 rsquor ISOpub 0x201B # single high reversed-9 quotation mark  [needs fixing]

but I could be wrong. Input from others who can substantiate the correct mapping would be appreciated.
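Until the DTD is settled, one possible workaround is to override the two suspect entities in the document's internal subset. In XML the internal subset is treated as coming before the external DTD, and the first declaration of an entity is the one that counts, so local declarations win. This sketch applies the guessed mappings above, which are themselves unconfirmed:

```xml
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
  <!-- Local overrides; these take precedence over the DocBook declarations -->
  <!ENTITY rsquor "&#x201B;"> <!-- single high reversed-9 quotation mark -->
  <!ENTITY rdquor "&#x201F;"> <!-- double high reversed-9 quotation mark -->
]>
<article>
  <title>Quotation marks</title>
  <para>Single: &rsquor; Double: &rdquor;</para>
</article>
```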

[1] http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT
[2] http://www.unicode.org/charts/PDF/U2000.pdf
[3] http://www.w3.org/2003/entities/
[4] http://www.w3.org/2003/entities/iso9573-2003doc/9573.html


markup for generated content

Rowan ??

> I'm looking for a good way to markup generated content, i.e. sections
> that get updated resp. replaced just before publication by an additional
> processing step. Things likely to get automatically replaced are the
> current date, a processor identification or the geographical location of
> the publishing machine.
> In some pseudo XML, this would look like:
> <para>
>   Signed <gentext type="Location.City">Munich</gentext>,
>   <gentext type="Date.Long">November 25, 2003</gentext>: Christian
> </para>
> <para>
>   (Processed by <gentext type="Processor.Name">XalanJ 2.5.2</gentext>
>   at <gentext type="Time.Short">23:10</gentext>)
> </para>
> In lieu of the <gentext> elements in the above, should I simply use
> <phrase> with a custom role attribute? 
> Are there any general rules or best practices for accomplishing this?

Personally, I tend to use custom XML processing instructions for this, since these are clearly targeted at ... a processor.

For your example I'd use something like the following mark-up:

  Signed <?gentext Location.City ?>,
  <?gentext Date.Long ?>: Christian
  (Processed by <?gentext Processor.Name ?>
  at <?gentext Time.Short ?>)

Of course you could use pseudo-attributes, like in "<?gentext type='Location.City' default='Munich' ?>"

But these are just my own preferences. Others might do things differently.

BTW, the pi-attribute template helps if you use that approach: http://docbook.sourceforge.net/release/xsl/current/doc/lib/lib.html#pi-attribute
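As a rough sketch of how such a PI could be handled, assuming the pseudo-attribute form <?gentext type='...' default='...' ?>: the template below reads both pseudo-attributes with pi-attribute and substitutes a value supplied at build time. The import path and the publish.city parameter are hypothetical, and only Location.City is handled; other types fall back to their default.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Hypothetical path to the stock HTML stylesheet (which includes lib.xsl) -->
  <xsl:import href="docbook-xsl/html/docbook.xsl"/>

  <!-- Hypothetical parameter, set by the build step -->
  <xsl:param name="publish.city" select="''"/>

  <!-- Handles PIs such as <?gentext type='Location.City' default='Munich' ?> -->
  <xsl:template match="processing-instruction('gentext')">
    <xsl:variable name="type">
      <xsl:call-template name="pi-attribute">
        <xsl:with-param name="pis" select="."/>
        <xsl:with-param name="attribute" select="'type'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="default">
      <xsl:call-template name="pi-attribute">
        <xsl:with-param name="pis" select="."/>
        <xsl:with-param name="attribute" select="'default'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="$type = 'Location.City' and $publish.city != ''">
        <xsl:value-of select="$publish.city"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Fall back to the default pseudo-attribute from the source -->
        <xsl:value-of select="$default"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```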


How to mark up a person's middle initial?

Rune E Lausen

How does one mark up middle initials?

I would use <othername>.
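For example, in an author's name. The name is made up, and role="mi" is a common convention for a middle initial rather than a required value, since role is free-form:

```xml
<author>
  <firstname>John</firstname>
  <!-- role="mi" marks this othername as a middle initial -->
  <othername role="mi">Q</othername>
  <surname>Public</surname>
</author>
```

If I remember correctly, the DocBook XSL stylesheets have a parameter, author.othername.in.middle, that controls whether othername is printed between the first name and the surname; check the parameter reference for your version.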



Multiple index terms

Bob Stayton

> Imagine that in your document you first talk about stacks as a
> datastructure in general and later you talk about a (C++, Java,...)
> implementation "class stack". So you have two indexterms
> <indexterm><primary>stack</primary></indexterm>
> .
> .
> .
> <indexterm><primary><classname>stack
> There should be twe separate index entries, one for stack in general
> (proportional font), and one for the class stack (fixed font) so that
> the reader can quickly distinguish between them. Many books do it this
> way, e.g. Stroustrup (from which I've taken this stack example.)
> But the XSLT stylesheets map both indexterms to one entry.
> Is there any way to workaround this?

I thought this was going to be really hard, because the indexing machinery [in the xsl] is so complex and difficult to follow.

But actually, you can accomplish this without customizing the stylesheet at all. You can do it entirely in your source document. The sortas attribute can be used to separate them:

<indexterm><primary sortas="stack classname"><classname>stack</classname></primary></indexterm>

If all the instances of the classname indexterm have a sortas attribute that differs from the non-classname indexterms, then they will be treated as different index entries. You'll have to be careful to get them to sort together if you have other "stack somethings". You'll have to come up with a naming scheme, maybe using a character that occurs before all letters in the sort order.
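Put side by side, the two kinds of entries might look like this. The sortas value is just one possible naming scheme, chosen so that the class entry still sorts right after the plain "stack" entry:

```xml
<!-- General entry: sorted and grouped under the text "stack" -->
<indexterm><primary>stack</primary></indexterm>

<!-- Class entry: the distinct sortas key keeps it a separate index entry -->
<indexterm><primary sortas="stack classname"><classname>stack</classname></primary></indexterm>
```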

Combined with the earlier customization, you should get what you want.


Modular docbook books?

Bob Stayton

I've written a description of using XInclude and olinks to create modular DocBook books. See this reference: Bob's site

It means you don't have to have any special features in your XML editor, but it does mean your XSLT processor must be able to handle XIncludes.
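The general shape of such a book is a small master file that pulls in one file per chapter. A minimal sketch with hypothetical file names; the master file has no DOCTYPE here because the stock DocBook 4.2 DTD does not declare xi:include, so validation is best done on the result after inclusion:

```xml
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Modular Book</title>
  <!-- Each chapter lives in its own file with a chapter root element -->
  <xi:include href="chapter1.xml"/>
  <xi:include href="chapter2.xml"/>
</book>
```

With xsltproc, for example, the --xinclude option turns on XInclude processing.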



Including XML in programlisting tags

Camille Begnis

> Hi I  want to document the contents of an xml file using the 
> programlisting tag, however when I try to process the file it does not 
> ignore the tags from the xml file.  Is there anyway of resolving this?

> Ie a tag i can specify to ignore it?

> in xml file:
> ...
> <programlisting>
> <setting>
>   <server name="identifier" hostname="fqdn" port="3000">
>   <server name="identifier" hostname="fqdn" port="3300">
> </settings>
> <programlisting>

You must either escape the characters that trigger parsing:

  &lt;server name="identifier" hostname="fqdn" port="3000">
  &lt;server name="identifier" hostname="fqdn" port="3300">

Or enclose the code in a CDATA section:

<![CDATA[
  <server name="identifier" hostname="fqdn" port="3000">
  <server name="identifier" hostname="fqdn" port="3300">
]]>
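Applied to the original example, the whole listing can be wrapped in one CDATA section. Because the parser does not look inside it, the content need not be well-formed (note the poster's mismatched setting and settings tags); the only sequence it cannot contain is the closing ]]> itself. Starting the CDATA section right after the opening tag also avoids a stray leading newline in the preserved whitespace:

```xml
<programlisting><![CDATA[<setting>
  <server name="identifier" hostname="fqdn" port="3000">
  <server name="identifier" hostname="fqdn" port="3300">
</settings>]]></programlisting>
```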


Index markup

Bob Stayton

I have a recommendation for the process. It will go faster if whoever is creating the indexterms is set up to process the book and generate the index. If you have done much indexing you know that a good index is made by an iterative process. The first pass of adding entries will have small inconsistencies in vocabulary, groupings, see, and see also. The indexer has to be able to process the entries, review the index, and make adjustments in the indexterms.
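For reference while iterating, most of those adjustments come down to a few indexterm patterns. A sketch with made-up terms:

```xml
<!-- Grouping: a secondary term under a shared primary -->
<indexterm>
  <primary>stylesheets</primary>
  <secondary>customization layer</secondary>
</indexterm>

<!-- see: send readers from a synonym to the preferred term -->
<indexterm>
  <primary>XSL</primary>
  <see>stylesheets</see>
</indexterm>

<!-- see also: point to a related entry as well -->
<indexterm>
  <primary>stylesheets</primary>
  <seealso>parameters</seealso>
</indexterm>
```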