Documentation, Literate Programming and xweb.

Revision History
Revision 0.2 2007-09-17T09:00:27Z Dave Pawson
First attempt at documenting xweb processing
Revision 0.3 2010-05-04T08:52:12Z Dave Pawson
Reviewed.

Table of Contents

Basics
Building a single output from several parts
Building several output files from a single source
Removing all those namespaces in the output
The program, or XML document
References

Basics

Literate programming involves integrating documentation and code (however you define that) into one instance. In this case, into one XML instance, as outlined by Norman Walsh (see References). I'm documenting it and bringing it up to date by using the RELAX NG schema and more modern stylesheets.

The terms are defined on Wikipedia.

It will come as no surprise to learn that this document is an instance of literate programming.

The source document is a DocBook document, which happens to have an article as its document element. Whilst writing documentation, retain the DocBook namespace and make full use of the DocBook elements. When you want to start writing code, XSLT templates, or anything else other than DocBook, that's when the namespace changes. The namespace this document uses for such content is http://nwalsh.com/xmlns/litprog/fragment and the element name is fragment. This is a block-level element, used wherever a para element might be. The element is namespaced and requires an xml:id value. The content of the fragment is transformed to extract the code, i.e. the tangle side of the pair. Extracting the documentation is the weave process. For this tutorial, the output will be assumed to be XML as opposed to code. The implication is that the tangle stylesheet has its output method set to xml, whereas were it producing C code or Java, the output method would be set to text.
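To make the tangle and weave split concrete, here is a minimal sketch in Python (not part of xweb itself) that separates a toy xweb document into its code side and its documentation side. The sample document, the greeting element and the split_views function are all made up for illustration; the real tangle and weave steps are XSLT stylesheets and do considerably more.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
SRC = "http://nwalsh.com/xmlns/litprog/fragment"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical toy xweb source: a DocBook article with one fragment.
XWEB = f"""<article xmlns="{DB}" xmlns:src="{SRC}" version="5.0">
  <title>Hello</title>
  <para>Documentation lives in DocBook elements.</para>
  <src:fragment xml:id="top"><greeting xmlns="">hello</greeting></src:fragment>
  <para>More documentation.</para>
</article>"""


def split_views(source):
    """Return (code, docs): fragment content by xml:id, and para text."""
    root = ET.fromstring(source)
    code = {}
    docs = []
    for child in root:
        if child.tag == f"{{{SRC}}}fragment":
            # The tangle side: record the element names held in the fragment.
            code[child.get(XML_ID)] = [c.tag for c in child]
        elif child.tag == f"{{{DB}}}para":
            # The weave side: plain documentation paragraphs.
            docs.append(child.text)
    return code, docs


if __name__ == "__main__":
    code, docs = split_views(XWEB)
    print(code)
    print(docs)
```

The point is only that one source carries both views, and the namespace of each element decides which view it belongs to.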

The basic processing sequence for an xweb file is shown below as a Linux shell script. File locations will need to be adjusted for your setup.

Fragment frag.root

 

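#!/bin/bash
# Assumes the Saxon jars providing both classes are on the CLASSPATH.

# Weave, stage 1: extract pure DocBook from the xweb source.
java  com.icl.saxon.StyleSheet   -o tmp.xml        litprog.xweb  xsl/weave.xsl

# Weave, stage 2: render the DocBook as HTML via the customization layer.
java  com.icl.saxon.StyleSheet   -o litprog.html   tmp.xml       xsl/mydocbook.xsl

# Tangle: extract the code with an XSLT 2.0 processor.
java  net.sf.saxon.Transform     -o code.xml       litprog.xweb  xsl/tangle.v2.xsl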

  

The sequence outlined above is weave, to produce the documentation (documentation always comes first :-), then tangle, to generate the code. The weave step is a two-stage process: the first stage generates pure DocBook XML, and the second generates HTML from the DocBook using a customization layer (mydocbook.xsl) which calls the DocBook stylesheets as delivered. There are two options for running tangle: use an XSLT 1.0 implementation, or an XSLT 2.0 implementation if multiple output files are required. The output of this process is litprog.html, the documentation, and code.xml, which is the actual code.

That is the simple process outline. More follows, along with the details.

Building a single output from several parts

It was realised that people don't often build code or XML as a single one-off exercise. It is common to develop the output over a period and in many parts. The impact on xweb is that the output can be built up over time and needs to be assembled from several parts. Ordering then becomes a task, since a developer may not write the parts in the order in which they are required in the final output, for whatever reason. xweb answers this need by having a starting point and a sequence in which to build the output.

This also helps where documenting, or even designing, a solution involves several parts. The author may design top down or bottom up. xweb comes to the rescue here, since I can document the parts as I design them, in pieces, then build the entire program up into a cohesive whole.

Fragment section2

  <src:fragment xml:id="top">
    <src:fragref linkend="frag.root"/>
    <src:fragref linkend="section2"/>
    <src:fragref linkend="markup"/>
    <src:fragref linkend="section3"/>
  </src:fragment>
 

The fragment above has an xml:id value of 'top', which is special in that it identifies the starting point for the collection of output. Each fragref is resolved in turn to a fragment, which is then added to the output. In this way the source can be built up piece by piece and the output grows cumulatively. The actual order is defined by the sequence of fragrefs in the fragment with the xml:id value of 'top'. This file has such a fragment, as shown above.
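The resolution of fragrefs starting from 'top' can be sketched in a few lines of Python. This is an illustration of the idea only, not the tangle stylesheet: the tangle function and the sample document are hypothetical, and the sketch assumes fragments contain nothing but text and fragref elements.

```python
import xml.etree.ElementTree as ET

SRC = "http://nwalsh.com/xmlns/litprog/fragment"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
FRAGMENT = f"{{{SRC}}}fragment"
FRAGREF = f"{{{SRC}}}fragref"

# Hypothetical source: fragments deliberately written out of order.
SAMPLE = f"""<article xmlns:src="{SRC}">
  <src:fragment xml:id="b">second;</src:fragment>
  <src:fragment xml:id="top"><src:fragref linkend="a"/><src:fragref linkend="b"/></src:fragment>
  <src:fragment xml:id="a">first;</src:fragment>
</article>"""


def tangle(source, start="top"):
    """Expand the start fragment, following each fragref in document order."""
    root = ET.fromstring(source)
    fragments = {f.get(XML_ID): f for f in root.iter(FRAGMENT)}

    def expand(frag_id, seen=()):
        if frag_id in seen:
            raise ValueError(f"circular fragref: {frag_id}")
        frag = fragments[frag_id]
        parts = [frag.text or ""]
        for child in frag:
            if child.tag == FRAGREF:
                # Replace the reference with the content of its target.
                parts.append(expand(child.get("linkend"), seen + (frag_id,)))
            parts.append(child.tail or "")
        return "".join(parts)

    return expand(start)


if __name__ == "__main__":
    print(tangle(SAMPLE))
```

Note that the fragments appear in the source in the order b, top, a, yet the output follows the order of the fragrefs inside 'top'.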

Building several output files from a single source

The output may need to vary between fragments, i.e. one fragment could be text (or more accurately non-XML) whilst the next could be XML. A further variation is that more than one output file may be built from the single xweb file. This is controlled by the mode attribute on the fragment element. If present, it takes only one value: chunked. When the attribute is present with that value, another attribute is required to specify the output filename. This is the href attribute, which should contain the name of a file. The final attribute, output, determines whether the output method of the transform will be text or xml. All three attributes must be present to get an output file separate from the main output. To get this separate output file, an XSLT 2.0 processor must be used. The fragment in this section shows this. It contains:

  <src:fragment
    xml:id="section3"
    output="text"
    mode="chunked"
    href="output.txt">
.... content
  </src:fragment>

So the section3 fragment has an output format of text (the alternative is xml) and will be written to a file called output.txt, all triggered by the chunked value of the mode attribute.
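The rule that all three attributes must be present can be written down as a small decision function. This is a sketch in Python under my own assumptions: the output_target function is hypothetical, the main output name code.xml is taken from the script earlier, and raising an error on an incomplete set of attributes is my choice here, not necessarily what the tangle stylesheet does.

```python
def output_target(attrs, main_output="code.xml"):
    """Return (filename, method) for a fragment's attributes.

    method is None for the main output, where the stylesheet's own
    output setting applies.
    """
    if attrs.get("mode") != "chunked":
        # No chunking requested: content joins the main output stream.
        return main_output, None
    missing = [name for name in ("href", "output") if name not in attrs]
    if missing:
        # Assumption: treat an incomplete chunked fragment as an error.
        raise ValueError("chunked fragment lacks: " + ", ".join(missing))
    if attrs["output"] not in ("text", "xml"):
        raise ValueError("output must be text or xml")
    return attrs["href"], attrs["output"]


if __name__ == "__main__":
    section3 = {"output": "text", "mode": "chunked", "href": "output.txt"}
    print(output_target(section3))
    print(output_target({}))
```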

Fragment section3

     This stands as output code for section 3. The src:fragment element
     has the output attribute set to text, so this content should be a simple
     copy of the input content.

     The difference is that the mode is set to 'chunked', so an output
     file named after the href attribute (output.txt) should be generated.
     This also implies that the processor understands xslt 2.0

 

Removing all those namespaces in the output

The use of mundane-result-prefixes="dc ns a nvdl r" enables some namespace cleanup in the output. The attribute contains a list of prefixes for which namespace declarations will not be generated. This can also be specified as a stylesheet parameter, $mundane-result-prefixes.

The program, or XML document

This fragment is the first one to be processed (by the 'tangle' phase) and dictates the order in which the various fragments are processed into the output. Each fragref is resolved in turn to add to the normal output stream. Each linkend attribute value should match one (and only one) id value on another fragment. Note that if you are using DocBook version 5, as this example does, the identifiers are xml:id attributes rather than id attributes.

Fragment top

     frag.root
     section2
     section3
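The requirement that each linkend matches one and only one xml:id is easy to check before running tangle. The following Python sketch is one way to do it; the check_fragrefs function and the sample document are hypothetical and are not part of the xweb stylesheets.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SRC = "http://nwalsh.com/xmlns/litprog/fragment"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
FRAGMENT = f"{{{SRC}}}fragment"
FRAGREF = f"{{{SRC}}}fragref"

# Hypothetical source with one good reference and one dangling reference.
SAMPLE = f"""<article xmlns:src="{SRC}">
  <src:fragment xml:id="top">
    <src:fragref linkend="intro"/>
    <src:fragref linkend="missing"/>
  </src:fragment>
  <src:fragment xml:id="intro">some code</src:fragment>
</article>"""


def check_fragrefs(source):
    """Return (linkend, match_count) for every fragref without exactly one target."""
    root = ET.fromstring(source)
    # Count how many fragments carry each xml:id value.
    ids = Counter(f.get(XML_ID) for f in root.iter(FRAGMENT))
    problems = []
    for ref in root.iter(FRAGREF):
        target = ref.get("linkend")
        if ids[target] != 1:
            problems.append((target, ids[target]))
    return problems


if __name__ == "__main__":
    for target, count in check_fragrefs(SAMPLE):
        print(f"linkend {target!r} matches {count} fragments")
```

An empty result means every fragref can be resolved unambiguously.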
  

References

References to other documentation on tangle and weave.

  1. Norman Walsh, Literate Programming in XML

  2. The current litprog docbook

  3. Mark B Wroth, DBLP: DocBook-based Literate Programming

  4. Robin Cover, a longer history of tangle and weave

  5. Finally the xweb source for this file, and the modified stylesheets. All zipped up as xweb.zip