A discussion on approaches to learning XSLT

Push or Pull?

This section was compiled after Roger L. Costello posted a link to an article on writing XSLT.

A colleague of mine has written an excellent paper describing a new way of looking at creating XSLT documents. I think that you will find the paper very thought provoking. He has kindly permitted me to post it on my Web site:

Various other contributors came back with comments and additions. I find these quite enlightening with respect to usage and learning styles.

Steve Muench

One of the features that the XSLT 1.0 spec provides to cater to the first-time, HTML-savvy user is the verbosely named "literal result element as stylesheet" capability. Stylesheets that use this capability are often called "single-root-template" stylesheets or stylesheets written in the "simple form".

Use of this could further improve on David's smooth-slope introduction to XSLT for first-time users. It's a technique that I use for simple HTML and XML transformations in my "Building Oracle XML Applications" book.

That is, instead of writing:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>Welcome</TITLE>
      </HEAD>
      <BODY>
        <FONT bgcolor="{member/favoriteColor}">
          Welcome <xsl:value-of select="member/name"/>!
        </FONT>
        <TABLE>
          <TR><TH>Type</TH><TH>Number</TH></TR>
          <xsl:for-each select="member/phone">
            <TR>
              <TD><xsl:value-of select="@type"/></TD>
              <TD><xsl:value-of select="."/></TD>
            </TR>
          </xsl:for-each>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>

You can write:

<HTML xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <HEAD>
    <TITLE>Welcome</TITLE>
  </HEAD>
  <BODY>
    <FONT bgcolor="{member/favoriteColor}">
      Welcome <xsl:value-of select="member/name"/>!
    </FONT>
    <TABLE>
      <TR><TH>Type</TH><TH>Number</TH></TR>
      <xsl:for-each select="member/phone">
        <TR>
          <TD><xsl:value-of select="@type"/></TD>
          <TD><xsl:value-of select="."/></TD>
        </TR>
      </xsl:for-each>
    </TABLE>
  </BODY>
</HTML>
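
For reference, both stylesheets above assume a source document shaped roughly like the one below. This is a hypothetical sample inferred from the XPath expressions they use (member/name, member/favoriteColor, and member/phone with its type attribute); the values are invented for illustration.

<?xml version="1.0"?>
<!-- Hypothetical input: element names come from the XPath expressions above, values are made up -->
<member>
  <name>Pat</name>
  <favoriteColor>blue</favoriteColor>
  <phone type="home">555-0100</phone>
  <phone type="work">555-0101</phone>
</member>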

And you can teach people that the steps to get started are to:

(1) Get an HTML template from your web design folks
(2) Use Dave Raggett's "tidy" with the "-asxml" option to convert the HTML to well-formed XML (XHTML)
(3) Add an "xsl:version" attribute, along with the XSLT namespace declaration, to the <HTML> root element
(4) Begin "peppering" in <xsl:value-of>, attribute value templates, and <xsl:for-each> elements to suit

I agree you need to enable that smooth transition from single template to multiple templates.
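
As a rough sketch of where that transition leads, the single-template example above might be split into several template rules as follows. This is an illustrative rewrite, not something taken from the book, and it assumes the same member/name, member/favoriteColor and member/phone input.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Page skeleton: built once, for the document element -->
  <xsl:template match="/member">
    <HTML>
      <HEAD>
        <TITLE>Welcome</TITLE>
      </HEAD>
      <BODY>
        <FONT bgcolor="{favoriteColor}">
          Welcome <xsl:value-of select="name"/>!
        </FONT>
        <TABLE>
          <TR><TH>Type</TH><TH>Number</TH></TR>
          <!-- Hand each phone to whichever template rule matches it -->
          <xsl:apply-templates select="phone"/>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>

  <!-- One table row per phone element -->
  <xsl:template match="phone">
    <TR>
      <TD><xsl:value-of select="@type"/></TD>
      <TD><xsl:value-of select="."/></TD>
    </TR>
  </xsl:template>
</xsl:stylesheet>

The output should be the same as before, but the row logic now lives in its own rule and can be changed or overridden without touching the page skeleton.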

I offer a cut at explaining the "Single Template" stylesheet, and why it's worthwhile to move on from that to multiple-template stylesheets, in Chapter 7 of my book, which covers XSLT basics. It's available for free reading on O'Reilly's web site in HTML or PDF formats.

In my opinion, it's all about understanding *why* you should apply a technique, as opposed to learning by rote that "you do it this way, just because..."

David Carlisle comments on one aspect

> Agreed, this is just the pull method, but all too often I see this method
> being dissed in favor of the more powerful push.

Probably. I suppose I'm as guilty as any there. While I sometimes use that sort of technique while answering questions on this list, I don't think there's a single occasion when I'd ever have wanted to do that to process one of my own documents.

I can see it's useful if you are pulling bits of data out of databases etc. (which is one main use of the ASP on the web pages here, as a matter of fact), where almost the entire page is static and you just want to "fill in the blanks".

But if you are converting a document format of any complexity down to HTML (for example), the template-rule-driven approach is far more natural and easier to code in XSLT, as the output is driven by the input. You don't need to sketch out the entire format of the document: if you come across a list in the input, convert it to an HTML list, and carry on.
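
A minimal sketch of that idea, assuming hypothetical source elements named list and item (the original post does not name any):

<!-- Hypothetical source vocabulary: <list> containing <item> children -->
<xsl:template match="list">
  <ul><xsl:apply-templates/></ul>
</xsl:template>

<xsl:template match="item">
  <li><xsl:apply-templates/></li>
</xsl:template>

These rules would sit inside an xsl:stylesheet element. Wherever a list turns up in the input, at any depth, the matching rule fires and processing carries on with its children.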

But I'd say that for any document complex enough to require a TOC, the pull method is probably not very attractive. Unless you know the exact depth of sectioning etc., how can you make your skeleton template?

Far easier to just define some templates defining locally the transforms you want and recurse down with apply-templates. Then your table of contents automatically grows to fit whatever's in the document.
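
One way this might look, assuming a hypothetical input in which section elements nest to arbitrary depth and each has a title child. The mode name toc is likewise just an illustrative choice.

<!-- Build one TOC entry per section, recursing into nested sections -->
<xsl:template match="section" mode="toc">
  <li>
    <xsl:value-of select="title"/>
    <xsl:if test="section">
      <ul>
        <xsl:apply-templates select="section" mode="toc"/>
      </ul>
    </xsl:if>
  </li>
</xsl:template>

It would be invoked with an xsl:apply-templates in the toc mode, wrapped in a ul, from wherever the contents list belongs. Nothing in the rule depends on how deep the sections go.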

The main danger of the "pull" technique is that you lose data very easily: the default behaviour is to lose all the input document. You have to explicitly code to copy things over. Conversely, using templates, the default behaviour is not to lose so much. I usually just start off with a default template something like:

<xsl:template match="*">
  <font color="red">[<xsl:value-of select="name()"/>]</font>
  <xsl:apply-templates/>
</xsl:template>

Then just throw in templates as needed until the red bits go away. (Of course I should say that first I fully specify and prove correct the entire design before starting to code the stylesheet, but...)

Now of course this only works if the input is in some sense already a document. If the input is a few GByte of database, having an initial pass at writing a stylesheet have the whole thing come out in red is not so attractive. I suppose this is down to differing viewpoints again.

It's this facility to independently code small chunks of stuff (which XSLT shares with many functional programming environments such as Standard ML, or Lisp) that makes it rather attractive. Just coding the entire thing as one monolithic chunk works but just looks dull, I think (in terms of showing students any interesting behaviour).

Uche Ogbuji

Pull is a bad idea from the didactic POV. If one wants people to learn how to generate HTML and other simple documents as quickly as possible, there is no doubt that most people with any background in the more popular computer languages would catch on to pull more quickly than push.

But it's a false simplicity. Pull is easy when the problem space is simple, as is the case with so many toy examples necessary when teaching beginners. But programming difficulty scales at an alarming rate with the complexity of the problem space. It doesn't take long to run into real-world examples where pull is nearly impossible to program correctly.

Push, on the other hand, while for some people more difficult at first, is a much more powerful approach for solving complex problems. And in almost all cases it is less prone to defects and easier to maintain.

This is not functional programming bigotry for its own sake. Since the invasion of webmasters and amateurs of scripting, it is easy to forget that document processing is one of the most delicate areas of inquiry in computer science, and it has called for elegant solutions from Knuth's TeX to Clark & co's DSSSL, to XSLT. As Paul Tchistopolskii explained here, XSLT at its best is about pipes and filters. XSLT's weakest points are where this model breaks down.

Whether your favorite conceptual model is pipes and filters, tuple spaces, or just good ol' lambdas, a fundamental understanding of push techniques is essential if you ever want to do any serious development in XSLT. New arrivals to this field take short-cuts only to get lost later. From a purely practical point of view, I think it's important to teach apply-templates, modes and friends well before for-each and bitchin' value-of tricks.

Chad Smith

I think that this is defeating the message that was being relayed by the paper. The idea was to leverage currently accessible and widely used structures like HTML to learn XSLT. However, if you were to cut things down to the <HTML></HTML> stuff given in the example, people are going to learn this and then have a problem unlearning it in order to use multiple templates. Yes, starting out with the outer XSLT stuff might be more cumbersome, but it's only slightly more so and will definitely be more beneficial than a method of oversimplification like the <HTML></HTML> stuff.

David Jacobs

Agreed, this is just the pull method, but all too often I see this method being dissed in favor of the more powerful push. While this makes sense for those who are already experts in XSLT and pushing the envelope, I believe it is detrimental to have most people's initial exposure to XSLT be push formulated stylesheets. My main issue is that of advocacy and how to help XSLT achieve mass popularity on the order of PHP, ASP and other favored web application tools.

Ken Holman, who was, I think, the one to actually name these two approaches push and pull!

I agree with the others this isn't new, but I have additional points to make that haven't been brought up.

You may find that the utility of the pull approach quickly fades, especially in publishing solutions (though probably not as quickly in data solutions). Of course the "pull" approach is untenable when dealing with mixed content.
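
To illustrate the mixed-content point with a small sketch: given a hypothetical paragraph such as <para>Push is <emph>much</emph> easier here</para>, where text and elements interleave, template rules handle the inline markup wherever it occurs. The element names para and emph are assumptions made for the example.

<!-- Block-level container: process whatever mix of text and elements it holds -->
<xsl:template match="para">
  <p><xsl:apply-templates/></p>
</xsl:template>

<!-- Inline markup, matched wherever it appears within the text -->
<xsl:template match="emph">
  <em><xsl:apply-templates/></em>
</xsl:template>

The text nodes are copied through by the built-in template rule for text. A for-each over emph, by contrast, would pull out the emphasised words but lose their position within the surrounding text.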

I think it is important to not distinguish the two approaches as "classical" and "new", as they both have important roles to play based on the kind of problem being solved. Many solutions will require the use of both, though there are situations where using "push" brings more benefits than using "pull".

Using "pull" approaches inhibits the sharing of stylesheet fragments. When an organization views the deployment of stylesheets from many contributors, it is critical to be able to share stylesheet fragments. Stylesheets using the "pull" approach you are advocating are monolithic, they inhibit sharing, and they cannot be specialized using importation. Stylesheets using the "push" approach are granular and promote reuse of the investment in stylesheet fragments. Organizations should be cognizant of issues of stylesheet maintenance in the long term.
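
A sketch of specialization by importation, assuming a hypothetical shared stylesheet named base.xsl that already contains push-style rules, including one for title elements:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- base.xsl is a hypothetical shared stylesheet; xsl:import must come before other top-level elements -->
  <xsl:import href="base.xsl"/>

  <!-- Override only the rule that differs; imported rules have lower import precedence -->
  <xsl:template match="title">
    <h1 class="house-style">
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
</xsl:stylesheet>

A monolithic single-template stylesheet offers no such seam: changing any part of it means replacing the whole template.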

Relying only on a fill-in-the-blanks monolithic "pull" approach may provide quick gratification but will not equip you for certain real-world situations. This approach becomes either unusable or isolationist (in the sense that monolithic stylesheets cannot be reused; I couldn't think of a better word here) and doesn't build the expertise to take full advantage of the language.

Jeni Tennison

I like the 'top-level template as controller' approach too, although I tend to make the distinction between the overall control (what bits of the source should be processed, say, or in what mode) and production of output. Control of output, that is, the overall structure of the document, I tend to put in a template matching the document element. So typically I might have:

<!-- this is a copy of the built-in template for the root node -->
<xsl:template match="/">
   <xsl:apply-templates />
</xsl:template>

<!-- this template actually produces something -->
<xsl:template match="/*">
   <html>
      ...
   </html>
</xsl:template>

The reason for this is for extensibility. If I now want to add another set of templates to produce WML rather than HTML and use a parameter to choose between them, I can just change the root-node-matching template to:

<xsl:template match="/">
   <xsl:choose>
      <xsl:when test="$method = 'wml'">
         <xsl:apply-templates mode="wml" />
      </xsl:when>
      <xsl:otherwise><xsl:apply-templates /></xsl:otherwise>
   </xsl:choose>
</xsl:template>
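
For that template to work, the stylesheet also needs a top-level parameter named method and at least one rule in the wml mode. A sketch of those supporting pieces, which are assumptions here rather than part of the original post:

<!-- Top-level parameter; defaults to HTML unless the caller passes 'wml' -->
<xsl:param name="method" select="'html'" />

<!-- WML counterpart of the document-element template (card content is illustrative) -->
<xsl:template match="/*" mode="wml">
   <wml>
      <card>
         <xsl:apply-templates mode="wml" />
      </card>
   </wml>
</xsl:template>

How the parameter value is supplied at run time depends on the XSLT processor being used.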

Similarly, if I want to pre-process the data to filter out something, I can adjust the root-node-matching template:

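<xsl:template match="/">
   <!-- In XSLT 1.0 this variable holds a result tree fragment -->
   <xsl:variable name="processed">
      <xsl:apply-templates mode="filter" />
   </xsl:variable>
   <!-- XSLT 1.0 needs a node-set() extension function here, e.g.
        exsl:node-set($processed)/*; XSLT 2.0 accepts the path as written -->
   <xsl:apply-templates select="$processed/*" />
</xsl:template>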

I find it also makes it easier to later combine stylesheets, but probably that's not particularly high on the web-application priority list.

If you like, there's a more natural 'match' between matching the document element of the source and generating the document element of the result than there is between the root node of the source and the document element of the result.

I think what's coming out here is not that it's a matter of the document vs. data orientation of the source, but rather the match between the source and the result.

If the result follows the structure of the source, then a push method is more natural - the source drives the process. If the result has a substantially different structure from the source, then a pull method is more natural - the result drives the process.

Documents *tend* to be transformed to documents (hence usually use push) but you could imagine a document-analysis stylesheet with a result structure totally different from the document - an index perhaps. There, a pull approach is easier.

Similarly, while you *tend* to be pulling information out of data-oriented XML for presentation, there are other times when the result structure is very similar - a translation, perhaps. There, push is more natural.

And of course it's not a matter of 'this stylesheet uses pull' and 'this stylesheet uses push'. Different bits of the same stylesheet can use a different method depending on the match between that particular bit of the source and the result that you need from it.

Which I think sums it up nicely (as often is the case:-)