SAX2 Filters

1. SAX filter or XSLT transform


SAX filter or XSLT transform

Michael Kay

Oft times the question arises when to use SAX filters and when to use XSLT. Here Mike Kay offers help in deciding.

Here's a starter for ten (sorry, that's a catchphrase from a UK TV programme).

A SAX filter can sometimes be used instead of an XSLT transformation, and it can sometimes be used for pre-processing the input to an XSLT transformation, or for post-processing the output.

The main cases where a SAX filter can be useful are:

(a) in cases where the XML file is too large to be processed by XSLT
(b) in cases where you need to perform operations - usually text processing - that can't be done easily in XSLT.
(c) to preserve information that the XSLT/XPath data model does not retain

To solve problems of document size, you can:

(a) do all the processing in a SAX application (if the processing is simple and purely serial)
(b) use a preprocessing SAX filter to create a smaller input document for the transformation to work with (e.g. by projection or restriction)
(c) use a preprocessing SAX filter to split the large document into many small documents, each of which is then transformed independently by XSLT. If necessary, you can then use a postprocessing SAX filter to put the transformed pieces back together again.
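
As a rough illustration of option (b), restriction can be sketched as a filter that discards unwanted subtrees before the transformer ever builds its tree. This is a minimal sketch only: the class name PruningFilter, the helper prune and the element name passed to it are hypothetical, and the identity transformation merely stands in for a real stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical preprocessing filter: drops every element with a given
// local name, together with its whole subtree.
class PruningFilter extends XMLFilterImpl {
    private final String dropName; // local name of the elements to discard
    private int depth = 0;         // > 0 while inside a discarded subtree

    PruningFilter(XMLReader parent, String dropName) {
        super(parent);
        this.dropName = dropName;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || local.equals(dropName)) {
            depth++; // entering, or going deeper into, a discarded subtree
        } else {
            super.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            depth--;
        } else {
            super.endElement(uri, local, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Runs the filtered event stream through an identity transformation;
    // a real stylesheet would be supplied to newTransformer(...) instead.
    static String prune(String xml, String dropName) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // needed so that local names are reported
        XMLReader filter = new PruningFilter(spf.newSAXParser().getXMLReader(), dropName);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
        return out.toString();
    }
}
```

Because the filter is handed to the transformer as a SAXSource, the discarded subtrees never reach the tree builder. The sketch ignores comments, processing instructions and ignorable whitespace inside the dropped subtree; a production filter would need to suppress those as well.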

A SAX filter can be used to transform the input data into a form that is more amenable to XSLT processing. Examples include:

(a) preparsing a structured text field (e.g. CSV) into a set of separate elements
(b) changing the representation of a date field to the ISO 8601 form yyyy-mm-dd
(c) computing a derived attribute, e.g. adding @value as the product of @price and @qty, making it easier for the XSLT stylesheet to do sorting and totalling.
(d) simple grouping of elements, for example adding a <list> element around any consecutive sequence of one or more <list-item> elements
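
Example (c) might look something like the following sketch. The attribute names @price, @qty and @value come from the example above; the class name ValueFilter, the helper total, the item element used in the stylesheet and the stylesheet itself are assumptions made for illustration. The point is that an XSLT 1.0 stylesheet can then total the derived attribute with a plain sum().

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.math.BigDecimal;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical preprocessing filter: adds value = price * qty to any
// element that carries both a price and a qty attribute.
class ValueFilter extends XMLFilterImpl {
    // Illustrative XSLT 1.0 stylesheet that totals the derived attribute.
    static final String TOTAL_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='sum(//item/@value)'/></xsl:template>"
      + "</xsl:stylesheet>";

    ValueFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        String price = atts.getValue("", "price");
        String qty = atts.getValue("", "qty");
        if (price != null && qty != null) {
            // Copy the attributes, since the parser's object is read-only.
            AttributesImpl copy = new AttributesImpl(atts);
            // BigDecimal avoids binary floating-point noise; malformed
            // numbers would throw NumberFormatException in this sketch.
            String value = new BigDecimal(price).multiply(new BigDecimal(qty)).toPlainString();
            copy.addAttribute("", "value", "value", "CDATA", value);
            atts = copy;
        }
        super.startElement(uri, local, qName, atts);
    }

    // Feeds the filtered events into the stylesheet and returns its text output.
    static String total(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader filter = new ValueFilter(spf.newSAXParser().getXMLReader());
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(TOTAL_XSL)));
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
        return out.toString();
    }
}
```

For an order holding items with price 2.50, qty 4 and price 1.25, qty 2, the stylesheet should report a total of 12.5, something XSLT 1.0 cannot compute directly from the two original attributes without recursion or an extension.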

A SAX filter can be used to capture features of the source document that are not representable in the XSLT data model. For example, entity references and CDATA sections, as well as DTD declarations, can all be captured in a SAX filter and translated into elements that are visible to the XSLT stylesheet.
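
For CDATA sections, one possible approach is a filter that registers itself as the parser's LexicalHandler and turns the section boundaries into an ordinary element. The sketch below assumes a parser that supports the standard lexical-handler property; the class name CdataFilter, the helper mark and the wrapper element name cdata are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: wraps the text of each CDATA section in a <cdata>
// element so that a stylesheet can see where the section was.
class CdataFilter extends XMLFilterImpl implements LexicalHandler {
    private static final String LEXICAL = "http://xml.org/sax/properties/lexical-handler";

    CdataFilter(XMLReader parent) throws SAXException {
        super(parent);
        parent.setProperty(LEXICAL, this); // receive lexical events ourselves
    }

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        // A transformer may try to install its own lexical handler; ignore
        // that so our registration survives. As a side effect, comments in
        // the source are not passed on by this sketch.
        if (!LEXICAL.equals(name)) {
            super.setProperty(name, value);
        }
    }

    @Override
    public void startCDATA() throws SAXException {
        super.startElement("", "cdata", "cdata", new AttributesImpl());
    }

    @Override
    public void endCDATA() throws SAXException {
        super.endElement("", "cdata", "cdata");
    }

    // The remaining lexical events are not needed here.
    @Override public void startDTD(String name, String publicId, String systemId) { }
    @Override public void endDTD() { }
    @Override public void startEntity(String name) { }
    @Override public void endEntity(String name) { }
    @Override public void comment(char[] ch, int start, int length) { }

    // Identity transformation, standing in for a real stylesheet.
    static String mark(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader filter = new CdataFilter(spf.newSAXParser().getXMLReader());
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
        return out.toString();
    }
}
```

The same pattern could be extended to startEntity/endEntity and startDTD/endDTD to expose entity references and DTD declarations, though whether entity boundaries are reported depends on the parser.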

A postprocessing SAX filter (or simply a SAX ContentHandler) is useful in two principal situations:

(a) to undo the changes made by a preprocessing filter
(b) to achieve serialization effects that cannot be achieved using the standard serialization methods (as an alternative to disable-output-escaping).
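
Case (b) can be sketched by sending the transformation result to a ContentHandler through a SAXResult instead of to the built-in serializer. The handler below is a deliberately tiny, hypothetical serializer (NbspSerializer) that writes U+00A0 as the named reference &nbsp;, one effect sometimes attempted with disable-output-escaping. It ignores namespaces, comments and processing instructions, so it is an illustration rather than a usable serializer.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical postprocessing handler: a minimal serializer that emits
// non-breaking spaces as &nbsp; rather than as a raw or numeric character.
class NbspSerializer extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"");
            escape(atts.getValue(i));
            out.append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        escape(new String(ch, start, length));
    }

    // Escapes markup characters and applies the custom rule for U+00A0.
    private void escape(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\u00A0': out.append("&nbsp;"); break;
                default: out.append(c);
            }
        }
    }

    // Runs the stylesheet and lets this handler, not the processor's own
    // serializer, produce the final text.
    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        NbspSerializer handler = new NbspSerializer();
        t.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        return handler.out.toString();
    }
}
```

Since the handler only relies on the standard ContentHandler interface, it avoids the product dependence that comes with subclassing a vendor's serializer, at the cost of having to reimplement whatever serialization rules are needed.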

Sometimes a user-written serializer can be produced by subclassing the standard serializer supplied with your chosen product. This will of course be product-dependent and your code may not work with future releases of the product.

It's also possible to write a SAX filter to preprocess the stylesheet. This is less common, but it can be used to tackle problems such as dynamic sort keys, or XPath expressions that are contained within source documents.

The new STX specification provides the prospect of being able to write SAX filters without needing to do low-level Java coding. If this takes off, I think that the idea of doing a complex transformation as a pipeline of SAX filters, some generated using XSLT and some using STX, may become increasingly attractive. Although XSLT 2.0 deals with nearly all the limitations of XSLT 1.0 in areas such as text processing, grouping, and aggregation, it doesn't address the problem of handling large input documents.

STX at SourceForge

Streaming Transformations for XML (STX) is a one-pass transformation language for XML documents that builds on the Simple API for XML (SAX). STX is intended as a high-speed, low memory consumption alternative to XSLT. Since it does not require the construction of an in-memory tree, it is suitable for use in resource constrained scenarios. The aim of this project is to develop and maintain the STX language specification.

Attention good readers. I'm looking for two examples.

1. SAX filter feeding an XSLT transform

2. XSLT transform feeding into a SAX filter

Could you help?