Flatten a structure

1. Removing duplicates
2. How to flatten the source XML tree.


Removing duplicates

Michael Kay

Question expansion

> a while ago I asked something about removing duplicates.
> Most of the answers I got concerned either using
>   <xsl:for-each select="//SPEECH[not(.=preceding::SPEECH)]">
> or
> <xsl:key name="sortKey" match="value" use="var" />
> both of 'em work fine, but can anybody tell which one is more 
> favourable and why?

The "preceding" solution typically has O(n*n) performance: it compares each SPEECH with every SPEECH that precedes it, so when the number of items doubles, elapsed time increases by a factor of four.

The "key" solution typically has O(n log(n)) performance: it involves adding each item to an index and then looking each item up in that index. So when the number of items doubles, elapsed time increases by a factor of only, say, 2.1.
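
As a rough check on that figure, assume the running time is proportional to n*log2(n). This is an idealised model, not a measurement of any particular processor. The effect of doubling n is then:

  T(2n) / T(n) = 2n*log2(2n) / (n*log2(n)) = 2 * (1 + 1/log2(n))

For n = 1,000 items this is about 2.2, and for n = 1,000,000 it is about 2.1. The factor approaches 2 as n grows.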

That means that the "preceding" solution may be faster for small files, but as the files get bigger, the "key" solution will win.

Of course, this is all based on assumptions on how the implementations work: assumptions that are reasonable, but not necessarily true of all products.
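
For reference, here is a minimal sketch of how the "key" approach is usually completed for the SPEECH case. The key name speech-by-text and the wrapper element unique-speeches are made up for illustration; neither appears in the original question. The sketch keeps a SPEECH only if it is the first node in its key group. For SPEECH elements that are not nested inside one another, that should select the same nodes as the "preceding" expression.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every SPEECH by its string value (hypothetical key name) -->
  <xsl:key name="speech-by-text" match="SPEECH" use="." />

  <xsl:template match="/">
    <unique-speeches>
      <!-- Keep a SPEECH only if it is the first node, in document order,
           that the key returns for its own string value -->
      <xsl:for-each select="//SPEECH[generate-id(.) =
                            generate-id(key('speech-by-text', .)[1])]">
        <xsl:copy-of select="." />
      </xsl:for-each>
    </unique-speeches>
  </xsl:template>

</xsl:stylesheet>

The index is built once, so each SPEECH costs one lookup rather than a scan of everything before it.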

Jeni Tennison (building on a partial solution from Mike Kay)

Just to expand on Mike's solution a little, here are two templates that operate in 'flatten mode'.

The first matches any element. If it is an element that contains subelements, then it simply applies templates (again in flatten mode) on its children (both elements and text, if there is any). If it is an element that does not contain any subelements - in other words a #PCDATA element or an EMPTY element - then it makes a copy of itself.

<xsl:template match="*" mode="flatten">
  <xsl:choose>
    <xsl:when test="*"><xsl:apply-templates mode="flatten" /></xsl:when>
    <xsl:otherwise><xsl:copy-of select="." /></xsl:otherwise>
  </xsl:choose>
</xsl:template>

The second template matches any text node and, so long as it isn't empty once whitespace has been normalised, places its normalised content into an element with the name of its parent element (as in Mike's solution).

<xsl:template match="text()" mode="flatten">
  <xsl:if test="normalize-space(.) != ''">
    <xsl:element name="{name(..)}">
      <xsl:value-of select="normalize-space(.)" />
    </xsl:element>
  </xsl:if>
</xsl:template>

So, to flatten the content of a particular element, apply templates in flatten mode on its children. In your example:

<xsl:template match="document">
  <xsl:apply-templates mode="flatten" />
</xsl:template>
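
Putting the pieces together, the following is a sketch of a complete stylesheet built from the three templates described above. The sample input is invented for illustration, since the original poster's document is not shown here.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes" />

  <!-- Entry point: flatten everything below document -->
  <xsl:template match="document">
    <xsl:apply-templates mode="flatten" />
  </xsl:template>

  <!-- Elements with subelements are unwrapped; leaf elements are copied -->
  <xsl:template match="*" mode="flatten">
    <xsl:choose>
      <xsl:when test="*"><xsl:apply-templates mode="flatten" /></xsl:when>
      <xsl:otherwise><xsl:copy-of select="." /></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Non-blank text is wrapped in an element named after its parent -->
  <xsl:template match="text()" mode="flatten">
    <xsl:if test="normalize-space(.) != ''">
      <xsl:element name="{name(..)}">
        <xsl:value-of select="normalize-space(.)" />
      </xsl:element>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

Given this hypothetical input:

<document>
  <section id="s1">
    <title>Intro</title>
    <para>Some <b>bold</b> text</para>
    <br/>
  </section>
</document>

the result, whitespace aside, should be a flat sequence of elements:

<title>Intro</title>
<para>Some</para>
<b>bold</b>
<para>text</para>
<br/>

Note that section disappears along with its id attribute, because an element with subelements is never copied, and that the mixed content of para is split around the b element.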


How to flatten the source XML tree.

David Carlisle

Q expansion: How to "flatten" the source XML tree while
preserving information about child-parent relationships.

<mynode name="xxx">
    <mynode name="yyy">
        <mynode name="zzz"/>
    </mynode>
</mynode>

should become

<mynode name="xxx" id="1" parentid=""/>
<mynode name="yyy" id="2" parentid="1"/>
<mynode name="zzz" id="3" parentid="2"/>

<xsl:template match="mynode">
  <mynode name="{@name}" id="{generate-id(.)}"