Directories

1. Accessing a directory full of xml files.
2. List directory contents into xml

1.

Accessing a directory full of xml files.

Francis Norton



This is a pure DOS / XML / XSLT way of creating an XML file
containing a directory listing. It's based on my earlier
solution, which didn't tolerate embedded spaces in filenames.
Warning: if a file name contains the ampersand character, the
listing will fail to parse! If you need to handle that, use a
filter, such as the Java program later in this answer, to
escape the ampersands.

For *nix users: according to the XML spec, the line-parsing
technique should be OS independent. So you'd change the
batch file (and sed those pesky ampersands while you're at
it), but make no change to the XML file, and all you need to
do to the XSL file is swap the '\' for a '/'...:)
(A Perl script is added at the end of this answer.)

In other words the processing-a-line-separated-file
technique should be portable without change, and the
specific utility should be fairly easily transportable.
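
For the *nix case, the batch file's job can be sketched in a few
lines of Python. This is a minimal sketch and not part of the
original solution: the names build_listing and write_listing are
made up for illustration, and the ampersands are escaped with the
standard xml.sax.saxutils.escape function instead of sed.

```python
import os
from xml.sax.saxutils import escape


def build_listing(directory, names):
    """Return the text of xmlDir.lst: the directory on the first line,
    then one *.xml file name per line."""
    # Match the suffix case-insensitively, as "dir *.xml" does on DOS.
    xml_names = sorted(n for n in names if n.lower().endswith(".xml"))
    lines = [directory] + xml_names
    # escape() turns & into &amp; (and < > into &lt; &gt;), so the list
    # can be read as a system entity without parse errors.
    return "".join(escape(line) + "\n" for line in lines)


def write_listing(directory=".", target="xmlDir.lst"):
    """Hypothetical helper: list `directory` and write the result to `target`."""
    directory = os.path.abspath(directory)
    with open(target, "w", encoding="utf-8", newline="\n") as out:
        out.write(build_listing(directory, os.listdir(directory)))
```

Calling write_listing() in the directory of interest would write
xmlDir.lst there, after which xmlDir.xml can be used unchanged.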



The solution now takes a line-separated text file and
processes it into an XML file. Doing this requires two uses
of XML entities: firstly, a system entity to read the text
file into the content of an XML element; and secondly, a
character entity to access the ASCII 10 linefeed character
used to parse that content.
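
To make the line handling concrete, here is a minimal Python
sketch of what happens to the text once the system entity has
been expanded. It only imitates the parser's line-end
normalisation and the split on ASCII 10; the function names are
assumptions for illustration, not anything from the XML or XSLT
specs.

```python
def normalise_line_ends(text):
    """Imitate the XML parser: CR-LF pairs and lone CRs become LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def split_listing(text):
    """Return (path, file_names) from the text of the listing.

    The path is everything before the first line feed; every later
    non-empty line is a file name (embedded spaces are kept).
    """
    path, _, rest = normalise_line_ends(text).partition("\n")
    return path, [name for name in rest.split("\n") if name]
```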

For anyone unfamiliar with system entities, run the
xmlDir.bat, then see the difference between looking at
xmlDir.xml in a text processor and in an xml processor like
IE5. Ta-da...

I was never very fond of XML entities, so this was a useful
exercise for me; I hope it helps others too.

1. The batch file

@echo off
cd > xmlDir.lst
dir *.xml /b >> xmlDir.lst
saxon xmlDir.xml xmlDir.xsl > xmlFiles.xml

Note that the last line needs changing to call your
own saxon processor (not the java version)

2. The xml file

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xmlDir [
<!ENTITY xmlDirList
           SYSTEM "xmlDir.lst">    
  ]>

<xmlDir>&xmlDirList;</xmlDir>

Note that this won't work until you have created
the entity file (xmlDir.lst) by running the batch file,
and saved it in a location where the xml file can access it.

3. The xsl file

<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!-- root function -->
<xsl:template match="/xmlDir">

  <!-- create our root output element -->
  <xmlDir>
<!-- the path is on the first line, filenames on the others -->
    <xsl:call-template name="file">	
      <!-- all the CR-LF pairs have been normalised 
	   to ascii 10, as specified in
	 http://www.w3.org/TR/1998/REC-xml-19980210#sec-line-ends
      -->
      <xsl:with-param name="path" 
	  select="substring-before(string(), '&#10;')" />  
      <xsl:with-param name="flist" 
	  select="substring-after(string(), '&#10;')" />  
    </xsl:call-template> 

  </xmlDir>

</xsl:template>


<!-- process the individual files 
  in the line-separated file list -->
<xsl:template name="file">
  <xsl:param name="path" />
  <xsl:param name="flist" />

  <xsl:if test="$flist != ''">

<!-- output the path and the first filename as one element -->
    <xmlFile><xsl:value-of 
             select = "concat($path, '\', 
	     substring-before($flist, '&#10;'))" 
              /></xmlFile>

<!-- now recurse with same path and rest of the filenames -->
    <xsl:call-template name="file">
      <xsl:with-param name="path" select="$path" />
      <xsl:with-param name="flist" 
	  select="substring-after($flist, '&#10;')" />
    </xsl:call-template>

  </xsl:if>

</xsl:template>

</xsl:stylesheet>    
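
For readers who find the recursion easier to follow outside
XSLT, the two templates can be mirrored in Python. This is a
sketch under assumptions: the function names and the sep
parameter are invented here, the recursion is kept only to match
the stylesheet (a loop would be more usual in Python), and
str.partition differs from substring-before in that it still
returns a last name that has no trailing line feed.

```python
from xml.sax.saxutils import escape


def files_to_xml(path, flist, sep="\\"):
    """Mirror of the named template "file": emit one xmlFile element
    for the first line of flist, then recurse on the remainder."""
    if flist == "":
        return ""
    first, _, rest = flist.partition("\n")
    # escape() stands in for the XSLT serialiser's output escaping.
    element = "<xmlFile>" + escape(path + sep + first) + "</xmlFile>"
    return element + files_to_xml(path, rest, sep)


def listing_to_xml(text, sep="\\"):
    """Mirror of the root template: the path is on the first line,
    the file names are on the others."""
    path, _, flist = text.partition("\n")
    return "<xmlDir>" + files_to_xml(path, flist, sep) + "</xmlDir>"
```

Passing sep="/" corresponds to swapping the '\' for a '/' in the
stylesheet.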

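It was pointed out that this solution breaks on any ampersands
in the file names.  The following java program addresses
this by escaping them. It reads standard input and writes
standard output, so the directory listing can be piped through it.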

filename="EscapeAmps.java"

import java.io.*;

public class EscapeAmps
{
  static final int amp = '&';
  public static void main(String[] args)
  {
    int b;
    DataInputStream stdIn = new DataInputStream(System.in);
    try
    {
      while((-1)!=(b=stdIn.read()))
      {
        switch(b)
        {
          case amp:
          {
            // write the entity reference in place of the bare ampersand
            System.out.print("&amp;");
            break;
          }
          default:
          {
            System.out.print((char)b);
            break;
          }
        }//switch (b)
      }//while(!eof())
    }
    catch(Exception e)
    {
      return;
    }
    return;
  }
}
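
A quick way to convince yourself that escaping is enough is to
round-trip a file name through an XML parser. The sketch below
does that in Python with the standard xml.etree.ElementTree
module; escape_amps and round_trip are names invented for this
illustration and are not part of the Java program.

```python
import xml.etree.ElementTree as ET


def escape_amps(text):
    """Replace every & with &amp;, as the filter above is meant to do."""
    return text.replace("&", "&amp;")


def round_trip(name):
    """Wrap the escaped name in an element, parse it, and return the text
    the parser reports, which should be the original name again."""
    return ET.fromstring("<xmlFile>" + escape_amps(name) + "</xmlFile>").text
```

Without the escaping step, the same parse raises an error for
any name containing a bare ampersand.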

For Perl users...
from Beckers, Marc

A self-documenting Perl script, originally written to do what
you want for HTML files.  The DOS dir call (the
system ("dir /b/s ...") line) selects which files are read in;
in the listing below it matches the XML suffix rather than
HTM*.

You must have Perl installed, of course.

The output file is called mother.xml and contains the mother
root element with each path in a file element.  The path
names are relative to the working directory.  You can get
the absolute path by deleting the substr line that removes
the current working directory.


# Everything after a hash is a comment
# This perl script scans for all XML files 
# in or under the current directory
# and creates an XML file that records 
# where each file is located. The file name is placed
# in a "file" element.
# The outer element of the XML file is "mother". 
# This version 2000-02-09, Chris Bradley

# NOTE: This is Windows NT-specific. 
# It uses the DOS "dir" command to create a
# temporary file "temp.xml" that contains the list of XML files.
# The temp.xml file is also used to 
# contain the name of the current working directory.
# A UNIX version would have to use another approach

# Get the current working directory 
# (will be removed later from all input lines of 
# the directory listing)

system ("cd > temp.xml");
open (inputfile,  "temp.xml");
$a=<inputfile>;
close (inputfile);

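# $a still ends with its newline here, so this length also covers
# the backslash that follows the directory name in each listed path.
$curdirnamelen = length ($a);
#print "Length of a is ", $curdirnamelen;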

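# Here's the DOS "dir" call that traverses 
# the tree and stores into "temp.xml"
# or wherever you want it.
# Note that temp.xml itself (and any mother.xml left from an
# earlier run) matches the pattern, so it appears in the list too.

system ("dir /b/s *.xml* > temp.xml");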

# Now open the file just created, 
# and use it as input to create the new "mother" XML file

open (inputfile,  "temp.xml");

# Open the "mother" output file for WRITE operations, 
# and call it "mother.xml" in the current directory:
open (outputfile, ">mother.xml");

# Start the mother document with the opening "mother" tag:
print outputfile "<mother>";

# Now scan the file that contains all the filenames, 
# using a "while" loop:
while ($a=<inputfile>) 
{
  # The variable "$a" contains 
  # the current input line from temp.xml.
  chomp($a);  # removes the line feed at the end 
              #of the input line (technical detail)
  
# remove the current working directory from the path name
  $a = substr ($a, $curdirnamelen, length ($a)-1);
  
  # Put opening and closing "file" tags round the current line
  print outputfile "\n  <file>$a</file>"
};

# Now output the closing "mother" tag.
print outputfile "\n</mother>";

# Exit cleanly by closing any open files:
close (inputfile);
close (outputfile); 

# Congratulations, you're through !
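
Since the script notes that a UNIX version would need another
approach, here is one possible OS-independent sketch in Python.
It is not a port by the script's author: mother_xml and its
parameters are invented for illustration, os.walk replaces the
DOS dir call, the match is on names ending in .xml (slightly
narrower than the *.xml* pattern), and ampersands in names are
escaped, which the Perl version does not do.

```python
import os
from xml.sax.saxutils import escape


def mother_xml(root=".", suffix=".xml"):
    """Return a "mother" document with one "file" element per matching
    file in or under `root`; paths are relative to `root`."""
    lines = ["<mother>"]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            if name.lower().endswith(suffix):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                lines.append("  <file>" + escape(rel) + "</file>")
    lines.append("</mother>")
    return "\n".join(lines)
```

Writing the returned string to a file outside the scanned tree
avoids listing the output file itself on the next run.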





2.

List directory contents into xml

Chris Bayes

I wrote a JavaScript command-line utility to list the directory contents in XML.

It just creates the following format:
<?xml version="1.0"?>
<folder name="temp" dirroot="c:\\temp">
	<folder name="temp">
		<file name="temp.tmp" />
	</folder>
	<file name="temp.tmp" />
</folder>
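
The utility itself is not shown here, so the following is only a
sketch of how the same nested format could be produced, written
in Python with the standard os and xml.etree.ElementTree
modules. The function names are invented, and the layout
(sub-folders before files, dirroot on the outermost folder only)
is inferred from the sample above, not from the utility.

```python
import os
import xml.etree.ElementTree as ET


def folder_element(path):
    """Build a "folder" element for `path`: sub-folders first, then files."""
    folder = ET.Element("folder", name=os.path.basename(os.path.abspath(path)))
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            folder.append(folder_element(entry.path))
    for entry in entries:
        if entry.is_file():
            ET.SubElement(folder, "file", name=entry.name)
    return folder


def directory_to_xml(path):
    """Return the listing of `path` as an XML string with a declaration."""
    root = folder_element(path)
    root.set("dirroot", os.path.abspath(path))  # only on the outermost folder
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

ElementTree takes care of escaping attribute values, so names
containing ampersands or quotes need no special handling.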