Omnimark Vs XSLT

1. XSLT and Omnimark
2. XSLT vs Omnimark


XSLT and Omnimark

Various. - All personal opinions.

Does anyone have any feedback on when and where they would recommend using XSLT instead of Omnimark or any other text processing language?

Things which make XSLT better than Omnimark:

No need to validate against a DTD
Simpler language and syntax
Easier to debug, largely due to the lack of data structures ... it's just plain neater.
You want the rendering done in a browser; browsers do not implement Omnimark (none do, I assume)
You need scripting that is portable across all platforms. Is there an Omnimark for every platform that has Java or C++?
You must work with tools for which you have the source
You think you can persuade non-programmers to write simple XSL but not weird Omnimark
You want to embed the processing in your application. Can Omnimark be linked into your own code, as most XSL processors can? I don't know.
You don't deal with vast files today
In XSLT there are (generally) no problems with conflicting rules, as XSLT has its own specification for choosing the most relevant "match" at any point.
Much more powerful for tree querying (DOM oriented): TOCs are way simpler with XSLT
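
To illustrate the tree-querying point, here is a minimal sketch in Python rather than XSLT. It builds a table of contents by walking the whole parsed tree, which is the kind of random access a tree-oriented tool gives you. The `section` and `title` element names, the sample document, and the `build_toc` helper are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
SAMPLE = """<doc>
  <section><title>Intro</title>
    <section><title>Scope</title></section>
  </section>
  <section><title>Usage</title></section>
</doc>"""


def build_toc(xml_text):
    """Return (depth, title) pairs for every section, in document order."""
    root = ET.fromstring(xml_text)
    toc = []

    def walk(node, depth):
        # Only direct child sections are visited here; nesting is
        # handled by the recursive call.
        for section in node.findall("section"):
            title = section.findtext("title", default="").strip()
            toc.append((depth, title))
            walk(section, depth + 1)

    walk(root, 1)
    return toc


for depth, title in build_toc(SAMPLE):
    print("  " * (depth - 1) + title)
```

The whole document has to be parsed before the first TOC line can be produced, which is the memory cost discussed further down.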

However, I still use Omnimark for some things, such as:

Converting SGML empty tags to XML empty tags
Replacing the ampersand ("&") in entity references in the source data with "$!$" so that the pre-XSLT XML parser doesn't try to resolve them.
Converting the prolog of the SGML file to a different prolog.
Converting the XML empty tags back to SGML empty tags.
You need speed
You trust a more mature language
You have big files to process
You like a more traditional language
You have trained programmers
You need to work with the DTD and entities
You don't want to build and use the entire document tree, but want a SAX-like, event-triggered environment that doesn't require as much memory
When dealing with large documents.
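
The ampersand masking and the empty-tag conversion in the list above are plain text rewrites, so they can be sketched in a few lines. This is a hedged illustration in Python, not OmniMark code. The function names are hypothetical, and the list of empty elements is an assumption that would in practice come from the DTD.

```python
import re

MARKER = "$!$"  # the placeholder string quoted in the list above

# Hypothetical set of elements declared EMPTY; a real job would read
# these from the DTD.
EMPTY_ELEMENTS = ("br", "hr", "img")


def mask_entity_refs(text):
    """Replace the '&' that starts an entity or character reference."""
    # The lookahead leaves bare ampersands (not followed by name + ';') alone.
    return re.sub(r"&(?=#?[A-Za-z0-9._:-]+;)", lambda m: MARKER, text)


def unmask_entity_refs(text):
    """Undo mask_entity_refs; assumes MARKER never occurs in the real data."""
    return text.replace(MARKER, "&")


def sgml_to_xml_empty(text, names=EMPTY_ELEMENTS):
    """Rewrite <br> style empty tags as <br/> for the listed elements."""
    alternatives = "|".join(re.escape(n) for n in names)
    pattern = r"<(%s)\b([^<>]*?)\s*/?>" % alternatives
    return re.sub(pattern, r"<\1\2/>", text)


masked = mask_entity_refs("AT&T &nbsp; &#160;")
print(masked)
print(unmask_entity_refs(masked))
print(sgml_to_xml_empty('<p>a<br>b<img src="a/b.gif"></p>'))
```

A regular-expression rewrite like this does not understand comments, CDATA sections or marked sections, so it is only a sketch of the idea.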

XSLT does not allow you to output anything to the document prolog aside from the doctype statement.

In Omnimark you have to be careful to avoid conflicting rules, and in large conversion scripts this makes the code incredibly cumbersome.

Note: I would certainly hesitate to offer a hostage to fortune by suggesting that there are jobs which only one of the two languages could cope with. You can almost certainly, with greater or lesser elegance, solve any problem with both. Whether the solutions are always *sensible* is another matter. I wrote a KWIC concordance generator in XSLT. Possible, but was it wise?

Rick Geimer puts a size perspective on the comparison

When file sizes vary greatly, so does the performance. I suggest finding the average size, doing some benchmarking between XSLT and OmniMark, and picking a cutoff point where you are more comfortable with one over the other. If your sizes range from 20k -> 3MB, then I bet your average size is somewhere around a few hundred k, which is a gray area as far as I am concerned. I'm sure if all your files were 3MB, XSLT would be bogging you down by now.

Also, there are many ways to extend OmniMark besides system calls. You can use the C/C++ API, or web services. If I need to call Java from OmniMark, I write a web service servlet that talks XML-RPC or SOAP, then use omhttp to make the call. This creates a very extensible and scalable solution that I can spread over multiple servers if necessary. Heck, I can even call an XSLT processor to do the easy stuff, then use OmniMark for the more complicated cleanup. In short, I don't think the two are mutually exclusive.


XSLT vs Omnimark

Rick Geimer plus

I use both, but I tend to stick with OmniMark for anything
complex. Here are some pros and cons as I see them:

OmniMark Pros:
	Built in regular expression language
	Support for DTDs
	Complete control over the output 
	Support for SGML as well as XML

OmniMark Cons:
	Proprietary language
	Ignores some wellformedness errors in XML that are legal in SGML
	Will probably never be implementable on the client side (i.e. in a browser)
	Syntax can be a little confusing to newcomers

XSLT Pros:
	Random access to the entire tree
	Non-proprietary language with many evolving implementations
	Implementations are appearing on both the client and server side

XSLT Cons:
	No regular expressions
	Current implementations tend to be a little slow (this is improving, though)
	No support for DTDs
	Variables that don't vary - (I know this is by design, but it is a pain sometimes)
	Syntax can be a little confusing to newcomers

Basically, I like XSLT for the most part, it allows me to do
95% of what I need to do fairly easily, but trying to
accomplish that last 5% of a complex job is a real pain, or
in some cases virtually impossible.

This is probably because the focus of XSLT doesn't meet my
needs. XSLT is a tree transformation tool, and it is very
good at what it does, but if you need to do more than just
move nodes around, you would be better off looking elsewhere
at this time.


James Robertson adds:

* XSLT is becoming pretty common, so many
   people understand it.

* Omnimark is much more powerful, and extensible.

* Omnimark has regular expressions, which are vital
   for almost all real-world work. It also has
   much cleaner handling of multiple files, data
   structures, etc.

* Both have strange, bizarre syntaxes.

* Both are free.

* XSLT has better support for XML (Omnimark
   is primarily an SGML tool). Omnimark is
   improving in this area, though.

* Omnimark primarily works on valid documents
   (i.e. the ones with DTDs). XSLT works well on
   well-formed documents as well as valid ones.

* Both can be extended using external functions,
   in a variety of languages.

* Omnimark is streaming, and very fast. It doesn't
   require 40meg of RAM for a 50kb document (see
   earlier message re: XSLT).

* Omnimark can easily handle 100+ meg documents
   without requiring unreasonable amounts of RAM.

And simple user requirements get steadily more complex as
time goes on, so I want a tool that has plenty of power, and
few limitations.

I would recommend trying both.

Your biggest problem is that both tools have a steep
learning curve.

Ken Holman adds

In OmniMark, there is only access to the current element and its
ancestry (all currently open elements) and no access to
other constructs of the source, thus, the programmer must
accommodate forward referencing (your term "look-ahead

It is OmniMark's responsibility (not the programmer) to emit
the final file with all the programmer-resolved referent
values (it is an error if a referent's value is not defined
by the programmer).  While some term this "two-pass", I've
heard "two-pass" reserved for when it is the programmer's
responsibility to satisfy the second pass, which is not true
in this case.  The programmer only sees the result data
once; the programmer only sees the source data once;
OmniMark sees the result data the second time when filling
in the place-holders and is *very* efficient doing so
entirely behind the scenes without program intervention,
thus I find the term "one-and-a-half-pass" quite apropos.
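
The referent idea described above can be sketched as deferred placeholders in an output buffer. This is only an illustration in Python of the behaviour Ken describes, not OmniMark's actual implementation; the `ReferentOutput` class and its method names are hypothetical.

```python
class ReferentOutput:
    """Output buffer whose placeholders are filled in at the very end."""

    def __init__(self):
        self._pieces = []   # literal strings or ("ref", name) placeholders
        self._values = {}   # referent name -> value set by the programmer

    def write(self, text):
        self._pieces.append(text)

    def write_referent(self, name):
        # Emit a placeholder; its value may not be known yet.
        self._pieces.append(("ref", name))

    def set_referent(self, name, value):
        self._values[name] = value

    def finalize(self):
        # The "half pass": the tool, not the programmer, walks the
        # result once more and substitutes the resolved values.
        out = []
        for piece in self._pieces:
            if isinstance(piece, tuple):
                name = piece[1]
                if name not in self._values:
                    raise KeyError("referent %r was never defined" % name)
                out.append(self._values[name])
            else:
                out.append(piece)
        return "".join(out)


out = ReferentOutput()
out.write("See ")
out.write_referent("ch2-title")      # forward reference
out.write(" for details.")
out.set_referent("ch2-title", "Chapter 2")  # value seen later in the source
print(out.finalize())
```

The programmer writes each piece of output once and reads each piece of input once; only `finalize` touches the result a second time.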

The streaming nature of OmniMark is great for some problems
and there is no overhead for the source document (it is not
maintained in memory), only for the result document (and the
intermediate result is on disk, not memory; I think referent
values are in memory, but I'm not sure and it doesn't affect
me as a programmer).

The tree nature of XSLT is great for some problems and,
being result oriented, has no overhead for the result (it
can be instantly serialized), but does for the source (the
entire file has to be accessible at all times; currently
this is in memory for the processors I'm aware of).
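
The contrast between the two approaches can be shown with Python's standard library, used here purely as a stand-in for the two tools. The streaming handler keeps only the stack of currently open elements, while the tree version parses the whole document before answering. The sample data and the names `OpenElementTracker`, `count_streaming` and `count_tree` are assumptions for this sketch.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample input.
SAMPLE = b"<doc><sec><p>one</p><p>two</p></sec><sec><p>three</p></sec></doc>"


class OpenElementTracker(xml.sax.ContentHandler):
    """Streaming: state is limited to the current element and its ancestry."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # names of the currently open elements
        self.paragraphs = 0

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        if name == "p":
            self.paragraphs += 1

    def endElement(self, name):
        self.open_elements.pop()


def count_streaming(data):
    """Count <p> elements without keeping the source document in memory."""
    handler = OpenElementTracker()
    xml.sax.parseString(data, handler)
    return handler.paragraphs


def count_tree(data):
    """Count <p> elements after building the entire tree in memory."""
    root = ET.fromstring(data)
    return len(root.findall(".//p"))


print(count_streaming(SAMPLE), count_tree(SAMPLE))
```

Both functions give the same answer; they differ in what has to be held in memory while the answer is computed.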

Two different approaches for transformation ... one isn't
necessarily better or worse than the other in the general
case or language definition, just different to the extent
that a direct comparison of the two is difficult.  I use
both and I choose which one based on the requirement, the
customer, the nature of the data, and the nature of the