
Schema Design Questions

Schema design

1. include and externalRef
2. Combining two name classes
3. Attribute ordering
4. Whitespace in attributes
5. ANY, Trang schema from DTD
6. Attribute after text
7. How to combine schemas
8. Grammars in included Schemas
9. Line terminator and whitespace
10. IDREF or NCName
11. ambiguity checks
12. How to make a list of attributes which is not case sensitive.
13. Open content model; xlink schema
14. Jing error 'oneOrMore contains group'
15. Attribute error with ID's
16. Simplified? Simple syntax?

1.

include and externalRef

John Cowan


> I'm sorry if I missed this in the documentation. I notice in the spec
> that <include> must point to a file that has a <grammar> element as
> root. Is the same true for <externalRef> or can I have that file be a
> simple <element> declaration as its root? 

Sure. externalRef can point at any pattern whatsoever. It's just a straight inclusion.
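
In the compact syntax the counterpart of externalRef is the external keyword. A minimal sketch, assuming two hypothetical files named para.rnc and doc.rnc, where the referenced file is a bare element pattern rather than a grammar:

```rnc
# para.rnc (hypothetical file): a bare pattern, no grammar wrapper
element para { text }

# doc.rnc (hypothetical file): pulls in the pattern above as-is
start = element doc { external "para.rnc"+ }
```

An include of para.rnc would not work here, because include expects the target to be a grammar.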

2.

Combining two name classes

Eric van der Vlist


 > If an element node has both ./@name and ./name, is it an error or are they combined?
 

If you mean a pattern element, it's an error, but you can explicitly combine two name classes through a "choice", for instance:

 <element>
  <choice>
   <name>foo</name>
   <name>bar</name>
  </choice>
  <empty/>
 </element>	    
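
For comparison, the same name class choice can be sketched in the compact syntax (the names foo and bar are taken from the example above):

```rnc
# An empty element whose name is either foo or bar
element (foo | bar) { empty }
```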

3.

Attribute ordering

John Cowan


> is it legal to have

> <element name="x">
>   <text/>
>   <attribute name="y"/>
> </element>

Sure. Because attribute ordering doesn't matter in XML, RNG always treats attribute patterns in a group pattern (in this case, an implicit one) as if they were interleaved. Therefore, they can appear in any position when they are children of an element pattern.
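
As a sketch in the compact syntax, the two definitions below (the names x.attr-last and x.attr-first are made up for illustration) should accept exactly the same instances:

```rnc
# Both accept e.g. <x y="1">some text</x>
x.attr-last  = element x { text, attribute y { text } }
x.attr-first = element x { attribute y { text }, text }
```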

4.

Whitespace in attributes

John Cowan


> Does <attribute name="x"><empty/></attribute> 
> match an attribute value consisting
> only of whitespace? 

Yes. The only time whitespace is not ignored is in data or value patterns where the datatype preserves it, which in practice is only in string and xsd:string.
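
A small compact-syntax sketch of the difference, with hypothetical definition names. The second definition relies on the built-in string datatype comparing values without whitespace normalization, so treat it as an assumption to verify with your validator:

```rnc
# Accepts x="" and also whitespace-only values such as x="   "
lenient = element a { attribute x { empty } }

# Assumed stricter variant: only a truly empty value, x="", should match
strict = element b { attribute x { string "" } }
```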

5.

ANY, Trang schema from DTD

James Clark


> The semantics of ANY in a DTD are essentially (#PCDATA|FOO|BAR|BAZ|...)*,
> where FOO, BAR, BAZ, ... are the declared elements in the DTD.  It does
> not mean that "any XML" is permissible here, but that's how Trang
> translates it.

Right. But DTDs don't have real wildcards, so if a DTD author wants a real wildcard, the best they can do is approximate it with ANY. Most of the uses of ANY I've come across are cases of this, rather than people actually wanting the DTD semantics.

In any case, if you want exactly the DTD semantics, trang can give it to you. See the -i strict-any option: thaiopensource.com

6.

Attribute after text

John Cowan


> Does <attribute name="x"><empty/></attribute>
>  match an attribute value consisting
> only of whitespace? 

Yes. The only time whitespace is not ignored is in data or value patterns where the datatype preserves it, which in practice is only in string and xsd:string.

7.

How to combine schemas

Eric van der Vlist


> start = document
> include "docbook.rnc"
> include "mods3.rnc"
> document = element document { meta, content }
> meta = ModsSchema
> content = { "article" | "book" | "chapter" }
> ===============================

> What am I doing wrong??

If you include the schema for docbook, you need to either redefine the start pattern defined in the schema for docbook or combine its content with a new one. Since here you want to replace its definition, the only option left is to redefine it:

include "docbook.rnc" {
  start = document
}
include "mods3.rnc"
document = element document { meta, content }
meta = ModsSchema
content = "article" | "book" | "chapter"

BTW, this last pattern means that content is a value equal to "article", "book" or "chapter". It is illegal to put that after "meta" if meta is an element and, in any case, it is probably not what you mean. What about:

content = article | book | chapter


> perhaps there needs to be a way to delete a rule at
> include time?

I think that this is already possible.

Consider:

inc1.rnc
start = element foo{empty}

inc2.rnc
start = element bar{empty}

If you want to include inc1.rnc and keep only the definition of the start element in inc2.rnc, you can write:

include "inc1.rnc" {
	start |= notAllowed
}
include "inc2.rnc"
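
Assuming the override replaces the start defined in inc1.rnc and the remaining definitions are then combined by choice, the grammar above should behave like this sketch:

```rnc
# Effective result of the two includes
start = notAllowed | element bar { empty }

# A choice with notAllowed reduces to its other branch, i.e.
# start = element bar { empty }
```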

8.

Grammars in included Schemas

James Clark


> Are implicit grammars allowed in included schemas?

They are allowed. Each file in a compact syntax schema is translated individually to XML syntax. Each file is translated the same way whether it's the top-level file, referenced in include or referenced in external. The compact syntax schema is correct if the requirements of Appendix A of the Compact Syntax spec are met and if the resulting XML syntax schema is correct.
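
A minimal sketch with two hypothetical files, common.rnc and main.rnc, where the included file consists of definitions only and so relies on the implicit grammar:

```rnc
# common.rnc (hypothetical file): definitions only, no "grammar { ... }"
para = element para { text }

# main.rnc (hypothetical file): includes the implicit grammar above
include "common.rnc"
start = element doc { para+ }
```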

9.

Line terminator and whitespace

James Clark


> do I understand it correctly that while #xA terminates a comment and is a whitespace
> in the compact syntax,

Right. This ensures that a user can always replace any literal character in the original source by an escape.

>  #xD does not and is not?

Right. There didn't seem to be any reason why it should be whitespace. This is covered by: relaxng.org

10.

IDREF or NCName

David Tolpin

Suppose a RELAX NG schema containing a definition for references:

<define name="reference">
  <attribute name="ref">
    <data type="IDREF"/>
  </attribute>
</define>

Now I want to change this definition in a derived schema, such that a reference refers either to an element within the XML document or to an element within another XML document. Thus, I extend the definition with a further attribute transitively referring to the other XML document. I would like to keep the same attribute name for the reference to the element, because the element is of the same kind as the original one.

<define name="reference">
  <choice>
    <group>
      <attribute name="instance">
	  <data type="IDREF"/>
      </attribute>
      <attribute name="ref">
	  <data type="NCName"/>
      </attribute>
    </group>
    <attribute name="ref">
      <data type="IDREF"/>
    </attribute>
  </choice>
</define>

But, jing returns an error message:


error: conflicting ID-types for attribute "ref" of element ...

Is it not possible to define context-sensitive ID-types?

David answers:

These are two separate problems, I think. One is whether duplicate attributes are allowed; they are not, as stated in 7.3.

The other is whether you need it; NCName and IDREF are the same type in Relax NG. Your grammar is equivalent to

<define name="reference">
  <optional>
    <attribute name="instance">
	<data type="IDREF"/>
    </attribute>
  </optional>
  <attribute name="ref">
    <data type="IDREF"/>
  </attribute>
</define>

or, since you've mentioned the compact syntax

reference=attribute instance {xsd:IDREF}?, attribute ref {xsd:IDREF}

> The way I understand the concept of IDREF, this type contains only
> those values, which are bound by the type ID in the same document.
> Thus, for my problem of a reference to an external element I have 
> to use the more general data type NCName.

Referential semantics are beyond the scope of Relax NG; IDREFs are only checked syntactically, which, in my opinion, is a good thing.


    
  

11.

ambiguity checks

Eric van der Vlist


> A grammar can be checked for ambiguity, but an ambiguous grammar is normal
> for Relax NG.  http://www.kohsuke.org/relaxng/meter/

Also, be aware that there are several definitions of what an "ambiguous grammar" is and that this tool only checks the definition described in its documentation. In other words, it's a tool for a specific range of applications that may not meet your requirements.

12.

How to make a list of attributes which is not case sensitive.

Kohsuke Kawaguchi

> I currently have a solution:

> ATTR_type = attribute type {
>   xsd:token { pattern = "[Ss][Yy][Ss][Tt][Ee][Mm]" }
>   | xsd:token { pattern = "[Mm][Ee][Nn][Uu]" }
> }

> Do you have another solution?

Define your own case insensitive datatype library. See thaiopensource.com
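
If writing a datatype library is more than you need, a smaller variation on the pattern approach from the question is to fold the alternatives into one regular expression. This is only a sketch of the same workaround, not the datatype library solution suggested above:

```rnc
# One token pattern with both case-insensitive alternatives
ATTR_type = attribute type {
  xsd:token { pattern = "[Ss][Yy][Ss][Tt][Ee][Mm]|[Mm][Ee][Nn][Uu]" }
}
```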

13.

Open content model; xlink schema

John Cowan


   
> I've been working on defining a RelaxNG schema for the upcoming SVG 
> 1.2, which in turn involves also creating schemata for sXBL, XLink, 
> and XML Events. Many parts of these specs have open content models, 
> where arbitrary elements with arbitrary attributes are allowed
> recursively.

You may find my non-normative XLink schema in the xml-dev archive useful.

=====cut here=====

# RELAX NG schema for W3C XLink

datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace xlink = "http://www.w3.org/1999/xlink"

href = attribute xlink:href {xsd:anyURI}
role = attribute xlink:role {xsd:anyURI}
arcrole = attribute xlink:arcrole {xsd:anyURI}
title.att = attribute xlink:title {text}
show = attribute xlink:show {"new"|"replace"|"embed"|"other"|"none"}
actuate = attribute xlink:actuate {"onLoad"|"onRequest"|"other"|"none"}
label = attribute xlink:label {xsd:NMTOKEN}
from = attribute xlink:from {xsd:NMTOKEN}
to = attribute xlink:to {xsd:NMTOKEN}

simple = element * {
		attribute xlink:type {"simple"}, anyAttr*,
		href?, role?, arcrole?, title.att?, show?, actuate?,
		(anyElem | text)*
		}

extended = element * {
		attribute xlink:type {"extended"}, anyAttr*,
		role?, title.att?,
		(title | resource | locator | arc | anyElem | text)*
		}

title = element * {
		attribute xlink:type {"title"}, anyAttr*,
		(anyElem | text)*
		}

resource = element * {
		attribute xlink:type {"resource"}, anyAttr*,
		role?, title.att?, label?,
		(anyElem | text)*
		}

locator = element * {
		attribute xlink:type {"locator"}, anyAttr*,
		href, role?, title.att?, label?,
		(title | anyElem | text)*
		}

arc = element * {
		attribute xlink:type {"arc"}, anyAttr*,
		arcrole?, title.att?, show?, actuate?, from?, to?,
		(title | anyElem | text)*
		}

start = element * { anyAttr*, (simple | extended | anyElem)* }

anyElem = element * {anyAttr*, (anyElem | text)*}

anyAttr = attribute * - xlink:*  {text}

14.

Jing error 'oneOrMore contains group'

John Cowan et al



> I have developed a grammar by reading the RelaxNG tutorial, but when I
> attempt to validate it using Jing (within OxygenXML) I get a brief error
> message I cannot understand: E "oneOrMore" contains "group" contains
> "attribute"

> namespace rng = "http://relaxng.org/ns/structure/1.0"
> datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

> start = element ConfigRoot  { ConfigGroup* }

> NamedElementAttributes =
>    attribute name { text },
>    attribute platform { text }?

> StringValue = element StringValue { xsd:string }
> IntValue = element IntValue { xsd:integer }
> FloatValue = element FloatValue { xsd:float }
> BooleanValue = element BooleanValue { xsd:boolean }
> PrimitiveValue = (StringValue | IntValue | FloatValue | BooleanValue)

> Map = element Map { (MapEntry)* }
> MapEntry = element MapEntry { PrimitiveValue }
> Array = element Array { (PrimitiveValue)* }

> ConfigValue =  ( PrimitiveValue | Map | Array ), NamedElementAttributes

> ConfigGroupEntry = (ConfigGroup |  ConfigValue)

> ConfigGroup =  element ConfigGroup {
>    ConfigGroupEntry*,
>    NamedElementAttributes
> }

You bumped up against one of the few restrictions of RELAX NG: you can't have an iteration (either + or *, they produce the same message) that contains a sequence (or interleave, same thing in this case) of attributes.

What you have written is tantamount to requesting an attribute sequence like attribute foo, attribute bar, attribute foo, attribute bar .... be allowed. This makes no sense, given that there can only be one attribute of a given name in a given element.

Fortunately, the problem is easily cured by removing the first reference to NamedElementAttributes.
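
As a sketch of that fix, only the ConfigValue definition changes; the other definitions from the question are assumed to stay as they were:

```rnc
# NamedElementAttributes removed from ConfigValue, so the iteration
# ConfigGroupEntry* no longer contains a group of attributes
ConfigValue = PrimitiveValue | Map | Array

ConfigGroupEntry = ConfigGroup | ConfigValue

ConfigGroup = element ConfigGroup {
  ConfigGroupEntry*,
  NamedElementAttributes
}
```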

> When I use the inbuilt Trang tool, I can successfully convert the compact
> grammar to XML form, but a conversion on to XML Schema results in a message
> like "Choice between attributes and children cannot be presented;
> approximating", but no schema object is produced.

You can't expect correct results when applying Trang to a schema that Jing rejects. The corrected version does convert to a plausible XSD schema.


> Any general advice on how to "debug" a RelexNG grammar would also be
> appreciated.

Try to avoid creating sequences/interleaves that contain both attributes and data. They tend to be problematic.

Bob Foster goes into a deeper explanation

> I have developed a grammar by reading the RelaxNG tutorial, but when I 
> attempt to validate it using Jing (within OxygenXML) I get an brief 
> error message I cannot understand: E "oneOrMore" contains "group" 
> contains "attribute"

ConfigGroupEntry* where ConfigGroupEntry has as an alternative ConfigValue, a group that contains a group, NamedElementAttributes, that contains two attributes.

The "oneOrMore" reference is because * (zeroOrMore) is defined in RELAX NG as +? (optional oneOrMore).

NamedElementAttributes is obviously redundant as it is used, so a simple fix would be to take it out of ConfigValue.


> When I use the inbuilt Trang tool, I can successfully convert the 
> compact grammar to XML form, but a conversion on to XML Schema results 
> in a message like "Choice between attributes and children cannot be 
> presented; approximating", but no schema object is produced.

Trang can't convert everything, esp. an invalid schema, and even if it did the conversion could not be faithful where there is a choice between an attribute and an element. XML Schema can't describe this.


> There doesn't seem to be any reference or further explanation for 
> parsing errors. I have spent several hours on this and Will Have to 
> abandon the use of RelaxNG unless I can obtain a speedy resolution.

Don't give up. Granted the error messages from jing can be pretty cryptic, but in this case you had all the info you needed, you just didn't understand it. OTOH, if you had spent the same several hours trying to write an XML Schema with no previous knowledge of that language, I wager you wouldn't be anywhere near as far along as you are. ;-}


> My grammar is shown below, including a leading * indicating the line the 
> validator fails on. Note that it relates to the only case of recursion 
> in the grammar- a ConfigGroup can contain nested ConfigGroups within 
> it.  Is anyone able to offer some explanation as to what these problems 
> mean? Any general advice on how to "debug" a RelexNG grammar would also 
> be appreciated.

First, a grammar isn't thoroughly checked until you try to validate with it, because some grammars are perfectly ok as fragments but not acceptable for validation. So exercise the grammar on simple test cases to make sure it is fully checked and gives you the result you wanted.

Second, compact syntax error messages from jing come in two flavors: "syntax error," for which the only cure is to stare at the location identified and consult the syntax until you see the error, and substantive error messages, like the one you describe, that usually refer to constraints described in the RELAX NG (non-compact) specification: oasis.org. In this case, you could have found the restriction by searching the restrictions section for "attribute".

MURATA Makoto takes up the spec references

If you check the RELAX NG specification, you'll find, in section 7.1.2, a prohibition of the path:


oneOrMore//group//attribute

Consider an illegal pattern

(element * {text}, attribute * {text})*

This means that the number of attributes and that of elements are equal.

Next, consider another illegal pattern

(attribute a:* {text}, attribute b:* {text})*

This means that the number of attributes in the namespace for the prefix a is the same as the number of attributes in the namespace for the prefix b.

To implement such patterns, we cannot use abstract machines having *finite* states, since we need one state for each natural number. Then, Bali and Miaou (which I should have finished a long time ago) will become impossible.

Jing constructs patterns (which are "states") lazily. Thus, the above patterns can be implemented actually. However, there are some malicious patterns which cause (an earlier version of) Jing to explode, because so many patterns have to be created during evaluation.

15.

Attribute error with ID's

MURATA Makoto



> xmlschema.rng:718:18: error: conflicting ID-types for attribute 
> "id" of element 
>"include" from namespace " http://www.w3.org/2001/XMLSchema"

This error happens when (1) some attribute declaration has a wildcard matching absolutely any namespace and (2) you have an ID attribute somewhere else in your schema.

There are two solutions. One is to make the wildcard match all namespaces EXCEPT the namespace of the element declaration having that ID attribute. The other is to convert the ID attribute to a string attribute.

In your case, I would rewrite

 <element><anyName/>...</element>

as

 <element>
   <anyName>
     <except><nsName ns="http://www.w3.org/2001/XMLSchema"/></except>
   </anyName>
   ...
 </element>

The fundamental reason is as follows. In a DTD, an application program can determine whether an attribute is an ID by examining the attribute name and the parent element name. The DTD Compatibility specification of RELAX NG is intended to mimic DTDs and ensures this simplicity.
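
For readers working in the compact syntax, the same exclusion might be sketched as follows. The prefix xs, the definition name anyOther and the content model are assumptions for illustration:

```rnc
namespace xs = "http://www.w3.org/2001/XMLSchema"

# Wildcard element matching any name outside the XML Schema namespace.
# The content model is a placeholder for whatever open content you allow.
anyOther = element * - xs:* { (attribute * { text } | text | anyOther)* }
```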

16.

Simplified? Simple syntax?

David Carlisle

> ? What's the definition of 'simplified' please?
>

Simplification is a transformation defined in the RELAX NG specification: relaxng.org. The simple syntax (not to be confused with the compact syntax) is a subset of the full syntax which you normally author schemas in. The semantics of RELAX NG validation are entirely described in terms of the simple syntax.

Simply put, the simplification transformation flattens a schema into a more DTD-like structure where the only remaining top-level constructs are definitions containing a single element pattern. Additionally, there is a single start pattern. All other named patterns, includes, overrides, combination attributes and nested grammars go away. Also, inherited attributes are resolved so you don't have to look upwards to determine the value of, for example, an ns attribute.

So, if you want to process a schema and don't care how the original schema was structured and modularized, the simplified version is just much simpler to use.

>> relaxng.org has links to tools that can perform the simplification 
>> step (e.g. rng2srng).

rng2srng is a tool which will perform the simplification transformation described above.