[DDI-users] DDI 3.2: Schema allows double identification sequence

Thu Dec 11 14:51:26 EST 2014

Achim,

I downloaded XMLSpy Enterprise Edition 2015 SP2 and requested a 30-day trial.
Once I had installed this monstrous 90 MB piece of software and activated the trial period,
I opened your NOTOK_* test cases in it.

Since the schema location on the disk was hard-coded into the XML documents,
I had to change those paths to reflect my system's setup. 
Once I did that, I got the following validation errors.

File C:\src-hg\ddi3-test\achim\NOTOK_DoubleIDSequence.xml is not valid.
	Element <Agency> is not allowed at this location under element <TopicalCoverage>.
		Reason: The following elements are expected at this location (see below)
			'MaintainableObject'
			'URN'
			'Subject'
			'Keyword'
			'UserID'
		Annotations of element 'TopicalCoverage' (see below)
			Describes the topical coverage of the module using Subject and Keyword.
		Annotations of type 'TopicalCoverageType' (see below)
			Describes the topical coverage of the module using Subject and Keyword. Note that upper level modules should include all the members of lower level modules. Subjects are members of structured classification systems such as formal subject headings in libraries or topical thesauri. Keywords are generally unstructured and reflect the terminology found in the document and other related (broader or similar) terms.
		Error location: TopicalCoverage / Agency
		Details
			cvc-complex-type.1.4: Element <Agency> unexpected at this location by type 'TopicalCoverageType' of element <TopicalCoverage>.
			cvc-type.3.2: Element <TopicalCoverage> is not valid with respect to type definition 'TopicalCoverageType'.
			cvc-elt.5.2.1: The element <TopicalCoverage> is not valid with respect to the actual type definition 'TopicalCoverageType'.

File C:\src-hg\ddi3-test\achim\NOTOK_DoubleURN.xml is not valid.
	Element <URN> is not allowed at this location under element <TopicalCoverage>.
		Reason: The following elements are expected at this location (see below)
			'MaintainableObject'
			'Subject'
			'Keyword'
			'UserID'
		Annotations of element 'TopicalCoverage' (see below)
			Describes the topical coverage of the module using Subject and Keyword.
		Annotations of type 'TopicalCoverageType' (see below)
			Describes the topical coverage of the module using Subject and Keyword. Note that upper level modules should include all the members of lower level modules. Subjects are members of structured classification systems such as formal subject headings in libraries or topical thesauri. Keywords are generally unstructured and reflect the terminology found in the document and other related (broader or similar) terms.
		Error location: TopicalCoverage / URN
		Details
			cvc-complex-type.1.4: Element <URN> unexpected at this location by type 'TopicalCoverageType' of element <TopicalCoverage>.
			cvc-type.3.2: Element <TopicalCoverage> is not valid with respect to type definition 'TopicalCoverageType'.
			cvc-elt.5.2.1: The element <TopicalCoverage> is not valid with respect to the actual type definition 'TopicalCoverageType'.

In conclusion, I cannot reproduce the behaviour you have experienced.
Maybe the reusable.xsd you have is not a patched one.

XML Schema is not too limited for this purpose.
The grammar producing the desired result is easily
expressed using a notation similar to the field-level documentation,
which makes the possible solutions accessible to a wider audience:

AbstractIdentifiableType ::= URN | ((Agency, ID, Version), URN?), UserID*

and, if desired, it is also possible to allow arbitrary order for the URN and ID sequence
with the following:

AbstractIdentifiableType ::= (URN, (Agency, ID, Version)?) | ((Agency, ID, Version), URN?), UserID*

Feel free to test this one out. On my system both libxml2 2.7.1 and the newly
downloaded XMLSpy produced correct results.

This matches the behaviour of the current solution
in the sense that the order of the URN and the ID sequence is not restricted,
but without its defects: it allows neither a second URN, nor a second
ID sequence, nor the absence of both.
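
For anyone who wants to get a feel for the two grammars without a schema
validator at hand, here is a rough sketch that turns them into regular
expressions over child element names. It is only an illustration: the
helper name matches and the space-separated encoding are made up for this
example, the alternation is assumed to group before the trailing UserID*,
and the second pattern assumes the intent is that the URN may come either
before or after the ID sequence.

```python
import re

ID_SEQ = r"Agency ID Version "
# Fixed order: URN alone, or the ID sequence optionally followed by URN.
FIXED = re.compile(rf"(?:URN |{ID_SEQ}(?:URN )?)(?:UserID )*")
# Either order (assumed intent): URN then optional ID sequence,
# or ID sequence then optional URN.
ANY_ORDER = re.compile(rf"(?:URN (?:{ID_SEQ})?|{ID_SEQ}(?:URN )?)(?:UserID )*")


def matches(pattern, children):
    """Return True if the list of child element names fits the pattern."""
    # Each name is followed by one space so names cannot run together.
    text = "".join(name + " " for name in children)
    return pattern.fullmatch(text) is not None


ids = ["Agency", "ID", "Version"]
cases = [["URN"], ids + ["URN"], ["URN"] + ids, ["URN", "URN"], ids + ids, []]
for case in cases:
    print(case, matches(FIXED, case), matches(ANY_ORDER, case))
```

Only the third case (URN followed by the ID sequence) differs between the
two patterns; the double URN, the double ID sequence and the empty list are
rejected by both.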

However, as can be seen from the expression above,
this results in the grammar being overly verbose,
and one can start to see why it might be appealing to 
strive for simplicity in this case just by expressing:

AbstractIdentifiableType ::= URN?, (Agency, ID, Version)?, UserID*

and setting the restriction in the specification 
that either URN or ID sequence must be provided.
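
To make the trade-off concrete, here is a small sketch of that split, again
as a regular expression over child element names rather than real schema
validation. The function names schema_valid and spec_valid are hypothetical
and only stand in for "what the schema checks" and "what the specification
text additionally demands".

```python
import re

# Loose content model: every part is optional, so the schema alone
# would also accept an element with no identification at all.
LOOSE = re.compile(r"(?:URN )?(?:Agency ID Version )?(?:UserID )*")


def schema_valid(children):
    """Check the child element names against the loose content model."""
    text = "".join(name + " " for name in children)
    return LOOSE.fullmatch(text) is not None


def spec_valid(children):
    """Schema check plus the prose rule: URN or ID sequence must be present."""
    return schema_valid(children) and ("URN" in children or "Agency" in children)


print(schema_valid(["UserID"]), spec_valid(["UserID"]))  # True False
print(schema_valid(["URN"]), spec_valid(["URN"]))        # True True
```

The first line of output is the case the schema can no longer catch on its
own: it has to be enforced by whatever tooling implements the specification.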

Jani

________________________________________
From: ddi-users-bounces at icpsr.umich.edu <ddi-users-bounces at icpsr.umich.edu> on behalf of Wackerow, Joachim <Joachim.Wackerow at gesis.org>
Sent: Thursday, December 11, 2014 19:22
To: Data Documentation Initiative Users Group
Subject: Re: [DDI-users] DDI 3.2: Schema allows double identification sequence

Jani,

I made a couple of test instances (attached) and tested them against your proposed reusable.xsd using XMLSpy.

It validates all test instances, even those with double URNs or double ID sequences.
Therefore it is not a solution.

XML Schema is too limited for this purpose.

Achim