[DDI-users] DDI 3.2: Schema allows double identification sequence
Wackerow, Joachim
Joachim.Wackerow at gesis.org
Thu Dec 11 12:22:45 EST 2014
Jani,
I made a couple of test instances (attached) and tested them against your proposed reusable.xsd using XML Spy.
The schema validates all test instances, even those with double URNs or double ID sequences.
Therefore it is not a solution.
XML Schema is too limited for this purpose.
Achim
-----Original Message-----
From: ddi-users-bounces at icpsr.umich.edu [mailto:ddi-users-bounces at icpsr.umich.edu] On Behalf Of Jani Hautamäki
Sent: Thursday, 11 December 2014 16:07
To: Data Documentation Initiative Users Group
Subject: Re: [DDI-users] DDI 3.2: Schema allows double identification sequence
Achim,
I wonder what went wrong when you tested it.
It is desirable, in my opinion, to minimize the number of XML documents that
are valid with respect to the schema but invalid with respect to the specification.
However, it is also desirable to keep the schema simple. It is a trade-off
between grammatical simplicity and rigor.
The two good options I can see are: 1) go for rigor, 2) go for simplicity.
Patching the schema goes for rigor; the alternative is to go
for simplicity. The current solution is neither of these.
In the current solution:
the schema allows more identifiers than are needed (two ID sequences or two URNs),
and the specification sets the upper bound by saying: at most one ID and at most one URN.
The opposite way of doing this would be to go for the simplicity of the grammar,
by setting the content of AbstractIdentifiableType to:
URN?, (Agency, ID, Version)?, UserID*
In this case:
the schema would allow fewer identifiers than are needed (neither ID nor URN),
and the specification would set the lower bound by saying: at least a URN or an ID.
In a sense, then, the current solution is the worst possible:
it does not achieve either rigor or simplicity.
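To make the comparison concrete, the three content models can be written as
regular expressions over the identification children. This is only an
illustrative sketch in bash, under stated assumptions: U stands for one URN
element and T for one Agency/ID/Version triple, the shape given for the current
schema (a choice repeated at most twice) is inferred from the description above,
and the "rigorous" expression is one hypothetical stricter grammar, not
necessarily what the patch implements.

```shell
#!/bin/bash
# Illustration only. Each identification child is abbreviated to one letter:
#   U = one URN element, T = one Agency/ID/Version triple.
# UserID* is left out because it is the same in all three models.

current='^(U|T){1,2}$'   # assumed shape of the 3.2 schema: a choice, up to twice
simple='^U?T?$'          # URN?, (Agency, ID, Version)?
rigorous='^(UT?|TU?)$'   # hypothetical: at least one, at most one of each kind

# accepts REGEX CHILDREN: succeeds if the model accepts the child sequence
accepts() {
  [[ $2 =~ $1 ]]
}

# Try the same six shapes as the test cases: both, ID only, URN only,
# neither, double ID sequence, double URN.
for children in TU T U "" TT UU; do
  line="children='$children':"
  for model in current simple rigorous; do
    if accepts "${!model}" "$children"; then
      line+=" $model=accept"
    else
      line+=" $model=reject"
    fi
  done
  echo "$line"
done
```

Under these assumptions only the current model accepts TT and UU, and only the
simple model accepts the empty sequence. The simple model as written also fixes
the order (URN first), so an instance that puts Agency/ID/Version before URN
would have to be reordered.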
Here's a script/transcript for patching the schema and executing
the obvious test cases with desired results. http://pastebin.com/dv6z39ZV
The same script is also attached to the end of this message.
I leave it to your discretion whether the patch should be employed or not.
Jani
---
#!/bin/bash
#################
# PRELIMINARIES #
#################
# Required software:
# wget, unzip, libxml2-utils, sed, patch, cat
# Create some temporary work space
mkdir tmp
cd tmp
# Download DDI-L v3.2 specifications
wget http://www.ddialliance.org/Specification/DDI-Lifecycle/3.2/DDI_3_2_2014-05-15.zip
# unzip
unzip DDI_3_2_2014-05-15.zip
# Copy schemas closer to working directory for convenience
cp -r DDI_3_2_2014-02-05/DDI_3_2_2014-05-15_Documentation_XMLSchema/XMLSchema .
#######################
# PATCHING THE SCHEMA #
#######################
# Get the patch
wget http://www.pastebucket.com/paste/download/72481
# Need some byte-level juggling here due to CRLF line-endings...
# Add a trailing CRLF to the patch file
sed -e '$s/$/\r\n/' -i 72481
# Verify that the patch works.
# Here --binary is needed because the file and the patch have CRLF line-endings.
patch --dry-run --binary --verbose XMLSchema/reusable.xsd <72481
# Patch the schema
patch --binary --verbose XMLSchema/reusable.xsd <72481
######################
# TESTING THE SCHEMA #
######################
# Verify the libxml2 version
xmllint --version
# xmllint: using libxml version 20901
# test case valid-1
cat >test-valid1.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
<!-- Has both -->
<r:Agency>acme.org</r:Agency>
<r:ID>ddi_instance</r:ID>
<r:Version>1</r:Version>
<r:URN>urn:ddi:acme.org:ddi_instance:1</r:URN>
</ddi:DDIInstance>
EOF
xmllint --noout --schema XMLSchema/instance.xsd test-valid1.xml
# test-valid1.xml validates
# test case valid-2
cat >test-valid2.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
<!-- Has only Agency/ID/Version -->
<r:Agency>acme.org</r:Agency>
<r:ID>ddi_instance</r:ID>
<r:Version>1</r:Version>
</ddi:DDIInstance>
EOF
xmllint --noout --schema XMLSchema/instance.xsd test-valid2.xml
# test-valid2.xml validates
# test case valid-3
cat >test-valid3.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
<!-- Has only URN -->
<r:URN>urn:ddi:acme.org:ddi_instance:1</r:URN>
</ddi:DDIInstance>
EOF
xmllint --noout --schema XMLSchema/instance.xsd test-valid3.xml
# test-valid3.xml validates
# test case invalid-1
cat >test-invalid1.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
<!-- Has neither -->
</ddi:DDIInstance>
EOF
xmllint --noout --schema XMLSchema/instance.xsd test-invalid1.xml
# test-invalid1.xml fails to validate
# test case invalid-2
cat >test-invalid2.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
<!-- Has double Agency/ID/Version -->
<r:Agency>acme.org</r:Agency>
<r:ID>ddi_instance1</r:ID>
<r:Version>1</r:Version>
<r:Agency>acme.org</r:Agency>
<r:ID>ddi_instance2</r:ID>
<r:Version>1</r:Version>
</ddi:DDIInstance>
EOF
xmllint --noout --schema XMLSchema/instance.xsd test-invalid2.xml
# test-invalid2.xml fails to validate
# test case invalid-3
cat >test-invalid3.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
<!-- Has double URN -->
<r:URN>urn:ddi:acme.org:ddi_instance1:1</r:URN>
<r:URN>urn:ddi:acme.org:ddi_instance2:1</r:URN>
</ddi:DDIInstance>
EOF
xmllint --noout --schema XMLSchema/instance.xsd test-invalid3.xml
# test-invalid3.xml fails to validate
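The six checks above could also be run in one loop that compares each outcome
with the expectation. The following is only a sketch and not part of the
original script: the helper names classify, check and run_all and the VALIDATOR
array are made up for illustration, and the sketch relies only on xmllint
returning a non-zero exit status when validation fails.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original script.

# The validator command is kept in an array so it can be swapped out.
VALIDATOR=(xmllint --noout --schema XMLSchema/instance.xsd)

# classify CMD... : prints "valid" if the command exits with 0, else "invalid".
# Note: a non-zero status can also mean that xmllint is missing or that the
# schema itself did not compile, so "invalid" should be read with some care.
classify() {
  if "$@" >/dev/null 2>&1; then
    echo valid
  else
    echo invalid
  fi
}

# check EXPECTED FILE: reports whether the validator agrees with EXPECTED.
check() {
  local expected=$1 file=$2 actual
  actual=$(classify "${VALIDATOR[@]}" "$file")
  if [ "$actual" = "$expected" ]; then
    echo "PASS $file ($actual)"
  else
    echo "FAIL $file (expected $expected, got $actual)"
    return 1
  fi
}

# run_all: checks the six test files created by the script above.
run_all() {
  local status=0 f
  for f in test-valid1.xml test-valid2.xml test-valid3.xml; do
    check valid "$f" || status=1
  done
  for f in test-invalid1.xml test-invalid2.xml test-invalid3.xml; do
    check invalid "$f" || status=1
  done
  return "$status"
}

# Run only when xmllint and the schema directory are actually present.
if command -v xmllint >/dev/null 2>&1 && [ -f XMLSchema/instance.xsd ]; then
  run_all || echo "at least one case did not behave as expected"
fi
```

Run from the tmp directory after the test files have been created, this prints
one PASS or FAIL line per test file.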
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestInstances.zip
Type: application/x-zip-compressed
Size: 3328 bytes
Desc: TestInstances.zip
Url : http://lists.icpsr.umich.edu/pipermail/ddi-users/attachments/20141211/bb977fc7/attachment.bin