[DDI-users] DDI 3.2: Schema allows double identification sequence

Wackerow, Joachim Joachim.Wackerow at gesis.org
Thu Dec 11 12:22:45 EST 2014


Jani,

I made a couple of test instances (attached) and tested it with your proposed reusable.xsd using XML Spy.

It validates all test instances even if double URNs or double ID sequences are used. 
Therefore it is not a solution.

XML Schema is too limited for this purpose.

Achim



-----Original Message-----
From: ddi-users-bounces at icpsr.umich.edu [mailto:ddi-users-bounces at icpsr.umich.edu] On Behalf Of Jani Hautamäki
Sent: Thursday, 11 December 2014 16:07
To: Data Documentation Initiative Users Group
Subject: Re: [DDI-users] DDI 3.2: Schema allows double identification sequence

Achim,

I wonder what went wrong when you tested it.

It is desirable, in my opinion, to minimize the number of XML documents that
are valid with respect to the schema but invalid with respect to the
specification. However, at the same time it is also desirable to strive for
simplicity of the schema. It is a trade-off between grammatical simplicity
and rigor.

The two good options I can see are: 1) go for the rigor, 2) go for the simplicity.
Patching the schema goes for the rigor, and the alternative is to go 
for the simplicity. The current solution is neither of these.

In the current solution:
the schema allows more than is needed (two ID sequences, or two URNs),
and the specification sets the upper bound by saying: at most one ID and at most one URN.
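
A minimal sketch of why this over-accepts, assuming the current content model
of AbstractIdentifiableType is (URN | (Agency, ID, Version)){1,2}, UserID*.
That model is an assumption inferred from the behaviour described here, not
quoted from reusable.xsd. A content model is a regular expression over element
names, so it can be imitated with grep -E; the function name
matches_current_model is made up for illustration:

```shell
# Assumed current model: (URN | (Agency, ID, Version)){1,2}, UserID*
# Arguments are the child element names in document order.
matches_current_model() {
  children=""
  # Join the names into one string, each followed by a single space.
  for name in "$@"; do children="$children$name "; done
  printf '%s\n' "$children" |
    grep -Eq '^(URN |Agency ID Version ){1,2}(UserID )*$'
}

matches_current_model URN && echo "accepted: single URN"
matches_current_model URN URN && echo "accepted: double URN"
matches_current_model Agency ID Version Agency ID Version &&
  echo "accepted: double ID sequence"
```

Under that assumption all three sequences are accepted, including the two
doubled ones that the specification forbids.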

The opposite way of doing this would be to go for the simplicity of the grammar, 
by setting the content of AbstractIdentifiableType to:

URN?, (Agency, ID, Version)?, UserID*

In this case:
the schema would demand less than is needed (it would accept an element with no ID and no URN),
and the specification would set the lower bound by saying: at least a URN or an ID.
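
As a rough illustration, the simple content model
URN?, (Agency, ID, Version)?, UserID* can be written as a regular expression
over element names and tried out with grep -E. The function name
matches_simple_model is hypothetical, and the check only imitates the grammar;
it is not a schema validator:

```shell
# Simple model: URN?, (Agency, ID, Version)?, UserID*
# Arguments are the child element names in document order.
matches_simple_model() {
  children=""
  # Join the names into one string, each followed by a single space.
  for name in "$@"; do children="$children$name "; done
  printf '%s\n' "$children" |
    grep -Eq '^(URN )?(Agency ID Version )?(UserID )*$'
}

matches_simple_model && echo "accepted: no URN and no ID sequence"
matches_simple_model URN Agency ID Version && echo "accepted: URN plus ID sequence"
matches_simple_model URN URN || echo "rejected: double URN"
```

The empty sequence is accepted, which is exactly the case the specification
would then have to rule out. Note that this model also fixes the order: the
URN, if present, comes before the ID sequence.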

In a sense, then, the current solution is the worst possible:
it does not achieve either rigor or simplicity.
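
For contrast, here is a sketch of what a rigorous content model could look
like, again imitated as a regular expression over element names. The model
(URN, (Agency, ID, Version)?) | ((Agency, ID, Version), URN?), followed by
UserID*, is one possible formulation chosen for illustration; it is an
assumption and not necessarily what the patch does. The names
matches_rigorous_model and verdict are hypothetical:

```shell
# Hypothetical rigorous model:
#   ( URN, (Agency, ID, Version)? | (Agency, ID, Version), URN? ), UserID*
matches_rigorous_model() {
  children=""
  # Join the names into one string, each followed by a single space.
  for name in "$@"; do children="$children$name "; done
  printf '%s\n' "$children" |
    grep -Eq '^(URN (Agency ID Version )?|Agency ID Version (URN )?)(UserID )*$'
}

# Print "valid" or "invalid" for a sequence of child element names.
verdict() {
  if matches_rigorous_model "$@"; then echo valid; else echo invalid; fi
}

echo "both:               $(verdict Agency ID Version URN)"
echo "ID sequence only:   $(verdict Agency ID Version)"
echo "URN only:           $(verdict URN)"
echo "neither:            $(verdict)"
echo "double ID sequence: $(verdict Agency ID Version Agency ID Version)"
echo "double URN:         $(verdict URN URN)"
```

The first three sequences come out valid and the last three invalid, so no
additional upper or lower bound is left for the specification to state.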

Here's a script/transcript for patching the schema and executing 
the obvious test cases with desired results. http://pastebin.com/dv6z39ZV

The same script is also attached to the end of this message.

I leave it to your discretion whether the patch should be employed or not.

Jani

---

#!/bin/bash

#################
# PRELIMINARIES #
#################

# Required software:
# wget, unzip, libxml2-utils, sed, patch, cat

# Create some temporary work space
mkdir tmp
cd tmp

# Download DDI-L v3.2 specifications
wget http://www.ddialliance.org/Specification/DDI-Lifecycle/3.2/DDI_3_2_2014-05-15.zip

# unzip
unzip DDI_3_2_2014-05-15.zip

# Copy schemas closer to working directory for convenience
cp -r DDI_3_2_2014-02-05/DDI_3_2_2014-05-15_Documentation_XMLSchema/XMLSchema .

#######################
# PATCHING THE SCHEMA #
#######################

# Get the patch
wget http://www.pastebucket.com/paste/download/72481

# Need some byte-level juggling here due to CRLF line-endings...

# Add a trailing CRLF to the patch file
sed -e '$s/$/\r\n/' -i 72481

# Verify that the patch works.
# Here --binary is needed because the file and the patch have CRLF line-endings.
patch --dry-run --binary --verbose XMLSchema/reusable.xsd <72481

# Patch the schema
patch --binary --verbose XMLSchema/reusable.xsd <72481

######################
# TESTING THE SCHEMA #
######################

# Verify the libxml2 version
xmllint --version
# xmllint: using libxml version 20901

# test case valid-1

cat >test-valid1.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
  <!-- Has both -->
  <r:Agency>acme.org</r:Agency>
  <r:ID>ddi_instance</r:ID>
  <r:Version>1</r:Version>
  <r:URN>urn:ddi:acme.org:ddi_instance:1</r:URN>
</ddi:DDIInstance>
EOF

xmllint --noout -schema XMLSchema/instance.xsd test-valid1.xml
# test-valid1.xml validates

# test case valid-2

cat >test-valid2.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
  <!-- Has only Agency/ID/Version -->
  <r:Agency>acme.org</r:Agency>
  <r:ID>ddi_instance</r:ID>
  <r:Version>1</r:Version>
</ddi:DDIInstance>
EOF

xmllint --noout -schema XMLSchema/instance.xsd test-valid2.xml
# test-valid2.xml validates

# test case valid-3

cat >test-valid3.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
  <!-- Has only URN -->
  <r:URN>urn:ddi:acme.org:ddi_instance:1</r:URN>
</ddi:DDIInstance>
EOF

xmllint --noout -schema XMLSchema/instance.xsd test-valid3.xml
# test-valid3.xml validates

# test case invalid-1

cat >test-invalid1.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
  <!-- Has neither -->
</ddi:DDIInstance>
EOF

xmllint --noout -schema XMLSchema/instance.xsd test-invalid1.xml
# test-invalid1.xml fails to validate

# test case invalid-2

cat >test-invalid2.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
  <!-- Has double Agency/ID/Version -->
  <r:Agency>acme.org</r:Agency>
  <r:ID>ddi_instance1</r:ID>
  <r:Version>1</r:Version>
  <r:Agency>acme.org</r:Agency>
  <r:ID>ddi_instance2</r:ID>
  <r:Version>1</r:Version>
</ddi:DDIInstance>
EOF

xmllint --noout -schema XMLSchema/instance.xsd test-invalid2.xml
# test-invalid2.xml fails to validate

# test case invalid-3

cat >test-invalid3.xml <<EOF
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2">
  <!-- Has double URN -->
  <r:URN>urn:ddi:acme.org:ddi_instance1:1</r:URN>
  <r:URN>urn:ddi:acme.org:ddi_instance2:1</r:URN>
</ddi:DDIInstance>
EOF

xmllint --noout -schema XMLSchema/instance.xsd test-invalid3.xml
# test-invalid3.xml fails to validate
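
The six checks above can also be run in one pass. The following is a sketch
only: the helper name expect_result is hypothetical, and it relies on xmllint
exiting with status zero when a document validates and non-zero otherwise. It
assumes the test files and the patched schema created by the script above are
in the current directory.

```shell
# expect_result EXPECTED FILE
# EXPECTED is "valid" or "invalid"; FILE is an instance document.
# Any non-zero exit from xmllint (including a missing file or a schema
# that fails to compile) is reported as "invalid".
expect_result() {
  if xmllint --noout --schema XMLSchema/instance.xsd "$2" >/dev/null 2>&1; then
    actual=valid
  else
    actual=invalid
  fi
  if [ "$actual" = "$1" ]; then
    echo "ok: $2 is $actual"
  else
    echo "MISMATCH: $2 is $actual, expected $1"
    return 1
  fi
}

# Run only where xmllint and the schema from the steps above are available.
if command -v xmllint >/dev/null 2>&1 && [ -f XMLSchema/instance.xsd ]; then
  for n in 1 2 3; do
    expect_result valid   "test-valid$n.xml"
    expect_result invalid "test-invalid$n.xml"
  done
fi
```

Because every failure mode counts as "invalid", a MISMATCH on one of the
valid cases is the more informative signal; the invalid cases are worth
re-running by hand to read the actual xmllint message.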

-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestInstances.zip
Type: application/x-zip-compressed
Size: 3328 bytes
Desc: TestInstances.zip
Url : http://lists.icpsr.umich.edu/pipermail/ddi-users/attachments/20141211/bb977fc7/attachment.bin 