[DDI-users] DDI 3.2: optional xml:lang attribute in StringType

Jani Hautamäki Jani.Hautamaki at staff.uta.fi
Wed Dec 10 10:32:49 EST 2014


Thanks for clarifying this.

It might be a good idea to amend the field-level specification
for InternationalStringType with something like this:

The value for the attribute xml:lang must always be defined
either implicitly or explicitly.
If the attribute is not explicitly set,
its value is inherited from the parent.
Consequently, there can be only one <r:String> child
without xml:lang set explicitly within an InternationalStringType element.
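
A rough way to check that rule mechanically, as a sketch only: it
assumes Python's standard xml.etree.ElementTree, the helper names are
made up for illustration, and <r:Title> stands in for any
InternationalStringType element.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
R_STRING = "{ddi:reusable:3_2}String"


def strings_without_explicit_lang(international_string):
    """Return the <r:String> children that carry no explicit xml:lang."""
    return [
        child
        for child in international_string.findall(R_STRING)
        if XML_LANG not in child.attrib
    ]


def satisfies_proposed_rule(international_string):
    """True if at most one <r:String> child relies on an inherited xml:lang."""
    return len(strings_without_explicit_lang(international_string)) <= 1


valid_title = ET.fromstring(
    '<r:Title xmlns:r="ddi:reusable:3_2">'
    "<r:String>otsikko suomeksi</r:String>"
    '<r:String xml:lang="en">title in English</r:String>'
    "</r:Title>"
)
invalid_title = ET.fromstring(
    '<r:Title xmlns:r="ddi:reusable:3_2">'
    "<r:String>otsikko suomeksi</r:String>"
    "<r:String>title in English</r:String>"
    "</r:Title>"
)

print(satisfies_proposed_rule(valid_title))    # True
print(satisfies_proposed_rule(invalid_title))  # False
```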

--

Inheriting information (e.g. xml:lang) from a parent element creates
some issues for the XML linearisation process
(the process of converting a hierarchical DDIInstance into
a flat list of Fragments). These issues are solvable, though.

An example follows.

First, here is a minimal ddi:DDIInstance with
the id "ddi_instance". Its @xml:lang is defined on the root element.

ddi_instance.xml
------8<------8<------
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance
    xmlns:ddi="ddi:instance:3_2"
    xmlns:r="ddi:reusable:3_2"
    xmlns:s="ddi:studyunit:3_2"
    xml:lang="fi"
    >

  <r:Agency>acme.org</r:Agency>
  <r:ID>ddi_instance</r:ID>
  <r:Version>1</r:Version>

  <s:StudyUnit>
    <r:Agency>acme.org</r:Agency>
    <r:ID>study_unit</r:ID>
    <r:Version>1</r:Version>

    <r:Citation>
      <r:Title>
        <r:String>otsikko suomeksi</r:String>
      </r:Title>
    </r:Citation>

  </s:StudyUnit>
</ddi:DDIInstance>
------8<------8<------

The <r:String> element does not have the attribute @xml:lang explicitly
set. However, the root element has @xml:lang explicitly set to "fi",
and therefore the <s:StudyUnit>'s title has the @xml:lang implicitly
set to "fi" too.
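
To make the inheritance concrete, here is a small sketch of how the
value in effect could be resolved. It assumes Python's standard
xml.etree.ElementTree; the function name effective_lang is my own and
not part of any DDI tooling, and the document is the one above with
the identification elements left out for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, element):
    """Return the xml:lang in effect for `element`, or None if nothing sets it."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        if XML_LANG in node.attrib:
            return node.attrib[XML_LANG]
        node = parent_of.get(node)
    return None


doc = ET.fromstring(
    '<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" '
    'xmlns:r="ddi:reusable:3_2" xmlns:s="ddi:studyunit:3_2" xml:lang="fi">'
    "<s:StudyUnit><r:Citation><r:Title>"
    "<r:String>otsikko suomeksi</r:String>"
    "</r:Title></r:Citation></s:StudyUnit>"
    "</ddi:DDIInstance>"
)

title_string = doc.find(".//{ddi:reusable:3_2}String")
study_unit = doc.find("{ddi:studyunit:3_2}StudyUnit")

# Inside the DDIInstance the value comes from the root element.
print(effective_lang(doc, title_string))         # fi
# Taken on its own, the StudyUnit subtree has no language at all.
print(effective_lang(study_unit, title_string))  # None
```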

Because <s:StudyUnit> is derived from r:AbstractMaintainableType,
it is possible to maintain it as an independent object.

As an independent object, the <s:StudyUnit> does not have @xml:lang
defined for any of its children. Moreover, assuming a default @xml:lang
value such as "en-US" for the <s:StudyUnit> would contradict the value
"fi" that it inherits inside the DDIInstance.

This problem becomes more apparent when the above ddi:DDIInstance
is linearised into a flat list of ddi:Fragments, and the StudyUnit's
Fragment is then reused in another DDIInstance.

Here's the linearised version of the above document:

fragments1.xml
------8<------8<------
<?xml version="1.0" encoding="utf-8"?>
<ddi:FragmentInstance
    xmlns:ddi="ddi:instance:3_2"
    xmlns:r="ddi:reusable:3_2"
    xmlns:s="ddi:studyunit:3_2"
    >

  <ddi:Fragment>
    <ddi:DDIInstance xml:lang="fi">

      <r:Agency>acme.org</r:Agency>
      <r:ID>ddi_instance</r:ID>
      <r:Version>1</r:Version>

      <r:StudyUnitReference>
        <r:Agency>acme.org</r:Agency>
        <r:ID>study_unit</r:ID>
        <r:Version>1</r:Version>
        <r:TypeOfObject>StudyUnit</r:TypeOfObject>
      </r:StudyUnitReference>
    </ddi:DDIInstance>
  </ddi:Fragment>

  <ddi:Fragment>
    <s:StudyUnit>

      <r:Agency>acme.org</r:Agency>
      <r:ID>study_unit</r:ID>
      <r:Version>1</r:Version>

      <r:Citation>
        <r:Title>
          <r:String>otsikko suomeksi</r:String>
        </r:Title>
      </r:Citation>
    </s:StudyUnit>
  </ddi:Fragment>

</ddi:FragmentInstance>
------8<------8<------

Now, consider another FragmentInstance, which reuses the StudyUnit
from the previous FragmentInstance:

fragments2.xml
------8<------8<------
<?xml version="1.0" encoding="utf-8"?>
<ddi:FragmentInstance
    xmlns:ddi="ddi:instance:3_2"
    xmlns:r="ddi:reusable:3_2"
    xmlns:s="ddi:studyunit:3_2"
    >

  <ddi:Fragment>
    <ddi:DDIInstance xml:lang="en">

      <r:Agency>acme.org</r:Agency>
      <r:ID>another_ddi_instance</r:ID>
      <r:Version>1</r:Version>

      <r:StudyUnitReference>
        <r:Agency>acme.org</r:Agency>
        <r:ID>study_unit</r:ID>
        <r:Version>1</r:Version>
        <r:TypeOfObject>StudyUnit</r:TypeOfObject>
      </r:StudyUnitReference>
    </ddi:DDIInstance>
  </ddi:Fragment>

</ddi:FragmentInstance>
------8<------8<------

This DDIInstance is identified as "another_ddi_instance",
and it differs from the previous "ddi_instance" only by setting
a different value for the xml:lang.

Given these two FragmentInstances, "fragments1.xml" and "fragments2.xml",
it is possible to reconstruct (or "delinearise") two different DDIInstances.
One is equal to the first XML document, "ddi_instance.xml",
and the other DDIInstance is shown below:

another_ddi_instance.xml
------8<------8<------
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance
    xmlns:ddi="ddi:instance:3_2"
    xmlns:r="ddi:reusable:3_2"
    xmlns:s="ddi:studyunit:3_2"
    xml:lang="en"
    >

  <r:Agency>acme.org</r:Agency>
  <r:ID>another_ddi_instance</r:ID>
  <r:Version>1</r:Version>

  <s:StudyUnit>
    <r:Agency>acme.org</r:Agency>
    <r:ID>study_unit</r:ID>
    <r:Version>1</r:Version>

    <r:Citation>
      <r:Title>
        <r:String>otsikko suomeksi</r:String>
      </r:Title>
    </r:Citation>

  </s:StudyUnit>
</ddi:DDIInstance>
------8<------8<------

When "ddi_instance.xml" and "another_ddi_instance.xml" are compared,
both contain a literally identical manifestation of the StudyUnit
identified as "acme.org:study_unit:1".

However, the StudyUnit's @xml:lang is implicitly set, and its
value depends on which DDIInstance is being used. This leads
to an ambiguous definition of the Maintainable in question.

Obviously, this is undesirable.

The Title of the StudyUnit "acme.org:study_unit:1" should have
the language defined independently of the containing DDIInstance.

The resolution to this problem is straightforward, but it has a notable
impact on the linearisation/delinearisation process.

When doing a linearisation (DDIInstance->FragmentInstance) each inherited
attribute must be explicitly set on each Maintainable before
the Maintainables are dismantled into Fragments.
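
As a sketch of that pre-linearisation step, restricted to xml:lang
(other inherited attributes would need the same treatment): it assumes
Python's standard xml.etree.ElementTree, the function name
pin_inherited_lang is hypothetical, and for brevity only s:StudyUnit
is treated as a Maintainable.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumption: only s:StudyUnit counts as a Maintainable in this sketch;
# a real tool would need the full list of Maintainable element names.
MAINTAINABLE_TAGS = {"{ddi:studyunit:3_2}StudyUnit"}


def pin_inherited_lang(element, inherited=None):
    """Copy the xml:lang in effect onto every Maintainable below `element`."""
    lang = element.get(XML_LANG, inherited)
    if element.tag in MAINTAINABLE_TAGS and lang is not None:
        element.set(XML_LANG, lang)
    for child in element:
        pin_inherited_lang(child, lang)


instance = ET.fromstring(
    '<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" '
    'xmlns:r="ddi:reusable:3_2" xmlns:s="ddi:studyunit:3_2" xml:lang="fi">'
    "<s:StudyUnit><r:Citation><r:Title>"
    "<r:String>otsikko suomeksi</r:String>"
    "</r:Title></r:Citation></s:StudyUnit>"
    "</ddi:DDIInstance>"
)

pin_inherited_lang(instance)

# The StudyUnit now carries its own language and can be detached safely.
study_unit = instance.find("{ddi:studyunit:3_2}StudyUnit")
print(study_unit.get(XML_LANG))  # fi
```

Pinning the value on the Maintainable alone is enough here, because
its children keep inheriting from it once it becomes a Fragment.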

Conversely, after a Maintainable has been delinearised
(FragmentInstance->DDIInstance), the redundant xml:lang attributes
should be cleaned up to reduce noise.
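
The cleanup could look roughly like the following. Again this is only
a sketch on top of Python's standard xml.etree.ElementTree, with a
hypothetical function name. It compares language tags as plain
strings, which is a simplification since the tags are case-insensitive.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def strip_redundant_lang(element, inherited=None):
    """Remove xml:lang wherever it only repeats the inherited value."""
    own = element.get(XML_LANG)
    if own is not None and own == inherited:
        del element.attrib[XML_LANG]
    effective = own if own is not None else inherited
    for child in element:
        strip_redundant_lang(child, effective)


# A delinearised instance: the root says "en", the reused StudyUnit "fi".
rebuilt = ET.fromstring(
    '<ddi:DDIInstance xmlns:ddi="ddi:instance:3_2" '
    'xmlns:r="ddi:reusable:3_2" xmlns:s="ddi:studyunit:3_2" xml:lang="en">'
    '<s:StudyUnit xml:lang="fi"><r:Citation><r:Title>'
    '<r:String xml:lang="fi">otsikko suomeksi</r:String>'
    "</r:Title></r:Citation></s:StudyUnit>"
    "</ddi:DDIInstance>"
)

strip_redundant_lang(rebuilt)

study_unit = rebuilt.find("{ddi:studyunit:3_2}StudyUnit")
title_string = rebuilt.find(".//{ddi:reusable:3_2}String")
print(study_unit.get(XML_LANG))    # fi (differs from the root, so it is kept)
print(title_string.get(XML_LANG))  # None (it only repeated the StudyUnit's value)
```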

This approach implies that an XML document must be considered
equivalent to itself when the inherited values are explicitly set.
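
At the XML level this notion of equivalence can be sketched as
follows. This is my own illustration using Python's standard
xml.etree.ElementTree, with hypothetical function names; it ignores
attributes other than xml:lang, and it says nothing about what the
DDI specification itself considers equivalent.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def resolved(element, inherited=None):
    """Yield (tag, effective xml:lang, text) for each element, in document order."""
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang, (element.text or "").strip()
    for child in element:
        yield from resolved(child, lang)


def equivalent_modulo_lang_inheritance(a, b):
    """Compare two trees after resolving every inherited xml:lang."""
    return list(resolved(a)) == list(resolved(b))


NS = (
    'xmlns:ddi="ddi:instance:3_2" xmlns:r="ddi:reusable:3_2" '
    'xmlns:s="ddi:studyunit:3_2"'
)

implicit = ET.fromstring(
    f'<ddi:DDIInstance {NS} xml:lang="fi">'
    "<s:StudyUnit><r:Citation><r:Title>"
    "<r:String>otsikko suomeksi</r:String>"
    "</r:Title></r:Citation></s:StudyUnit>"
    "</ddi:DDIInstance>"
)
explicit = ET.fromstring(
    f'<ddi:DDIInstance {NS} xml:lang="fi">'
    '<s:StudyUnit xml:lang="fi"><r:Citation><r:Title>'
    '<r:String xml:lang="fi">otsikko suomeksi</r:String>'
    "</r:Title></r:Citation></s:StudyUnit>"
    "</ddi:DDIInstance>"
)
english_root = ET.fromstring(
    f'<ddi:DDIInstance {NS} xml:lang="en">'
    "<s:StudyUnit><r:Citation><r:Title>"
    "<r:String>otsikko suomeksi</r:String>"
    "</r:Title></r:Citation></s:StudyUnit>"
    "</ddi:DDIInstance>"
)

print(equivalent_modulo_lang_inheritance(implicit, explicit))      # True
print(equivalent_modulo_lang_inheritance(implicit, english_root))  # False
```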

The question is then:
Is the following document equivalent to the first one with
respect to the DDI-Lifecycle 3.2 specification,
as required by my approach?

The original document "ddi_instance.xml" is transformed so
that each xml:lang attribute has its value explicitly set,
and the XML document should no longer rely on inheritance in any place.

ddi_instance.noinheritance.xml
------8<------8<------
<?xml version="1.0" encoding="utf-8"?>
<ddi:DDIInstance
    xmlns:ddi="ddi:instance:3_2"
    xmlns:r="ddi:reusable:3_2"
    xmlns:s="ddi:studyunit:3_2"
    xml:lang="fi"
    >

  <r:Agency>acme.org</r:Agency>
  <r:ID>ddi_instance</r:ID>
  <r:Version>1</r:Version>

  <s:StudyUnit xml:lang="fi">
    <r:Agency>acme.org</r:Agency>
    <r:ID>study_unit</r:ID>
    <r:Version>1</r:Version>

    <r:Citation>
      <r:Title>
        <r:String xml:lang="fi">otsikko suomeksi</r:String>
      </r:Title>
    </r:Citation>

  </s:StudyUnit>
</ddi:DDIInstance>
------8<------8<------
