[DDI-users] [DDI-SRG] ISSUE 602

Jeremy Iverson jeremy at colectica.com
Sun Jun 2 12:00:45 EDT 2013


Excellent. This will give each LiteralText a Content element with xml:lang.

There remains a separate problem. When we specify the language 
individually for each segment, it is impossible for a machine to know 
the actual language of the full question. For example, in a question 
with three segments, two in English and one in German:

   QuestionText [lang = ???]
     LiteralText
       Text
         Content lang="en"
     LiteralText
       Text
         Content lang="de"
     LiteralText
       Text
         Content lang="en"

I see a few ways we could guess:

- the language of the first segment
- the language of the last segment
- the language with the most words
- pick one randomly
- telephone the author and ask what they intended

These are not precise. Putting the language on QuestionText allows us to 
explicitly state what we mean.
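
As a rough sketch of what I mean: the nesting below just follows my
outline above, namespace declarations are omitted, the question wording
is made up, and the xml:lang on QuestionText is the proposed attribute,
which I am assuming here rather than something the current schema is
known to allow.

```xml
<!-- Illustrative only: xml:lang on QuestionText is the proposal, not confirmed schema -->
<d:QuestionText xml:lang="en">
  <d:LiteralText>
    <d:Text>
      <r:Content xml:lang="en">What is your understanding of the German word </r:Content>
    </d:Text>
  </d:LiteralText>
  <d:LiteralText>
    <d:Text>
      <r:Content xml:lang="de">Gesundheit</r:Content>
    </d:Text>
  </d:LiteralText>
  <d:LiteralText>
    <d:Text>
      <r:Content xml:lang="en">, and where did you first hear it?</r:Content>
    </d:Text>
  </d:LiteralText>
</d:QuestionText>
```

A consumer would read the language of the whole question from
QuestionText and only look at the segment-level values to find the
embedded German word.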


On 6/2/2013 5:56 PM, Wendy Thomas wrote:
> I'll note correction of line 1588 from name= to ref=  which should
> fix this.
>
>
> On Sun, Jun 2, 2013 at 5:46 PM, Jeremy Iverson <jeremy at colectica.com>
> wrote:
>> Wendy,
>>
>> There is a bug here. The samples in Mantis issue #602 do not
>> validate.
>>
>> I am looking at proposed3.2/schema/datacollection.xsd, svn revision
>> 157, lines 1581-1596. LiteralTextType is extended from
>> TextContentType, which gives it an r:Description element.
>> LiteralTextType also defines an element named Text, which has no
>> type information.
>>
>> Line 1588 should have a ref= instead of a name= to tie it to the
>> Text element declared on line 1597, which is of TextType. As it is
>> now, the type of the Text element is xs:anyType.
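>>
>> A minimal sketch of the change, assuming the existing annotation on
>> the element stays as it is (omitted here for brevity) and nothing
>> else in the type changes:
>>
>> ```xml
>> <xs:complexType name="LiteralTextType">
>>   <xs:complexContent>
>>     <xs:extension base="TextContentType">
>>       <xs:sequence>
>>         <!-- was: <xs:element name="Text">, a local element with no type -->
>>         <xs:element ref="Text"/>
>>       </xs:sequence>
>>     </xs:extension>
>>   </xs:complexContent>
>> </xs:complexType>
>> ```
>>
>> With ref=, the element picks up TextType from the global Text
>> declaration instead of defaulting to xs:anyType.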
>>
>>
>>
>> On 6/2/2013 5:06 PM, Wendy Thomas wrote:
>>>
>>> See under LiteralText/Text  TextType extension base="r:Content"
>>> @xml:space
>>>
>>> r:Content is the language specific subelement of
>>> StructuredStringType
>>>
>>>
>>> <xs:complexType name="DynamicTextType">
>>>   <xs:annotation>
>>>     <xs:documentation>Structure supporting the use of dynamic text,
>>>       where portions of the textual content change depending on
>>>       external information (pre-loaded data, response to an earlier
>>>       query, environmental situations, etc.).</xs:documentation>
>>>   </xs:annotation>
>>>   <xs:sequence>
>>>     <xs:element ref="TextContent" maxOccurs="unbounded">
>>>       <xs:annotation>
>>>         <xs:documentation>This is the head of a substitution group
>>>           and is never used directly as an element name. Instead it
>>>           is replaced with either LiteralText or
>>>           ConditionalText.</xs:documentation>
>>>       </xs:annotation>
>>>     </xs:element>
>>>   </xs:sequence>
>>>   <xs:attribute name="isStructureRequired" type="xs:boolean"
>>>       default="false">
>>>     <xs:annotation>
>>>       <xs:documentation>If textual structure (e.g. size, color,
>>>         font, etc.) is required to understand the meaning of the
>>>         content change value to "true".</xs:documentation>
>>>     </xs:annotation>
>>>   </xs:attribute>
>>> </xs:complexType>
>>>
>>> <xs:element name="TextContent" type="TextContentType"
>>>     abstract="true">
>>>   <xs:annotation>
>>>     <xs:documentation>Abstract type existing as the head of a
>>>       substitution group. May be replaced by any valid member of
>>>       the substitution group TextContent.</xs:documentation>
>>>   </xs:annotation>
>>> </xs:element>
>>>
>>> <xs:complexType name="TextContentType" abstract="true">
>>>   <xs:annotation>
>>>     <xs:documentation>Abstract type existing as the head of a
>>>       substitution group. May be replaced by any valid member of
>>>       the substitution group TextContent. Provides the common
>>>       element Description to all members using TextContent as an
>>>       extension base.</xs:documentation>
>>>   </xs:annotation>
>>>   <xs:sequence>
>>>     <xs:element ref="r:Description" minOccurs="0">
>>>       <xs:annotation>
>>>         <xs:documentation>A description of the content and purpose
>>>           of the text segment. May be expressed in multiple
>>>           languages and supports the use of structured
>>>           content.</xs:documentation>
>>>       </xs:annotation>
>>>     </xs:element>
>>>   </xs:sequence>
>>> </xs:complexType>
>>>
>>> <xs:element name="LiteralText" type="LiteralTextType"
>>>     substitutionGroup="TextContent">
>>>   <xs:annotation>
>>>     <xs:documentation>A substitution for TextContent containing the
>>>       static (unchanging) text.</xs:documentation>
>>>   </xs:annotation>
>>> </xs:element>
>>>
>>> <xs:complexType name="LiteralTextType">
>>>   <xs:annotation>
>>>     <xs:documentation>Literal (static) text to be used in the
>>>       instrument using the StructuredString structure plus an
>>>       attribute allowing for the specification of white space to be
>>>       preserved.</xs:documentation>
>>>   </xs:annotation>
>>>   <xs:complexContent>
>>>     <xs:extension base="TextContentType">
>>>       <xs:sequence>
>>>         <xs:element name="Text">
>>>           <xs:annotation>
>>>             <xs:documentation>The value of the static text string.
>>>               Supports the optional use of XHTML formatting tags
>>>               within the string structure. If the content of a
>>>               literal text contains more than one language, i.e.
>>>               "What is your understanding of the German word
>>>               'Gesundheit'?", the foreign language element should
>>>               be placed in a separate LiteralText component with
>>>               the appropriate xml:lang value and, in this case,
>>>               isTranslatable set to "false". If the existence of
>>>               white space is critical to the understanding of the
>>>               content (such as inclusion of a leading or trailing
>>>               white space), set the attribute of Text xml:space to
>>>               "preserve".</xs:documentation>
>>>           </xs:annotation>
>>>         </xs:element>
>>>       </xs:sequence>
>>>     </xs:extension>
>>>   </xs:complexContent>
>>> </xs:complexType>
>>>
>>> <xs:element name="Text" type="TextType">
>>>   <xs:annotation>
>>>     <xs:documentation>The static portion of the text expressed as a
>>>       StructuredString with the ability to preserve whitespace if
>>>       critical to the understanding of the
>>>       content.</xs:documentation>
>>>   </xs:annotation>
>>> </xs:element>
>>>
>>> <xs:complexType name="TextType">
>>>   <xs:annotation>
>>>     <xs:documentation>The static portion of the text expressed as a
>>>       StructuredString with the ability to preserve whitespace if
>>>       critical to the understanding of the
>>>       content.</xs:documentation>
>>>   </xs:annotation>
>>>   <xs:complexContent>
>>>     <xs:extension base="r:ContentType">
>>>       <xs:attribute ref="xml:space" default="default">
>>>         <xs:annotation>
>>>           <xs:documentation>The default setting states that leading
>>>             and trailing white space will be removed and multiple
>>>             adjacent white spaces will be treated as a single white
>>>             space. If the existence of any of these white spaces is
>>>             critical to the understanding of the content, change
>>>             the value of this attribute to
>>>             "preserve".</xs:documentation>
>>>         </xs:annotation>
>>>       </xs:attribute>
>>>     </xs:extension>
>>>   </xs:complexContent>
>>> </xs:complexType>
>>>
>>> <xs:element name="ConditionalText" type="ConditionalTextType"
>>>     substitutionGroup="TextContent">
>>>   <xs:annotation>
>>>     <xs:documentation>A substitution for TextContent, contains
>>>       command code or source of the dynamic (changing)
>>>       text.</xs:documentation>
>>>   </xs:annotation>
>>> </xs:element>
>>>
>>> <xs:complexType name="ConditionalTextType">
>>>   <xs:annotation>
>>>     <xs:documentation>Text which has a changeable value depending
>>>       on a stated condition, response to earlier questions, or as
>>>       input from a set of metrics (pre-supplied
>>>       data).</xs:documentation>
>>>   </xs:annotation>
>>>   <xs:complexContent>
>>>     <xs:extension base="TextContentType">
>>>       <xs:choice>
>>>         <xs:element ref="Expression" minOccurs="0">
>>>           <xs:annotation>
>>>             <xs:documentation>The condition on which the associated
>>>               text varies expressed by a command code. For example,
>>>               a command that inserts an age by calculating the
>>>               difference between today's date and a previously
>>>               defined date of birth.</xs:documentation>
>>>           </xs:annotation>
>>>         </xs:element>
>>>         <xs:element ref="r:SourceParameterReference" minOccurs="0">
>>>           <xs:annotation>
>>>             <xs:documentation>This allows for the simple insert of
>>>               a piece of information from another specified
>>>               parameter. For example, if the text of the item using
>>>               conditional text included the respondent's name use
>>>               SourceParameterReference to reference the InParameter
>>>               of the question that is bound to the OutParameter of
>>>               the question: "What is your
>>>               name?"</xs:documentation>
>>>           </xs:annotation>
>>>         </xs:element>
>>>       </xs:choice>
>>>     </xs:extension>
>>>   </xs:complexContent>
>>> </xs:complexType>
>>>
>>>
>>> On Sun, Jun 2, 2013 at 4:36 PM, Jeremy Iverson
>>> <jeremy at colectica.com> wrote:
>>>>
>>>> Hi Wendy,
>>>>
>>>> Where is the Content element where you can specify the
>>>> language? I only see this structure, which uses xs:anyType for
>>>> the Text, not Content. Content is used for Description, but
>>>> that is not actually the question text, it is a description of
>>>> the question text.
>>>>
>>>> QuestionItem
>>>>   QuestionText
>>>>     LiteralText
>>>>       Description
>>>>         Content
>>>>       Text
>>>>         xs:anyType
>>>>     ConditionalText
>>>>
>>>> If Text becomes TextType as you note, this would allow the
>>>> language to be specified at the segment level. However, if the
>>>> language is only specified for each segment, it is impossible
>>>> to know the actual language of the question: is it the language
>>>> of the first segment, or the one with the most words, or
>>>> something else? Those are not precise. Putting the language on
>>>> QuestionText lets us be explicit.
>>>>
>>>> If I am asking the question in English and happen to use a
>>>> single German word, is it really necessary to document the fact
>>>> that the single word is German? This seems like overkill, but
>>>> if somebody has raised this as a use case I'd be curious to
>>>> find out more.
>>>>
>>>> I am not sure I understand the idea behind the Description,
>>>> either. Would a Description on the QuestionItem be more
>>>> appropriate, rather than having a Description of each segment
>>>> of a question's text?
>>>>
>>>> Thanks,
>>>>
>>>> Jeremy
>>>>
>>>> -- 
>>>> Jeremy Iverson
>>>> +1 608-213-1637
>>>> http://www.colectica.com/
>>>> Colectica - Statistical Data Management
>>>>
>>>> On 6/2/2013 12:28 PM, Wendy Thomas wrote:
>>>>>
>>>>>
>>>>> I am sending this out as it seems to be a general interest
>>>>> question and I'd like broader feedback. There is a specific
>>>>> question regarding the resolution of this issue stated within
>>>>> the Note below. The brief answer to the issue as stated is
>>>>> that you can declare language in a QuestionItem and other
>>>>> DynamicText, it's just that the language and translation tags
>>>>> lie within the Content tag (which is the language specific
>>>>> string in a StructuredStringType). The question is whether or
>>>>> not we need a top level "primary language" attribute to
>>>>> clarify when the content of a single language example
>>>>> contains foreign text. See details below.
>>>>>
>>>>> Please make your comments known as soon as possible. --
>>>>> Wendy
>>>>>
>>>>>
>>>>>
>>>>> Summary  0000602: QuestionText no longer has xml:lang.
>>>>> Cannot specify the language of questions.
>>>>>
>>>>> Description: The QuestionText element no longer has xml:lang,
>>>>> so it is impossible to specify the language of question text,
>>>>> or to specify questions with translations.
>>>>>
>>>>> Apologies if this has already been resolved as part of some
>>>>> other issue. Or am I missing something here? This seems
>>>>> quite serious.
>>>>>
>>>>> Proposed Solution: Restore xml:lang on QuestionText. This
>>>>> would be consistent with the documentation for QuestionText,
>>>>> which states "Note that when using QuestionText, the full
>>>>> QuestionText must be repeated for multi-language versions of
>>>>> the content".
>>>>>
>>>>> NOTE 1654 In all cases of DynamicText we decided that the
>>>>> object itself must repeat to clearly provide a language
>>>>> alternative. All XxxxText objects of DynamicTextType reside
>>>>> in a parent complex object that is the one carrying the ID.
>>>>> The documentation states that the XxxxText object is
>>>>> repeatable for the purpose of expressing multiple languages
>>>>> and that the assumption is that the content of each
>>>>> repetition within the parent object is equivalent content in
>>>>> an alternate language.
>>>>>
>>>>> LiteralText is no longer a StructuredStringType but contains
>>>>> the repeatable object Content which is the language specific
>>>>> subelement of a StructuredString.
>>>>>
>>>>>
>>>>> So in a QuestionItem:
>>>>>
>>>>> <d:QuestionText>
>>>>>   <d:LiteralText>
>>>>>     <r:Content xml:lang="de">Kommen Sie mit?</r:Content>
>>>>>   </d:LiteralText>
>>>>> </d:QuestionText>
>>>>> <d:QuestionText>
>>>>>   <d:LiteralText>
>>>>>     <r:Content xml:lang="en">Do you want to come with?</r:Content>
>>>>>   </d:LiteralText>
>>>>> </d:QuestionText>
>>>>>
>>>>> This was done because a question could have multiple
>>>>> language segments and because the dynamic text may fall in
>>>>> different locations in various language strings. We felt it
>>>>> was confusing to mix multiple language strings into a single
>>>>> QuestionText under such conditions and could even be
>>>>> impossible to parse out.
>>>>>
>>>>> So at the moment it is a matter of digging further into the
>>>>> DynamicText content to determine language. The question we
>>>>> should address is the following:
>>>>>
>>>>> Do we need to provide information on the primary language of
>>>>> the DynamicText content at the parent object level?
>>>>>
>>>>> Pro: Saves digging into the question and also clarifies the
>>>>> primary language for multi-language content within a
>>>>> question, e.g. the following:
>>>>>
>>>>> <d:QuestionText>
>>>>>   <d:LiteralText>
>>>>>     <r:Content xml:lang="en">What is your understanding of the German word </r:Content>
>>>>>     <r:Content xml:lang="de">"Kölsch"?</r:Content>
>>>>>   </d:LiteralText>
>>>>> </d:QuestionText>
>>>>>
>>>>> Con: What is the rule for language identification conflicts between
>>>>> primary language information at DynamicText level and
>>>>> Content level? For example I could be asking a question in
>>>>> one language for a questionnaire that was intended for use in
>>>>> another language group. In short, resolving conflicts is not a
>>>>> one-answer-fits-all situation.
>>>>>
>>>>> Note that "Content" has the full set of language and
>>>>> translation information found in any international or
>>>>> structured string. Also note that for ALL other string types
>>>>> that support multiple languages the language and translation
>>>>> information is contained in the sub-element. The object that
>>>>> is of InternationalStringType or StructuredStringType is a
>>>>> means of binding multiple language equivalencies together.
>>>>>
>>>>> -- 
>>>>> Wendy L. Thomas
>>>>> Data Access Core Director
>>>>> Minnesota Population Center
>>>>> University of Minnesota
>>>>> 50 Willey Hall
>>>>> 225 19th Avenue South
>>>>> Minneapolis, MN 55455
>>>>> Phone: +1 612.624.4389
>>>>> Fax: +1 612.626.8375
>>>>> Email: wlt at umn.edu
>>>>>
>>>>> _______________________________________________
>>>>> DDI-SRG mailing list
>>>>> DDI-SRG at icpsr.umich.edu
>>>>> http://lists.icpsr.umich.edu/mailman/listinfo/ddi-srg
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>

