[DDI-users] the discussion about schemes/versions/references mentioned in the recent DDI digest - Geoff Lee from ABS very interested in the resolution of this issue [SEC=UNCLASSIFIED]

Wendy Thomas wlt at pop.umn.edu
Sun Dec 19 19:18:34 EST 2010


Hi Geoff,

I think Sam stated my recommendation much more clearly (post your email). 
In DDI we were faced with a number of needs:

First, to uniquely identify each object in a persistent way.
Second, to understand the context of the object.

We did this using a nested identification structure that combines the 
parent maintainable's Agency/ID/Version with the object's own 
Agency/ID/Version. Because of this relationship we need a disciplined way 
of managing new versions of the maintainable as objects within it change. 
The recommended means of doing this is for the new version to include the 
earlier version and then add new objects or new versions of objects, or 
exclude objects from the earlier version. Doing so maintains a single 
persistent identifier for an unchanged object. However, a reference to 
that object would not by itself capture its context at the time of 
reference. For this reason we added the sourceContext attribute, which 
holds the version of the parent maintainable that was current at the time 
of reference. So if, for instance, I were referencing File Clerk in an 
Occupation Code that is unchanged from version 1, I could state that at 
the time I referenced it the current version of the Occupation Code was 
version 10.
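
As a rough illustration of the File Clerk case, here is a minimal Python 
sketch. The class names (Key, Reference), the agency "example.agency", the 
identifiers and the URN layout are all assumptions made up for this 
example; in particular the URN string is not the normative DDI syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Agency/ID/Version triple (hypothetical helper, not a DDI class)."""
    agency: str
    id: str
    version: str


@dataclass(frozen=True)
class Reference:
    parent: Key          # maintainable version in which the object lives
    target: Key          # the referenced object itself
    source_context: Key  # maintainable version current at time of reference

    def urn(self) -> str:
        # Illustrative layout only; the real DDI URN syntax may differ.
        return (
            f"urn:ddi:{self.parent.agency}:{self.parent.id}:{self.parent.version}"
            f":{self.target.id}:{self.target.version}"
        )


# File Clerk is unchanged since version 1 of the Occupation Code, so its
# identifier still points into version 1, while the reference records that
# version 10 was the current one when the reference was made.
file_clerk = Reference(
    parent=Key("example.agency", "OccupationCode", "1"),
    target=Key("example.agency", "FileClerk", "1"),
    source_context=Key("example.agency", "OccupationCode", "10"),
)

print(file_clerk.urn())
print(file_clerk.source_context.version)
```

The identifier stays the same however many versions of the Occupation Code 
are published; only source_context differs between references made at 
different times.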

Unnested identifiers make persistence much easier, but they lose the 
context of the scheme or list they are a member of. The importance of 
knowing and understanding that context is what led to the above approach. 
What has not been provided is clear guidance on this process and on the 
implications of using other approaches. In truth there are cases where 
lists are just lists and context is of little importance; in other cases, 
context is very important.

This approach should be able to support the automatic creation of surveys 
as a survey would frequently use the contents of a maintainable of a 
specified version and the latest version of any item within it. Using the 
method of inclusion of the previous version it would also be clear what 
had changed since the previous version.
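
A small Python sketch of versioning by inclusion, using the qs/qA/qB/qC 
example from earlier in this thread. The SchemeVersion class and its 
fields are hypothetical names invented for this illustration and are not 
part of DDI; it only shows how a tool might resolve the contents of a 
version and report what changed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SchemeVersion:
    version: str
    base: Optional["SchemeVersion"] = None               # included earlier version
    excluded: Set[str] = field(default_factory=set)      # item ids dropped from base
    added: Dict[str, str] = field(default_factory=dict)  # item id -> item version

    def resolve(self) -> Dict[str, Tuple[str, str]]:
        """Map item id -> (item version, scheme version where the item lives)."""
        items: Dict[str, Tuple[str, str]] = {}
        if self.base is not None:
            # Everything from the included version, minus the exclusions.
            items = {
                item_id: home
                for item_id, home in self.base.resolve().items()
                if item_id not in self.excluded
            }
        # New objects and new versions live in this scheme version.
        for item_id, item_version in self.added.items():
            items[item_id] = (item_version, self.version)
        return items

    def changes(self) -> Dict[str, List[str]]:
        """What differs from the included earlier version."""
        return {"excluded": sorted(self.excluded), "added": sorted(self.added)}


qs1 = SchemeVersion("1.0.0", added={"qA": "1.0.0", "qB": "1.0.0", "qC": "1.0.0"})
# qs v2.0.0 = qs v1.0.0 except qC, plus qC v2.0.0
qs2 = SchemeVersion("2.0.0", base=qs1, excluded={"qC"}, added={"qC": "2.0.0"})

for item_id, (item_version, home) in sorted(qs2.resolve().items()):
    print(item_id, item_version, "lives in qs", home)
print(qs2.changes())
```

In this sketch qA and qB still resolve to qs 1.0.0, so their identifiers 
do not change, and the difference between the two scheme versions can be 
read directly from the excluded and added entries.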

Clearly this needs to be written up and the ramifications fully discussed.

Wendy

On Sun, 19 Dec 2010, Geoff Lee wrote:

>
> Hi Wendy, Achim  (and Alerk)
>
> Just a quick note to say that although I personally won't be able to
> contribute to this discussion (due to lack of competency in the details of
> DDI 3) it will be of great relevance to the Australian Bureau of Statistics
> and our use of DDI 3.
>
> Alerk's example talks about a simplified use case of 100 questions, 90% of
> which don't change.  A realistic example from a complex ABS survey might
> have as many as 10 times that many questions (not all asked of every
> respondent of course) split across multiple levels (questions at household
> level, questions at family level, questions at person level, questions at
> event level (eg instances of disability, or instances of work history, or
> use of treatment, ...)).
>
> Our goal is to (practically, not just theoretically) manage not just the
> relationships between successive editions of questions within each
> individual study, but to record (and progressively improve) the reuse of
> common questions and question modules across distinct surveys and studies
> (for example between the Australian Health Survey, our Indigenous Health
> Survey, our Survey of Disability and Carers, Our Mental Health Survey, our
> Survey of Income and Housing Costs, .... each of which runs every few years
> in a complex cycle planned ahead over a 10 year forward horizon ).
>
> To make our use case more interesting, we aspire to making the DDI metadata
> describing the questions and their attributes "active", in the sense that
> for example the DDI would be used to generate collection instruments
> automatically.  Automatic generation will presumably require some
> consistent approach to resolving this issue (otherwise the automatic
> generation process becomes a much more complex process to build and test).
>
> Finally, just to make the situation even more complex, the automatic
> generation would include multi-modal approaches to data collection, such as
> old-fashioned paper questionnaires administered on the doorstep by trained
> interviewers, CAI instruments administered by interviewers at the doorstep
> or over the phone, and eventually self-completed interviews completed over
> the web.  Obviously there will need to be some degree of variation in the
> details of questions to accommodate the methodological differences between
> modes, but we would require the ability to relate the very similar
> questions within the study.
>
> All this is a long winded way of saying that the question Alerk has raised
> has considerable practical significance for us.  Although I haven't (yet)
> had a chance to discuss with the folk at ABS who know much more about DDI
> than me, it suggests to me we will need to think very carefully about how
> we manage all the URNs, certainly we will need to systematise the way we
> generate and manage them, and we may well need to automate the process
> substantially.
>
> Thanks Alerk for raising such a thought provoking topic.
>
> All the best
>
> Geoff Lee
>
>
>
>  From:       ddi-users-request at icpsr.umich.edu
>
>  To:         ddi-users at icpsr.umich.edu
>
>  Date:       18/12/2010 04:01 AM
>
>  Subject:    DDI-users Digest, Vol 63, Issue 2
>
>  Sent by:    ddi-users-bounces at icpsr.umich.edu
>
>
>
>
>
>
> Today's Topics:
>
>   1. Re: a question about schemes/versions/references (Alerk Amin)
>   2. Re: a question about schemes/versions/references (Wendy Thomas)
>
> ----- Message from Alerk Amin <A.Amin at uvt.nl> on Fri, 17 Dec 2010 11:33:04
> +0100 -----
>
>      To: ddi-users at icpsr.umich.edu
>
> Subject: Re: [DDI-users] a question about
>          schemes/versions/references
>
>
> Hello,
>
>   I've considered both of the options presented by Achim & Wendy.  I
> think some of Achim's alternatives sound reasonable in theory, but I'm
> having some trouble envisioning how they would work in practice.
>   Our (simplified) use case is that we could have a questionnaire with
> 100 questions.  We may have 5 waves of questionnaires, and within each
> wave, we might have 5 versions of the questionnaire (pre-test, pilot,
> main-wave, follow-up, etc).  In each of these questionnaire versions,
> 90% of the questions will probably stay the same (eg. "What is your
> gender? 1. male 2. female" is probably the same for every version).
>   If we have copies of questions in different versions of the
> QuestionScheme (such as Achim's Alternative #1 or #4), this would give
> us 25 copies of the exact same gender question, each with a different
> URN.  If we estimate that 90% of the questions don't change out of 100
> questions, this works out to 25*90 = 2250 QuestionItems when we really
> only want 90.  This seems extremely complicated for our survey
> designers, as well as our disseminators.
>   I'm not sure how Grouping or Resource Packages would resolve this
> problem, because we will still end up with different versions of the
> QuestionScheme, as the questions change from version to version.
>   Wendy's solution (which I think is the same as Achim's alternative
> #2) seems to avoid the duplication problem, but I am curious how it will
> work with respect to References.  In Wave 2 of the questionnaire, we'd
> ideally like all the QuestionConstructs to reference items in version
> 2.* of the QuestionScheme.  But by including qs v1.0.0 in qs v2.0.0 as
> described by Wendy, how will a reference such as
> urn:ddi:agency:QuestionScheme.qs.2.0.0:QuestionItem.qA.L get resolved?
>   Thank you again for all of your help.
>
> Best,
> Alerk
>
> On 12/16/2010 3:44 PM, Wendy Thomas wrote:
>> Alerk
>>
>> This can actually be handled well by how you construct the new version of
>> your maintainable.
>>
>> qs v2.0.0
>>      qs v1.0.0 except qC
>>      qC v2.0.0
>>
>>
>>
>> In this way the qA v.1.0.0 continues to live within qs v1.0.0
>>
>> This was a deep bar napkin discussion at an outside table at the Town
>> Hall Brewery clearly during better weather than we have now.
>>
>> Wendy
>>
>>
>> On Thu, 16 Dec 2010, Joachim Wackerow wrote:
>>
>>> Hi Alerk,
>>>
>>> This understanding is correct. A new version of qC triggers a new
>>> version of the question scheme. Then you have basically two identical
>>> objects with non-identical identifiers.
>>>
>>> You can use different approaches to solve the issue:
>>>
>>> Alternative 1:
>>> The QuestionItem for qA and qB can mention the objectSource, which is a
>>> reference to the object from which the item is being copied.
>>>
>>> http://www.ddialliance.org/sites/default/files/documentation/ddi3.1/schemas/reusable_xsd/complexTypes/AbstractIdentifiableType.html#r5
>>>
>>> Alternative 2:
>>> qs v2.0.0 could reference qA and qB in qs v1.0.0 where they live.
>>>
>>> Alternative 3:
>>> qA and qB could live in a resource package where they can be referenced
>>> by wave 1 and 2.
>>>
>>> Alternative 4:
>>> It is possible to state in the comparison module that two questions
>>> (here identical, but with different identifiers) are the same
>>>
>>> Alternative 5:
>>> With the grouping approach it is possible to push everything, which is a
>>> candidate for reuse, to a higher level. Then no replication of objects
>>> takes place.
>>>
>>> Depending on the background, one of these solutions is preferable to the
>>> others. As far as I understand, alternatives 1 and 3 seem to be good
>>> candidates. They are not too hard to implement.
>>>
>>> Cheers,
>>> Achim
>>>
>>> Alerk Amin wrote:
>>>> Hello,
>>>>
>>>>     I have a question regarding schemes, versioning and references.
>>>> The following is a simplified version of a real use case that we have.
>>>>
>>>>     Suppose we have a QuestionScheme qs, which has 3 questions qA, qB,
>>>> qC.  At the beginning (wave1 for example), we have
>>>>
>>>> qs v1.0.0
>>>> 		 qA v1.0.0
>>>> 		 qB v1.0.0
>>>> 		 qC v1.0.0
>>>>
>>>> At this point in time, it's clear that any QuestionConstructs or
>>>> Variables that reference these questions would use
>>>> urn:ddi:agency:QuestionScheme.qs.1.0.0:QuestionItem.qA.1.0.0
>>>>    or something similar (I might not have the exact syntax correct).
>>>>
>>>> Now, my question comes up when we move to wave 2.  Suppose qA and qB
>>>> remain the same, but we change qC.  Now, we have
>>>>
>>>> qs v2.0.0
>>>> 		 qA v1.0.0
>>>> 		 qB v1.0.0
>>>> 		 qC v2.0.0
>>>>
>>>> Now, a reference to qA would be
>>>> urn:ddi:agency:QuestionScheme.qs.2.0.0:QuestionItem.qA.1.0.0
>>>>     Even though the question has not changed, we have a different
>>>> version for the QuestionScheme, and therefore a different identifier for
>>>> the QuestionItem.  If we have two different variables for the 2 different
>>>> waves, they would have different QuestionItemReferences, and therefore
>>>> it becomes impossible to determine that they are based on the same
>>>> question.
>>>>
>>>>     Is my understanding of this correct?  If so, doesn't this hurt the
>>>> reusability of items?
>>>>     Thank you for your help.
>>>>
>>>> Best,
>>>> Alerk
>>>>
>>>
>>>
>>> --
>>> GESIS - Leibniz Institute for the Social Sciences
>>> Department: Monitoring Society and Social Change
>>> Unit: Social Science Metadata Standards
>>> Visiting address: B2 1, 68159 Mannheim, Germany
>>> Postal address: P.O. Box 122155, 68072 Mannheim, Germany
>>> Phone: +49 (0)621 1246 262
>>> Fax: +49 (0)621 1246 100
>>> E-mail: joachim.wackerow at gesis.org
>>> www.gesis.org/en/institute/
>>> _______________________________________________
>>> DDI-users mailing list
>>> DDI-users at icpsr.umich.edu
>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-users
>>>
>>
>> Wendy L. Thomas                          Phone: +1 612.624.4389
>> Data Access Core Director		 		  Fax:   +1 612.626.8375
>> Minnesota Population Center              Email: wlt at pop.umn.edu
>> University of Minnesota
>> 50 Willey Hall
>> 225 19th Avenue South
>> Minneapolis, MN 55455
>>
>
> --
> -------------------------------------------------------------------
> Alerk Amin M.Eng.
> Senior Software Developer
> CentERdata
> Room K637 (Tilburg University, Koopmans Building)
> Postal address   : PO Box 90153, 5000 LE, Tilburg, The Netherlands
> Visiting address : Warandelaan 2, 5037 AB, Tilburg, The Netherlands
> Telephone        : +31-13-466 2243 / 8325
> Fax              : +31-13-466 2764
> WWW              : www.centerdata.nl
> Disclaimer       : See http://www.centerdata.nl/maildisclaimer.
>
>
> ----- Message from Wendy Thomas <wlt at pop.umn.edu> on Fri, 17 Dec 2010
> 08:12:29 -0600 (CST) -----
>
>      To: Data Documentation Initiative Users Group
>          <ddi-users at icpsr.umich.edu>
>
> Subject: Re: [DDI-users] a question about
>          schemes/versions/references
>
>
> Alerk,
>
> I'll look for some examples of this, but the situation you name
> (understanding the context of a referenced item) is why we added the
> sourceContext attribute to ReferenceType. So if I wanted to reference
> GENDER as used in the context of Wave 4 (assuming a new version with each
> wave and no change in the Question GENDER since Wave 1):
>
> <QuestionReference isReference="true" isExternal="false" lateBound="false"
> sourceContext="urn:ddi:QuestionScheme:QS_1:4.0.0">
> <r:URN>urn:ddi:QuestionScheme:QS_1:1.0.0:Question:GENDER:1.0.0</r:URN>
> </QuestionReference>
>
>
> Because GENDER doesn't change and you want a persistent identifier, it has
> to continue to "live" in version 1.0.0 of the scheme. Otherwise it becomes
> another object.....equivalent...but different.
>
> Wendy
>
>
>
>
>
> Wendy L. Thomas                          Phone: +1 612.624.4389
> Data Access Core Director		 		  Fax:   +1 612.626.8375
> Minnesota Population Center              Email: wlt at pop.umn.edu
> University of Minnesota
> 50 Willey Hall
> 225 19th Avenue South
> Minneapolis, MN 55455
>
>
>
>
>

Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director		 Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455

