[DDI-users] the discussion about schemes/versions/references mentioned in the recent DDI digest - Geoff Lee from ABS very interested in the resolution of this issue [SEC=UNCLASSIFIED]

Geoff Lee geoff.lee at abs.gov.au
Sat Dec 18 18:31:07 EST 2010


Hi Wendy, Achim  (and Alerk)

Just a quick note to say that although I personally won't be able to
contribute to this discussion (due to lack of competency in the details of
DDI 3) it will be of great relevance to the Australian Bureau of Statistics
and our use of DDI 3.

Alerk's example talks about a simplified use case of 100 questions, 90% of
which don't change.  A realistic example from a complex ABS survey might
have as many as 10 times that many questions (not all asked of every
respondent, of course) split across multiple levels: questions at household
level, questions at family level, questions at person level, and questions
at event level (e.g. instances of disability, instances of work history, or
use of treatment, ...).

Our goal is to manage (practically, not just theoretically) not just the
relationships between successive editions of questions within each
individual study, but also to record (and progressively improve) the reuse
of common questions and question modules across distinct surveys and
studies (for example between the Australian Health Survey, our Indigenous
Health Survey, our Survey of Disability and Carers, our Mental Health
Survey, our Survey of Income and Housing Costs, ..., each of which runs
every few years in a complex cycle planned ahead over a 10-year forward
horizon).

To make our use case more interesting, we aspire to make the DDI metadata
describing the questions and their attributes "active", in the sense that,
for example, the DDI would be used to generate collection instruments
automatically.  Automatic generation will presumably require some
consistent approach to resolving this issue (otherwise the automatic
generation process becomes much more complex to build and test).

Finally, just to make the situation even more complex, the automatic
generation would include multi-modal approaches to data collection, such as
old-fashioned paper questionnaires administered on the doorstep by trained
interviewers, CAI instruments administered by interviewers at the doorstep
or over the phone, and eventually self-completed interviews over the web.
Obviously there will need to be some degree of variation in the details of
questions to accommodate the methodological differences between modes, but
we would require the ability to relate the very similar questions within
the study.

All this is a long-winded way of saying that the question Alerk has raised
has considerable practical significance for us.  Although I haven't (yet)
had a chance to discuss it with the folk at ABS who know much more about DDI
than I do, it suggests to me we will need to think very carefully about how
we manage all the URNs.  Certainly we will need to systematise the way we
generate and manage them, and we may well need to automate the process
substantially.
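
One way to picture "systematising" URN generation is a small registry that
mints a URN the first time a given question version is published and hands
back the same URN whenever that unchanged version is used again.  The
following Python is only a sketch under assumptions: the class and method
names are hypothetical, and the URN layout simply copies the illustrative
syntax used in Alerk's message quoted below (which he notes may not be the
exact DDI syntax).

```python
from dataclasses import dataclass, field


@dataclass
class UrnRegistry:
    """Hypothetical helper: one URN per (item id, item version)."""
    agency: str
    urns: dict = field(default_factory=dict)

    def urn_for(self, scheme_id, scheme_version, item_id, item_version):
        # An unchanged item keeps the URN of the scheme version it was
        # first published in; only a new item version mints a new URN.
        key = (item_id, item_version)
        if key not in self.urns:
            self.urns[key] = (
                f"urn:ddi:{self.agency}"
                f":QuestionScheme.{scheme_id}.{scheme_version}"
                f":QuestionItem.{item_id}.{item_version}"
            )
        return self.urns[key]


registry = UrnRegistry(agency="agency")
wave1_qa = registry.urn_for("qs", "1.0.0", "qA", "1.0.0")
wave2_qa = registry.urn_for("qs", "2.0.0", "qA", "1.0.0")  # qA unchanged
wave2_qc = registry.urn_for("qs", "2.0.0", "qC", "2.0.0")  # qC changed
print(wave1_qa == wave2_qa)
print(wave2_qc)
```

In this sketch the unchanged question qA resolves to a single URN in both
waves, while the changed question qC receives a new one.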

Thanks, Alerk, for raising such a thought-provoking topic.

All the best

Geoff Lee


                                                                                                                                
  From:       ddi-users-request at icpsr.umich.edu
  To:         ddi-users at icpsr.umich.edu
  Date:       18/12/2010 04:01 AM
  Subject:    DDI-users Digest, Vol 63, Issue 2
  Sent by:    ddi-users-bounces at icpsr.umich.edu
                                                                                                                                





Today's Topics:

   1. Re: a question about schemes/versions/references (Alerk Amin)
   2. Re: a question about schemes/versions/references (Wendy Thomas)

----- Message from Alerk Amin <A.Amin at uvt.nl> on Fri, 17 Dec 2010 11:33:04 +0100 -----

      To: ddi-users at icpsr.umich.edu
 Subject: Re: [DDI-users] a question about schemes/versions/references
                                                          

Hello,

   I've considered both of the options presented by Achim & Wendy.  I
think some of Achim's alternatives sound reasonable in theory, but I'm
having some trouble envisioning how they would work in practice.
   Our (simplified) use case is that we could have a questionnaire with
100 questions.  We may have 5 waves of questionnaires, and within each
wave, we might have 5 versions of the questionnaire (pre-test, pilot,
main-wave, follow-up, etc.).  In each of these questionnaire versions,
90% of the questions will probably stay the same (e.g. "What is your
gender? 1. male 2. female" is probably the same for every version).
   If we have copies of questions in different versions of the
QuestionScheme (such as Achim's Alternative #1 or #4), this would give
us 25 copies of the exact same gender question, each with a different
URN.  If we estimate that 90% of the questions don't change out of 100
questions, this works out to 25*90 = 2250 QuestionItems when we really
only want 90.  This seems extremely complicated for our survey
designers, as well as our disseminators.
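
The arithmetic above can be checked with a few lines of Python.  This is
only a sketch of the simplified use case; the variable names are
illustrative, and all figures are the estimates given in this message.

```python
# Figures from the simplified use case described above.
questions = 100
waves = 5
versions_per_wave = 5
unchanged_percent = 90

# Every questionnaire version yields its own QuestionScheme version.
scheme_versions = waves * versions_per_wave

# Questions that stay identical across all versions.
unchanged_questions = questions * unchanged_percent // 100

# If each scheme version carries its own copy of every unchanged
# question, each copy gets a distinct URN.
copied_items = scheme_versions * unchanged_questions

print(scheme_versions, unchanged_questions, copied_items)
```

With these inputs there are 25 scheme versions and 2250 copied
QuestionItems, against the 90 distinct unchanged questions actually wanted.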
   I'm not sure how Grouping or Resource Packages would resolve this
problem, because we will still end up with different versions of the
QuestionScheme, as the questions change from version to version.
   Wendy's solution (which I think is the same as Achim's alternative
#2) seems to avoid the duplication problem, but I am curious how it will
work with respect to References.  In Wave 2 of the questionnaire, we'd
ideally like all the QuestionConstructs to reference items in version
2.* of the QuestionScheme.  But by including qs v1.0.0 in qs v2.0.0 as
described by Wendy, how will a reference such as
urn:ddi:agency:QuestionScheme.qs.2.0.0:QuestionItem.qA.L get resolved?
   Thank you again for all of your help.

Best,
Alerk

On 12/16/2010 3:44 PM, Wendy Thomas wrote:
> Alerk
>
> This can actually be handled well by how you construct the new version of
> your maintainable.
>
> qs v2.0.0
>      qs v1.0.0 except qC
>      qC v2.0.0
>
>
>
> In this way qA v1.0.0 continues to live within qs v1.0.0.
>
> This was a deep bar napkin discussion at an outside table at the Town Hall
> Brewery, clearly during better weather than we have now.
>
> Wendy
>
>
> On Thu, 16 Dec 2010, Joachim Wackerow wrote:
>
>> Hi Alerk,
>>
>> This understanding is correct. A new version of qC triggers a new
>> version of the question scheme. Then you have basically two identical
>> objects with non-identical identifiers.
>>
>> You can use different approaches to solve the issue:
>>
>> Alternative 1:
>> The QuestionItem for qA and qB can mention the objectSource, which is a
>> reference to the object from which the item is being copied.
>>
>> http://www.ddialliance.org/sites/default/files/documentation/ddi3.1/schemas/reusable_xsd/complexTypes/AbstractIdentifiableType.html#r5
>>
>> Alternative 2:
>> qs v2.0.0 could reference qA and qB in qs v1.0.0 where they live.
>>
>> Alternative 3:
>> qA and qB could live in a resource package where they can be referenced
>> by wave 1 and 2.
>>
>> Alternative 4:
>> It is possible to state in the comparison module that two questions
>> (here identical, but with different identifiers) are the same.
>>
>> Alternative 5:
>> With the grouping approach it is possible to push everything that is a
>> candidate for reuse to a higher level. Then no replication of objects
>> takes place.
>>
>> Depending on the background, one of these solutions is preferable to the
>> others. As far as I understand, alternatives 1 and 3 seem to be good
>> candidates. They are not too hard to implement.
>>
>> Cheers,
>> Achim
>>
>> Alerk Amin wrote:
>>> Hello,
>>>
>>>     I have a question regarding schemes, versioning and references.  The
>>> following is a simplified version of a real use case that we have.
>>>
>>>     Suppose we have a QuestionScheme qs, which has 3 questions qA, qB,
>>> qC.  At the beginning (wave1 for example), we have
>>>
>>> qs v1.0.0
>>>     qA v1.0.0
>>>     qB v1.0.0
>>>     qC v1.0.0
>>>
>>> At this point in time, it's clear that any QuestionConstructs or
>>> Variables that reference these questions would use
>>> urn:ddi:agency:QuestionScheme.qs.1.0.0:QuestionItem.qA.1.0.0
>>> or something similar (I might not have the exact syntax correct).
>>>
>>> Now, my question comes up when we move to wave 2.  Suppose qA and qB
>>> remain the same, but we change qC.  Now, we have
>>>
>>> qs v2.0.0
>>>     qA v1.0.0
>>>     qB v1.0.0
>>>     qC v2.0.0
>>>
>>> Now, a reference to qA would be
>>> urn:ddi:agency:QuestionScheme.qs.2.0.0:QuestionItem.qA.1.0.0
>>>     Even though the question has not changed, we have a different version
>>> for the QuestionScheme, and therefore a different identifier for the
>>> QuestionItem.  If we have two different variables for the 2 different
>>> waves, they would have different QuestionItemReferences, and therefore
>>> it becomes impossible to determine that they are based on the same
>>> question.
>>>
>>>     Is my understanding of this correct?  If so, doesn't this hurt the
>>> reusability of items?
>>>     Thank you for your help.
>>>
>>> Best,
>>> Alerk
>>>
>>
>>
>> --
>> GESIS - Leibniz Institute for the Social Sciences
>> Department: Monitoring Society and Social Change
>> Unit: Social Science Metadata Standards
>> Visiting address: B2 1, 68159 Mannheim, Germany
>> Postal address: P.O. Box 122155, 68072 Mannheim, Germany
>> Phone: +49 (0)621 1246 262
>> Fax: +49 (0)621 1246 100
>> E-mail: joachim.wackerow at gesis.org
>> www.gesis.org/en/institute/
>> _______________________________________________
>> DDI-users mailing list
>> DDI-users at icpsr.umich.edu
>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-users
>>
>
> Wendy L. Thomas                          Phone: +1 612.624.4389
> Data Access Core Director                Fax:   +1 612.626.8375
> Minnesota Population Center              Email: wlt at pop.umn.edu
> University of Minnesota
> 50 Willey Hall
> 225 19th Avenue South
> Minneapolis, MN 55455
>

--
-------------------------------------------------------------------
Alerk Amin M.Eng.
Senior Software Developer
CentERdata
Room K637 (Tilburg University, Koopmans Building)
Postal address   : PO Box 90153, 5000 LE, Tilburg, The Netherlands
Visiting address : Warandelaan 2, 5037 AB, Tilburg, The Netherlands
Telephone        : +31-13-466 2243 / 8325
Fax              : +31-13-466 2764
WWW              : www.centerdata.nl
Disclaimer       : See http://www.centerdata.nl/maildisclaimer.


----- Message from Wendy Thomas <wlt at pop.umn.edu> on Fri, 17 Dec 2010 08:12:29 -0600 (CST) -----

      To: Data Documentation Initiative Users Group <ddi-users at icpsr.umich.edu>
 Subject: Re: [DDI-users] a question about schemes/versions/references
                                                                   

Alerk,

I'll look for some examples of this, but the situation you describe
(understanding the context of a referenced item) is why we added the
sourceContext attribute to ReferenceType. So if I wanted to reference
GENDER as used in the context of Wave 4 (assuming a new version with each
wave and no change in the Question GENDER since Wave 1):

<QuestionReference isReference="true" isExternal="false" lateBound="false"
sourceContext="urn:ddi:QuestionScheme:QS_1:4.0.0">
<r:URN>urn:ddi:QuestionScheme:QS_1:1.0.0:Question:GENDER:1.0.0</r:URN>
</QuestionReference>


Because GENDER doesn't change and you want a persistent identifier, it has
to continue to "live" in version 1.0.0 of the scheme. Otherwise it becomes
another object: equivalent, but different.
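
To illustrate how a consumer might read such a reference, here is a
minimal Python sketch.  It assumes the "r" prefix is bound to the
namespace "ddi:reusable:3_1" (the snippet above does not declare it), and
the helper names make_reference and read_reference are hypothetical.  It
shows that references made from two waves point at the same URN and differ
only in their sourceContext.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the "r" prefix; not stated in the message.
R_NS = "ddi:reusable:3_1"


def make_reference(source_context):
    """Build a QuestionReference like the one above for a given context."""
    return (
        f'<QuestionReference xmlns:r="{R_NS}" isReference="true" '
        f'isExternal="false" lateBound="false" '
        f'sourceContext="{source_context}">'
        "<r:URN>urn:ddi:QuestionScheme:QS_1:1.0.0:Question:GENDER:1.0.0</r:URN>"
        "</QuestionReference>"
    )


def read_reference(xml_text):
    """Return (target URN, sourceContext) from a QuestionReference."""
    ref = ET.fromstring(xml_text)
    target = ref.findtext(f"{{{R_NS}}}URN").strip()
    return target, ref.get("sourceContext")


wave1 = read_reference(make_reference("urn:ddi:QuestionScheme:QS_1:1.0.0"))
wave4 = read_reference(make_reference("urn:ddi:QuestionScheme:QS_1:4.0.0"))

same_question = wave1[0] == wave4[0]   # identical target URN
same_context = wave1[1] == wave4[1]    # contexts differ per wave
print(same_question, same_context)
```

Comparing the target URNs is enough to see that both waves use the same
question, while sourceContext records the wave in which it was used.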

Wendy





Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director                Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455

_______________________________________________
DDI-users mailing list
DDI-users at icpsr.umich.edu
http://www.icpsr.umich.edu/mailman/listinfo/ddi-users

