[DDI-users] Use of Comparison: captured from an email discussion

Fri Sep 4 11:33:28 EDT 2009

The following questions and answers are from an email discussion between a 
DDI user, myself and Achim Wackerow. We thought that the questions and 
responses might be useful to others on the list. -- Wendy

I need some help in figuring out how to use the comparison module.

More specifically, could you please provide concrete examples on how to 
use the existing tags?

I am looking at Variable Map:

QUESTION 1. "Correspondence" - between the referenced SCHEMES:

- this (parent)element is mandatory. How are we supposed to compare two 
schemes? Looking at what? Schemes are schemes - lists of items.

I understand how we might compare ITEMS, but the schemes? What kind of 
similarities and differences would we be looking for?

(I don't mind the correspondence elements being provided at scheme level, 
but I don't think they should be mandatory.. I myself can't find anything 
sensible to say about how they compare).

RESPONSE:
Each Variable Map compares the contents of ONE source VariableScheme to 
the contents of ONE target VariableScheme. Since all variables live in 
schemes, you are noting their parent schemes here.

Correspondence provides the details of the general correspondence, 
containing at a minimum a textual description of the Commonality and 
Difference between the two schemes. If the two schemes are identical you 
can literally stop here without providing any item maps. You could also 
say that, except for the items listed, everything is identical and then 
list those that have differences. Or you could state that ONLY those 
listed explicitly have been evaluated for comparison purposes.

QUESTION 2. Item map:

QUESTION  2a. SourceItem and TargetItem are NCName, but not ReferenceType 
- WHY? Is there a specific reason, or is this just an oversight?

RESPONSE: I think the initial intent was that you have already declared 
your source and target scheme and they didn't want to allow for a 
reference to a variable outside of those schemes. However we have since 
changed ID from NCName to BaseIDtype and I will file this as a bug.

QUESTION  2b.Use of Commonality, Difference, CommonalityTypeCoded, 
CommonalityWeightType, and UserDefinedCorrespondenceProperty (Value+Name)

It would help me most if you could provide an actual example of how these 
tags might actually be used when comparing 2 variables.

To keep it simple, please assume case 1: I am comparing two variables that 
are identical (i.e. question, concept, category labels and codes, and 
universes are the same);

RESPONSE:
<c:VariableMap isVersionable="true" id="CVM_1" version="1.0">

<c:SourceSchemeReference><r:ID>VScheme_1</r:ID></c:SourceSchemeReference>

<c:TargetSchemeReference><r:ID>VScheme_1</r:ID></c:TargetSchemeReference>
   <c:Correspondence>
     <c:Commonality xml:lang="en" translated="false" 
translatable="true">Not evaluated except for mapped items</c:Commonality>
     <c:Difference xml:lang="en" translated="false" translatable="true">Not 
evaluated except for mapped items</c:Difference>
   </c:Correspondence>
   <c:ItemMap alias="Sex">
     <c:SourceItem>V_1</c:SourceItem>
     <c:TargetItem>V_1</c:TargetItem>
       <c:Correspondence>
         <c:Commonality xml:lang="en" translated="false" 
translatable="true">question, concept, category labels and codes, and 
universes are the same</c:Commonality>
         <c:Difference xml:lang="en" translated="false" 
translatable="true">None</c:Difference>
         <c:CommonalityTypeCoded 
codeListID="GCode_1">Identical</c:CommonalityTypeCoded>
         <c:CommonalityWeightType>1</c:CommonalityWeightType>
         <c:UserDefinedCorrespondenceProperty>
           <c:Name>No Difference</c:Name>
           <c:Value>1</c:Value>
         </c:UserDefinedCorrespondenceProperty>
       </c:Correspondence>
   </c:ItemMap>
</c:VariableMap>

CommonalityTypeCoded should be used to provide a generic commonality code 
that is preferably widely used. UserDefinedCorrespondenceProperty provides 
for additional, secondary name/value pairs that reflect local usage (for 
example, to support a search engine or capture local comparison work using 
a variety of scales).

QUESTION case 2: I am comparing two variables that have some differences - 
say, question and category labels and codes, while concept and universe 
are the same.

Could you provide some mock-markup to show me how these fields would be 
used? Does not have to be real xml, but just indicate expected content of 
tags.

WENDY:
<c:VariableMap isVersionable="true" id="CVM_1" version="1.0">

<c:SourceSchemeReference><r:ID>VScheme_1</r:ID></c:SourceSchemeReference>

<c:TargetSchemeReference><r:ID>VScheme_1</r:ID></c:TargetSchemeReference>
   <c:Correspondence>
     <c:Commonality xml:lang="en" translated="false" 
translatable="true">Not evaluated except for mapped items</c:Commonality>
     <c:Difference xml:lang="en" translated="false" translatable="true">Not 
evaluated except for mapped items</c:Difference>
   </c:Correspondence>
   <c:ItemMap alias="Sex">
     <c:SourceItem>V_1</c:SourceItem>
     <c:TargetItem>V_1</c:TargetItem>
       <c:Correspondence>
         <c:Commonality xml:lang="en" translated="false" 
translatable="true">concept and universes are the same</c:Commonality>
         <c:Difference xml:lang="en" translated="false" 
translatable="true">Question, category labels and codes are 
different</c:Difference>
         <c:CommonalityTypeCoded 
codeListID="GCode_1">Similar</c:CommonalityTypeCoded>
       </c:Correspondence>
   </c:ItemMap>
</c:VariableMap>

Create QuestionMap for the two questions
Create CategoryMap for the categories (e.g., Source = Boy/Man Target = 
Male and Source = Girl/Woman Target = Female)
Create CodeMap (you can use the GenerationInstruction to provide specific 
equivalencies for values)

QUESTION 2c. If we are using a code list - either CV, or local list to 
fill in CommonalityTypeCoded, and UserDefinedCorrespondenceProperty, then 
why should we also be forced to fill in the textual descriptions of 
commonality and difference ? - here again, I disagree with these fields 
being MANDATORY.

WENDY: There are many cases in DDI where we require you to provide a 
minimum level of information, namely the verbal description. The other 
fields are intended primarily to support machine actionability. We have 
found that users who are given only simple codes may misunderstand them. 
This is a means of ensuring that the user has basic information and that 
programmers will have a consistent piece of display information for 
inclusion in codebooks, screen displays, etc.

QUESTION 2d. Some variables have numeric representation. How do we compare 
their values if we don't use category and code schemes?

WENDY: In describing the Commonality you note that they both use the same 
numeric type, range, top/bottom code if applicable, and missing value 
indicator. If there is a difference, note it in difference and adjust the 
CommonalityTypeCoded to indicate minor variation.
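
As a rough sketch of the numeric case described above, an ItemMap might 
look like the following. It reuses only the tags from the earlier 
examples; the variable IDs, the alias, the code list ID, the "Similar" 
code and the descriptive text are all hypothetical, and namespace 
declarations are omitted as in the other mock-markup.

```xml
<!-- Hypothetical mock-markup: IDs, codes and wording are illustrative only -->
<c:ItemMap alias="Age">
  <c:SourceItem>V_Age_A</c:SourceItem>
  <c:TargetItem>V_Age_B</c:TargetItem>
  <c:Correspondence>
    <!-- Commonality: what the two numeric representations share -->
    <c:Commonality xml:lang="en" translated="false"
        translatable="true">Both use an integer numeric type and the same
        missing value indicator</c:Commonality>
    <!-- Difference: the point where the representations diverge -->
    <c:Difference xml:lang="en" translated="false"
        translatable="true">Source is top coded at 90; target records ages
        up to 99</c:Difference>
    <!-- Coded value adjusted to signal a minor variation -->
    <c:CommonalityTypeCoded codeListID="GCode_1">Similar</c:CommonalityTypeCoded>
  </c:Correspondence>
</c:ItemMap>
```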

QUESTION  1. <CommonalityWeightType> - according to the documentation this 
is supposed to include a value between 0 and 1. So it's like a code. But 
is the meaning of the value(s) user assigned? Or do we intend to create a 
sort of a CV here and recommend it to all users?

WENDY: Well, it's user "assigned", not user defined. 0 = no commonality 
and 1 = total commonality. It's like a percent expressed as a decimal, so 
0 = 0% and 1 = 100%. The user assigns any mid-range value (like .5 or .33) 
if there is similarity. It provides a sense of how different/same two 
items are on a standard scale.

QUESTION  2. similar question for <UserDefinedCorrespondenceProperty> - is 
this something that each user defines and uses according to their own 
needs - i.e. a sort of a locally defined CV? How does the value inserted 
here relate to the value documented above (in <CommonalityWeightType>)?

WENDY: Here you can provide your own scale and a value. Hopefully this 
corresponds with the above but expressed in a different (locally 
recognized) scale.
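
To make the relationship between the two values concrete, here is a 
possible Correspondence for the "some differences" case, sketched under 
assumptions: the weight of 0.5, the property name 
"LocalSimilarityScale_1to5" and its value of 3 are invented for 
illustration (3 being the midpoint of a hypothetical local 1-to-5 scale, 
matching 0.5 on the standard 0-to-1 scale). Namespace declarations are 
omitted as in the other mock-markup.

```xml
<!-- Hypothetical mock-markup: weight, scale name and value are illustrative -->
<c:Correspondence>
  <c:Commonality xml:lang="en" translated="false"
      translatable="true">concept and universes are the same</c:Commonality>
  <c:Difference xml:lang="en" translated="false"
      translatable="true">Question, category labels and codes are
      different</c:Difference>
  <c:CommonalityTypeCoded codeListID="GCode_1">Similar</c:CommonalityTypeCoded>
  <!-- Standard scale: 0 = no commonality, 1 = total commonality -->
  <c:CommonalityWeightType>0.5</c:CommonalityWeightType>
  <!-- Same judgement restated on a locally recognized scale -->
  <c:UserDefinedCorrespondenceProperty>
    <c:Name>LocalSimilarityScale_1to5</c:Name>
    <c:Value>3</c:Value>
  </c:UserDefinedCorrespondenceProperty>
</c:Correspondence>
```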

QUESTION  3. For the variables that are not identical, but have some 
difference, you are recommending creation of question, category and code 
maps. But why create such maps, since they cannot be linked to the 
variable I am comparing? In the study unit (logical product), the same 
question (or code scheme, or concept, or universe) may be used by several 
variables, and in comparison there is no way to indicate that a particular 
codescheme referenced in a map is analyzed/compared in relation to a 
particular variable.

WENDY: Yes, you would create those maps if you want to express the 
similarities or differences based on them. You'd definitely want to 
create the category and/or code maps if you want to be able to complete 
the harmonization. The relationships between the questions, categories, 
universe, concept, and codes of a variable are provided in the variable 
itself:

A variable map points to a source variable which points to its 
question(s), concept, universe(s), codescheme --> categories

You would need to follow these links to determine whether, say, the 
question the source variable was based on and the question the target 
variable was based on have a mapping.

QUESTION 4a. -If you have 3 studies that are comparable by design and want 
to use group and inheritance to document them, I assume you take one (the 
first one to be completed) as the "standard" and place the relevant 
documentation (variables, concepts, etc.)  in "group" and then mark up the 
other two using StudyUnit nested within group?

WENDY: You put the materials that are common to all (inherited) parts in 
group and then nest a study unit for each with its unique contents. If two 
of the three have additional parts in common, a subgroup is created 
containing those common parts, and these two study units and their unique 
parts are placed in the subgroup.

ACHIM: When you already know all studies then you can decide which study 
is the most appropriate in the role of a standard study. The first one 
isn't always the best candidate for this. The main idea is to put all the 
common information on the highest level.

When new studies are integrated in the future, they should be put in 
subgroups or study units.
When a lot of studies with large differences are added over time, it can 
make sense to identify a new standard study and new second-level (major) 
subgroups. The whole structure then has to be reorganized. If we imagine 
tools for grouping purposes, this process could be done automatically by 
a remanufacturing tool.

QUESTION 4b. -In this case, how do you handle the study-level information 
(title, abstract, etc.) that only belongs to the first study? Do you place 
that in group too, and then use inheritance patterns to replace in the 
other two studies?

WENDY: No, there will be a StudyUnit for each study containing its unique 
information, in this case title, abstract, etc.

ACHIM: When the standard study has information (study-level information or 
information on other levels) which belongs only to this study, it should 
be described at the level where the study is. For the standard study, 
this is the top level.

All this information can be overwritten by inheritance or deleted 
explicitly on a lower level by the action attribute.

QUESTION 4c. -Or, do you create a StudyUnit also for the study that sits 
above, in group, and use that to enter study level information, and then 
by inheritance the variable level information will be carried over from 
above (does not need to be defined)?

WENDY: Right, the group contains a title etc. for the group of studies as 
well as the materials (questions, variables, whatever) that are common to 
all of the studies.

QUESTION 4d. -Secondly, if you create a pattern like the one I described 
above, with study A sitting in Group, and studies B and C nested 
underneath, inheritance allows you to compare between studies A and B, and 
A and C, but HOW DO YOU COMPARE STUDIES B and C?

WENDY:
                  GROUP
               common, inherited material
                    |
      ______________|_________________
      |               |              |
    Study A       Study B        Study C

Studies A, B, and C all inherit from group but not from the unique 
material of each other

                  GROUP
               common, inherited material
                    |
      ______________|_________________
      |                              |
    Study A                     Sub-Group
                            common to B and C but not A
                                     |
                          ___________|___________
                          |                     |
                         Study B             Study C

Since inheritance already covers what is identical, you can do any 
comparison mappings for things that are similar. For example, a 
category/code scheme that has changed over time.

ACHIM: A tool which processes grouping information should be able to 
identify all information which belongs to one study, regardless of the 
level at which the information sits in the structure. You can imagine that 
the tool crawls every branch of the grouping structure. This way the tool 
knows which information is common and which is different between studies B 
and C. The process is similar to a remanufacturing of the grouping, in 
that a new structure is built just for the studies B and C without A.

QUESTION 4e. - I have defined a logical product in the group. In the study 
unit below, where I am trying to use inheritance, I have the exact same 
variable, with one single difference: the variable NAME is changed. But I 
do not know how to mark that up: since NAME has no ID I cannot use the 
"replace" attribute, to signify that the name needs to be replaced (while 
the rest of the description stays the same).

WENDY:
OK we have changed this in DDI 3.1 to clarify the use of NAME (Although 
the same process applies in 3.0 using Name). It is now a UserID as 
follows:
    <xs:complexType name="UserIDType">
      <xs:annotation>
        <xs:documentation>An identifer that is locally unique within its 
specifc type. The required type attribute points to the local user 
identification system that defines the values. If multiple UserIDs are 
used they must be differentiated by the type attribute.</xs:documentation>
      </xs:annotation>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="type" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>

Using the "action" code (Add, Replace, Delete) would require restating the 
entire variable, which seems overkill to me, whereas what you are really 
doing is changing the local (in this particular study) UserID. I will run 
this by Arofan and Achim as we will be speaking tomorrow. I know they 
thought about this use case and I don't remember the details of the 
answer.

ACHIM: I think the SOEP example shows a similar case.

In fact, Variable only requires its identification items. So the 
StudyUnit/LogicalProduct/VariableScheme/Variable would only have:

<Variable isVersionable="true" id="same as in the inherited" version="1.0" 
action="update">
<UserID type="StudyUnique">Fred</UserID>
</Variable>

This way it should inherit anything that has not been changed (only UserID 
changed) and this does not affect additional inheritance as updates are 
local.

ACHIM: Overwriting or deleting this property would only be possible by 
going up the DDI/XML structure (not the grouping structure) to the next 
identifiable element and describing that structure on the lower grouping 
level. Then the overwrite by inheritance, or the deletion via the action 
attribute, can be defined.

WENDY: The alternative is to place the multiple UserIDs in the inherited 
content within Group. Use the type attribute to identify wave number or 
other means of selecting the appropriate local UserID for each StudyUnit. 
These are all strings, so it's pretty flexible.
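
A minimal sketch of that alternative, assuming two waves: the variable in 
the Group carries one UserID per wave, differentiated by the required type 
attribute. The id, the type values "Wave1" and "Wave2", and the names are 
made up for illustration, and the rest of the variable description is 
elided.

```xml
<!-- Hypothetical mock-markup placed in the inherited content within Group -->
<Variable isVersionable="true" id="V_1" version="1.0">
  <!-- One UserID per study; each StudyUnit selects the one for its wave -->
  <UserID type="Wave1">Fred</UserID>
  <UserID type="Wave2">Frederick</UserID>
  <!-- remaining variable description, shared by all study units -->
</Variable>
```

With this layout no study unit needs to restate or update the variable; 
selecting the right name is left to whatever reads the type attribute.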

Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director                Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455