[DDI-users] Variable representation

Wendy Thomas wlt at pop.umn.edu
Thu Apr 16 11:42:21 EDT 2009


Remi,

Sorry I didn't respond to the missing value question for numerics.
Currently the abstract Representation Type (the base for all
representation types) allows you to designate the missing values as an
attribute. We have noted the need to attach labels to these, but I don't
see a bug filed for it. I will file one if I can't find it.
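
To make the numeric case concrete, here is a rough Python sketch (not DDI
markup) of the body weight example quoted below: a numeric variable whose
special values -1, -2 and 200 carry labels. The label texts and the names
MISSING_VALUE_LABELS, TOPCODE_LABEL and interpret_weight are placeholders of
mine; the thread only says that -1 and -2 mark reasons for missing data and
that 200 is a topcode.

```python
from typing import Dict, Optional, Tuple

# Placeholder labels: the thread gives the codes but not their wording.
MISSING_VALUE_LABELS: Dict[int, str] = {
    -1: "missing, first reason (placeholder label)",
    -2: "missing, second reason (placeholder label)",
}
TOPCODE = 200
TOPCODE_LABEL = "200 or more (placeholder label)"


def interpret_weight(raw: int) -> Tuple[Optional[int], Optional[str]]:
    """Split a stored value into (numeric value, label).

    Missing codes yield no numeric value, only a label. The topcode keeps
    its numeric value and gains a label. Ordinary values have no label.
    """
    if raw in MISSING_VALUE_LABELS:
        return None, MISSING_VALUE_LABELS[raw]
    if raw == TOPCODE:
        return raw, TOPCODE_LABEL
    return raw, None


for raw in (72, 200, -1):
    print(raw, interpret_weight(raw))
```

The point of the sketch is only that the variable stays numeric while a few
of its values need labels, which is the part the current attribute does not
cover.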

As for the multiple response variable, I have seen it dealt with in a
number of different ways: individual items; a smaller series of
Language1, Language2, Language3... up to the supposed maximum viable
number of responses (for example the US Census Ancestry variables in the
microdata); and a list of every possible combination (US Census Race
combinations). Each of these serves a different purpose, some giving
unique combinations totalling to a count of persons, others focusing on
the frequency of individual items.
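
A small Python sketch of these three layouts may help compare them. The
four-entry code list, the sample households and all function names are
invented for illustration (the real question in this thread has 40 codes).

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional

# Hypothetical code scheme, shortened for the example.
CODES = ["EN", "FR", "ES", "DE"]


def as_indicators(bag: List[str]) -> Dict[str, int]:
    """Individual items: one 0/1 variable per code."""
    return {code: int(code in bag) for code in CODES}


def as_slots(bag: List[str], max_slots: int) -> List[Optional[str]]:
    """Language1..LanguageN: one variable per response, padded with None."""
    if len(bag) > max_slots:
        raise ValueError("more responses than available slots")
    return list(bag) + [None] * (max_slots - len(bag))


def as_combination(bag: List[str]) -> FrozenSet[str]:
    """Every possible combination: a single code for the whole set."""
    return frozenset(bag)


# Made-up responses, one list per household.
households = [["EN", "FR"], ["FR", "EN"], ["ES"]]

# Frequencies of individual items can add up to more than the respondents.
item_freq = Counter(code for bag in households for code in set(bag))

# Frequencies of combinations add up to exactly the number of respondents.
combo_freq = Counter(as_combination(bag) for bag in households)

print(as_indicators(households[0]))
print(as_slots(households[2], max_slots=3))
print(sum(item_freq.values()), sum(combo_freq.values()))
```

Here the item frequencies sum to 5 while the combination frequencies sum to
3, the number of households, which is the difference in purpose described
above.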

To handle a bundled response we would need to create a means of
identifying the use and interpretation of bundled arrays. Storing data in
this format and describing only the bundle limits the interoperability of
the data itself with various analysis systems. A bundle is closest in
nature to the Language1, Language2 etc. example, so any translation to
another storage system or archival format would probably convert it to
that format.
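
As a sketch of what such a translation might look like, the Python below
expands a bundled array into Language1, Language2... columns when writing a
flat file. The record layout, the field names and the function
flatten_bundle are assumptions for illustration, not anything defined by DDI.

```python
import csv
import io
from typing import Dict, List


def flatten_bundle(records: List[Dict], field: str, prefix: str) -> str:
    """Return CSV text with the list in `field` spread over numbered columns.

    The number of columns is the longest list found in the data, so shorter
    lists are padded with empty cells.
    """
    width = max((len(r[field]) for r in records), default=0)
    slot_names = [f"{prefix}{i}" for i in range(1, width + 1)]
    other = [k for k in records[0] if k != field] if records else []

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(other + slot_names)
    for r in records:
        values = list(r[field])
        padded = values + [""] * (width - len(values))
        writer.writerow([r[k] for k in other] + padded)
    return out.getvalue()


# Made-up records with a bundled "languages" array.
records = [
    {"household": 1, "languages": ["EN", "FR"]},
    {"household": 2, "languages": ["ES"]},
]
print(flatten_bundle(records, "languages", "Language"))
```

Note that the column count depends on the data, so the description of the
flat file would still have to be generated alongside it.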

Wendy

On Thu, 16 Apr 2009, Rémi Dewitte wrote:

> Hello,
>
> Here is an example without the question. I have left data collection out of
> the scope of my experiments for now, even though I have read the schema.
>
> The use case is a variable holding the body weight of a respondent. There
> are two codes (-1 and -2) indicating the reason for missing data, and
> there is a topcode value (200). I would like to give labels for -1, -2 and
> 200. If you have time to glance at my test file and point out what I may
> not have understood well, it would surely help.
>
> As for the multi choice question, I will take a variable which records
> all the languages spoken in the household (among a choice of 40). More
> than one is allowed. If I want to know the frequencies, I have to compute
> 40 of them. For each of the 40 frequency results, the non-responses are
> redundant. Beyond simple frequencies, if the analysis software supports it,
> I can do a cross tabulation of "language spoken at home" with another
> variable more easily than by specifying 40 variables.
>
> Thanks a lot,
> Rémi
>
>
>
> On Thu, Apr 16, 2009 at 00:09, Wendy Thomas <wlt at pop.umn.edu> wrote:
>
>> Numeric representation and code schemes are two different things. Code
>> schemes can have numeric values but can be treated as either numbers or
>> strings. Changes in CodeSchemes over time are handled by versioning:
>> incorporating the old scheme and adding, subtracting, or changing content.
>> Can you provide a concrete example of a question, variable, storage bundle,
>> and analysis of the contents? I'll see what can currently be handled and
>> what can't. I am pretty clear, I think, regarding the first 3 but not on
>> how you are analyzing an unordered array other than simple frequencies.
>>
>> Wendy
>>
>>
>>
>> On Wed, 15 Apr 2009, Rémi Dewitte wrote:
>>
>>> Hello,
>>>
>>> I am now trying to figure out how to express that a single variable can
>>> have an array value (which can be treated as a bag if order is
>>> irrelevant) :).
>>> There are several advantages to using this kind of variable in the
>>> analysis. We sometimes have 40 codes in a code scheme for a question, and
>>> we would like to avoid creating/deleting/modifying variables when the code
>>> scheme changes. It is also convenient when doing statistics/crosstabs: you
>>> don't have to do as many crosstabs as there are codes, for example.
>>>
>>> What about a numeric representation able to reference a code scheme?
>>>
>>> Thanks a lot,
>>> Rémi
>>>
>>> On Wed, Apr 15, 2009 at 16:24, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>>
>>>> Remi,
>>>>
>>>> Basically the question focuses on how the data was captured while the
>>>> logical product variable captures how it is expressed in the data file.
>>>> My earlier email assumed a series of binary variables resulting from the
>>>> multiple choice question. I suppose alternatively an array could be
>>>> captured and would be treated as a single variable. The processing system
>>>> would need to know how to handle that bundled array.
>>>>
>>>> Wendy
>>>>
>>>>
>>>>
>>>> On Wed, 15 Apr 2009, Rémi Dewitte wrote:
>>>>
>>>>> Mary,
>>>>>
>>>>> Thanks for your kind reply.
>>>>>
>>>>> I find the pragmatic answer from the Nesstar developer not really
>>>>> satisfactory, IMHO, in the sense that DDI is not about "storing
>>>>> variables in a system independent way" but, as I have understood it so
>>>>> far, a way to express things.
>>>>>
>>>>> From a question with 8 choices where you can pick 2 of them, I see at
>>>>> least two ways of storing the result:
>>>>> - have 8 slots and code 0 or 1 depending on whether each code is mentioned
>>>>> - have 2 slots holding the codes mentioned (you can also have some order
>>>>>   with this).
>>>>>
>>>>> You have to make an early choice between 2 or 8 variables. I would argue
>>>>> that it is a storage (physical) matter and I don't want this detail to
>>>>> appear at the logical layer. If the CodeScheme changes, the variable
>>>>> scheme changes too. Dealing with 8 variables is OK, but once you start
>>>>> to have a few questions with between 20 and 40 codes as a multichoice,
>>>>> it becomes a pain.
>>>>>
>>>>> On the other side, I also understand that in the PhysicalStructure you
>>>>> have to find a way to express how multi choice is handled, which is not
>>>>> the direction taken by DDI so far.
>>>>>
>>>>> I hope my thoughts make sense.
>>>>>
>>>>> Best regards,
>>>>> Rémi
>>>>>
>>>>> On Wed, Apr 15, 2009 at 13:24, Mary Vardigan <vardigan at umich.edu>
>>>>> wrote:
>>>>>
>>>>>> Remi,
>>>>>>
>>>>>> Regarding the multiple choice issue, I asked one of the original
>>>>>> developers of Nesstar about this because I recalled that it came up.
>>>>>> Here is what he said:
>>>>>>
>>>>>> “I remember that we were looking for an elegant DDI-solution for this,
>>>>>> but we did not find one. It is really tricky as the only system
>>>>>> independent way of storing multiple choice variables is as a group of
>>>>>> individual variables. In fact what we did in Nesstar was to use the
>>>>>> variable group element as a container and one of the alternatives of
>>>>>> the variable group type attribute to indicate that this was a multiple
>>>>>> response group.”
>>>>>>
>>>>>> Hope this is helpful.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Mary
>>>>>>
>>>>>> *From:* ddi-users-bounces at icpsr.umich.edu [mailto:
>>>>>> ddi-users-bounces at icpsr.umich.edu] *On Behalf Of *Rémi Dewitte
>>>>>> *Sent:* Wednesday, April 15, 2009 5:56 AM
>>>>>> *To:* ddi-users at icpsr.umich.edu
>>>>>> *Subject:* [DDI-users] Variable representation
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am continuing to work with DDI, and here are some of the questions
>>>>>> I have raised.
>>>>>>
>>>>>> It is common for a numeric variable, say body height for a respondent,
>>>>>> to also have some non-response codes defined. NumericRepresentationType
>>>>>> does not allow referencing a code scheme. Where should these codes be
>>>>>> written?
>>>>>>
>>>>>> In our surveys, we have some "multichoice" questions. Where do we write,
>>>>>> using DDI, that the variable can have more than one value? Sometimes
>>>>>> the values are ordered; is there a place to say this?
>>>>>>
>>>>>> Thanks a lot,
>>>>>>
>>>>>> Rémi
>>>>>>
>>>>>> _______________________________________________
>>>>>> DDI-users mailing list
>>>>>> DDI-users at icpsr.umich.edu
>>>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-users

Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director                Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455

