[DDI-users] Variable representation

Rémi Dewitte remi at gide.net
Thu Apr 16 12:17:33 EDT 2009


Wendy,

I don't necessarily see it as a limitation for interoperability:
On the logical level, we could state that a variable is an array, or
supports multiple values.
On the physical level, we could state whether the variable data is stored
 - in n data items holding code values
or
 - in n data items holding 0 or 1 (it might be an abuse to call this binary
coded?)

Of course, one such variable in the logical product would give more than
one variable in an SPSS dataset, and more than one column in a CSV file.
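
To make the two physical layouts concrete, here is a minimal Python sketch.
It is not DDI syntax and not taken from any existing tool: the function
names, the 4-code scheme and the sample answer are all made up for
illustration.

```python
from typing import List, Optional, Sequence


def to_code_slots(values: Sequence[int], n_slots: int) -> List[Optional[int]]:
    """Layout 1: n data items, each holding one mentioned code (None if unused)."""
    if len(values) > n_slots:
        raise ValueError("more responses than slots")
    return list(values) + [None] * (n_slots - len(values))


def to_indicators(values: Sequence[int], codes: Sequence[int]) -> List[int]:
    """Layout 2: one 0/1 data item per code in the code scheme."""
    mentioned = set(values)
    return [1 if code in mentioned else 0 for code in codes]


codes = [1, 2, 3, 4]   # hypothetical code scheme
answer = [3, 1]        # one respondent's multiple response, order preserved

print(to_code_slots(answer, n_slots=2))  # [3, 1]
print(to_indicators(answer, codes))      # [1, 0, 1, 0]
```

Note that only the first layout can preserve the order of the responses,
and only the second one changes shape when the code scheme changes.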

Using this option would not be mandatory, since it would still be possible
to describe things the way the current specification allows. But having the
option would be great!

Lastly, I did some more research on the subject, and SSS (triple-s) seems
to support what I have just described with a "multiple" type:
      <variable ident="1" type="multiple">
        <name>language</name>
        <label>Language spoken at home</label>
        <position start="1"/>
        <values>
           ...codes...
        </values>
      </variable>
To support the first option (n data items holding code values), one just
needs to add: <spread subfields="2"/>
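
As a rough sketch of how a reader of such a file might decode the two
layouts from a fixed-width record, here is some Python. This is written
from the description above and has not been checked against the triple-s
specification: the field widths, the use of blanks for unused subfields and
both function names are assumptions.

```python
from typing import List


def decode_bitstring(record: str, start: int, codes: List[int]) -> List[int]:
    """0/1 layout: one character per code, '1' meaning the code was mentioned.

    `start` is the 1-based column of the first character (assumed convention).
    """
    field = record[start - 1 : start - 1 + len(codes)]
    return [code for code, flag in zip(codes, field) if flag == "1"]


def decode_spread(record: str, start: int, subfields: int, width: int) -> List[int]:
    """Code-value layout: `subfields` slots of `width` characters each.

    A blank slot is assumed to mean "no further response".
    """
    values = []
    for i in range(subfields):
        begin = start - 1 + i * width
        chunk = record[begin : begin + width].strip()
        if chunk:
            values.append(int(chunk))
    return values


print(decode_bitstring("1010", start=1, codes=[1, 2, 3, 4]))   # [1, 3]
print(decode_spread(" 3 1", start=1, subfields=2, width=2))    # [3, 1]
```

Both calls give back the same logical value, a list of codes, which is the
point of keeping the choice of layout on the physical level.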

What do you think?

Best regards,
Rémi

On Thu, Apr 16, 2009 at 17:42, Wendy Thomas <wlt at pop.umn.edu> wrote:

> Remi,
>
> Sorry I didn't respond to the missing value question for numerics.
> Currently the abstract Representation Type (the base for all representation
> types) allows you to designate the values of missing values as an attribute.
> We have noted the need to indicate labels for these, but I don't see a bug
> filed for it. I will file one if I can't find it.
>
> As for the multiple response variable, I have seen it dealt with in a number
> of different ways: individual items; a smaller series of Language1,
> Language2, Language3... up to the supposed maximum viable responses (for
> example, the US Census Ancestry variables in the microdata); and a list of
> every possible combination (US Census Race combinations). Each of these
> serves a different purpose, some giving unique combinations totalling to a
> count of persons, others focusing on the frequency of individual items.
>
> To handle a bundled response we would need to create a means of identifying
> the use and interpretation of bundled arrays. Storing data in this format
> and describing only the bundle limits the interoperability of the data
> itself with various analysis systems. It is closest in nature to the
> Language1, Language2, etc. example, so any translation to another storage
> system or archival format would probably end up in that format.
>
> Wendy
>
>
> On Thu, 16 Apr 2009, Rémi Dewitte wrote:
>
>> Hello,
>>
>> Here is an example without the question. I have left data collection out of
>> the scope of my experiments for now, even though I have read the schema.
>>
>> The use case is a variable holding the body weight of a respondent. There
>> are two codes (-1 and -2) indicating the reason for missing data, and there
>> is a topcode value (200). I would like to give labels for -1, -2 and 200.
>> If you have time to glance at my test file and point out what I may not
>> have understood well, it would surely help.
>>
>> As for the multichoice question, take a variable that records all the
>> languages spoken in the household (among a choice of 40). More than one is
>> allowed. If I want the frequencies, I have to run 40 of them, and the
>> non-response counts are redundant across the 40 results. Beyond simple
>> frequencies, if the analysis software supports it, I can cross-tabulate
>> "language spoken at home" with another variable more easily than by
>> specifying 40 variables.
>>
>> Thanks a lot,
>> Rémi
>>
>>
>>
>> On Thu, Apr 16, 2009 at 00:09, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>
>>> Numeric representation and code schemes are two different things. Code
>>> schemes can have numeric values but can be treated as either numbers or
>>> strings. Changes in CodeSchemes over time are handled by versioning:
>>> incorporating the old scheme and adding, subtracting, or changing content.
>>> Can you provide a concrete example of a question, variable, storage bundle,
>>> and analysis of the contents, and I'll see what can currently be handled
>>> and what can't. I am pretty clear, I think, regarding the first 3 but not
>>> on how you are analyzing an unordered array other than with simple
>>> frequencies.
>>>
>>> Wendy
>>>
>>>
>>>
>>> On Wed, 15 Apr 2009, Rémi Dewitte wrote:
>>>
>>>> Hello,
>>>>
>>>> I am now trying to figure out how to express that a single variable can
>>>> have an array value (which can be treated as a bag if order is
>>>> irrelevant) :).
>>>> There are several advantages to using this kind of variable in the
>>>> analysis. We sometimes have 40 codes in the code scheme for a question,
>>>> and we would like to avoid creating, deleting, or modifying variables when
>>>> the code scheme changes. It is also convenient when doing statistics or
>>>> crosstabs: you don't have to run as many crosstabs as there are codes,
>>>> for example.
>>>>
>>>> What about a numeric representation able to reference a code scheme?
>>>>
>>>> Thanks a lot,
>>>> Rémi
>>>>
>>>> On Wed, Apr 15, 2009 at 16:24, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>>>
>>>>> Remi,
>>>>>
>>>>> Basically the question focuses on how the data was captured, while the
>>>>> logical product variable captures how it is expressed in the data file.
>>>>> My earlier email assumed a series of binary variables resulting from the
>>>>> multiple choice question. I suppose alternatively an array could be
>>>>> captured and would be treated as a single variable. The processing system
>>>>> would need to know how to handle that bundled array.
>>>>>
>>>>> Wendy
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 15 Apr 2009, Rémi Dewitte wrote:
>>>>>
>>>>>> Mary,
>>>>>>
>>>>>> Thanks for your kind reply.
>>>>>>
>>>>>> I find the pragmatic answer from the Nesstar developer not really
>>>>>> satisfactory, IMHO, in the sense that DDI is not about "storing
>>>>>> variables in a system independent way" but, as I have understood it so
>>>>>> far, a way to express things.
>>>>>>
>>>>>> For a question with 8 choices where you can pick 2 of them, I see at
>>>>>> least two ways of storing the result:
>>>>>> - have 8 slots and code 0 or 1 depending on whether the code is mentioned
>>>>>> - have 2 slots holding the codes mentioned (you can also keep some order
>>>>>> with this).
>>>>>>
>>>>>> You have to make an early choice between 2 or 8 variables. I would argue
>>>>>> that it is a storage (physical) matter, and I don't want this detail to
>>>>>> appear at the logical layer. If the CodeScheme changes, the variable
>>>>>> scheme changes too. Dealing with 8 variables is OK, but once you have a
>>>>>> few multichoice questions with 20 to 40 codes each, it becomes a pain.
>>>>>>
>>>>>> On the other hand, I also understand that in the PhysicalStructure you
>>>>>> have to find a way to express how multichoice is handled, which is not
>>>>>> the direction taken by DDI so far.
>>>>>>
>>>>>> I hope my thoughts make sense.
>>>>>>
>>>>>> Best regards,
>>>>>> Rémi
>>>>>>
>>>>>> On Wed, Apr 15, 2009 at 13:24, Mary Vardigan <vardigan at umich.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Remi,
>>>>>>>
>>>>>>> Regarding the multiple choice issue, I asked one of the original
>>>>>>> developers of Nesstar about this because I recalled that it came up.
>>>>>>> Here is what he said:
>>>>>>>
>>>>>>> “I remember that we were looking for an elegant DDI-solution for this,
>>>>>>> but we did not find one. It is really tricky as the only system
>>>>>>> independent way of storing multiple choice variables is as a group of
>>>>>>> individual variables. In fact what we did in Nesstar was to use the
>>>>>>> variable group element as a container and one of the alternatives of
>>>>>>> the variable group type attribute to indicate that this was a multiple
>>>>>>> response group.”
>>>>>>>
>>>>>>> Hope this is helpful.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Mary
>>>>>>>
>>>>>>> *From:* ddi-users-bounces at icpsr.umich.edu [mailto:
>>>>>>> ddi-users-bounces at icpsr.umich.edu] *On Behalf Of *Rémi Dewitte
>>>>>>> *Sent:* Wednesday, April 15, 2009 5:56 AM
>>>>>>> *To:* ddi-users at icpsr.umich.edu
>>>>>>> *Subject:* [DDI-users] Variable representation
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am continuing to work with DDI, and here are some of the questions I
>>>>>>> have run into.
>>>>>>>
>>>>>>> It is common for a numeric variable, say body height for a respondent,
>>>>>>> to also have some non-response codes defined. NumericRepresentationType
>>>>>>> does not allow referencing a code scheme. Where should these codes be
>>>>>>> written?
>>>>>>>
>>>>>>> In our surveys, we have some "multichoice" questions. Where do we write,
>>>>>>> using DDI, that the variable can have more than one value? Sometimes
>>>>>>> the values are ordered; is there a place to say this?
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>>
>>>>>>> Rémi
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> DDI-users mailing list
>>>>>>> DDI-users at icpsr.umich.edu
>>>>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-users
>>>>>
>>>>> Wendy L. Thomas                          Phone: +1 612.624.4389
>>>>> Data Access Core Director                Fax:   +1 612.626.8375
>>>>> Minnesota Population Center              Email: wlt at pop.umn.edu
>>>>> University of Minnesota
>>>>> 50 Willey Hall
>>>>> 225 19th Avenue South
>>>>> Minneapolis, MN 55455