[DDI-users] Variable representation

Rémi Dewitte remi at gide.net
Fri Apr 17 11:36:20 EDT 2009


Wendy,

I have discussed the issue with some colleagues in house.
I just want to add, or restate, that we take a completely different view of a
single question that is asked for each of a series of categories. In that case
we believe that a variable should be created for each response, whereas for
a multi-choice question a single variable is created.
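
To make the distinction concrete, here is a minimal Python sketch of one
logical multi-choice variable written out in the two physical layouts
discussed in the quoted messages below: n data items holding 0/1 flags, or a
smaller number of data items holding code values. It is purely illustrative
and not DDI syntax; the function names and the sample code scheme are my own
assumptions.

```python
from typing import List, Optional, Sequence


def to_flags(selected: Sequence[int], codes: Sequence[int]) -> List[int]:
    """One 0/1 data item per code in the code scheme."""
    chosen = set(selected)
    return [1 if code in chosen else 0 for code in codes]


def to_slots(selected: Sequence[int], max_items: int) -> List[Optional[int]]:
    """Up to max_items data items holding code values, in response order."""
    if len(selected) > max_items:
        raise ValueError("more responses than available slots")
    padding: List[Optional[int]] = [None] * (max_items - len(selected))
    return list(selected) + padding


# Hypothetical code scheme with 8 codes; the respondent picked codes 3 and 7.
codes = [1, 2, 3, 4, 5, 6, 7, 8]
answer = [3, 7]

print(to_flags(answer, codes))  # [0, 0, 1, 0, 0, 0, 1, 0]
print(to_slots(answer, 2))      # [3, 7]
```

Either layout can be derived from the same logical value, which is why we
would prefer the choice between them to be stated only at the physical level.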

Thanks for considering our issue,
Rémi

2009/4/16 Wendy Thomas <wlt at pop.umn.edu>

> I'll put this discussion into Mantis for consideration in the next TIC
> review. We are currently getting 3.1 ready for Expert Committee Review and
> vote so as an option it probably would not be available for 6-12 months
> depending on the amount of review required. Thank you for raising the issue
> and providing examples.
>
>
> Wendy
>
>
> On Thu, 16 Apr 2009, Rémi Dewitte wrote:
>
>> Wendy,
>>
>> I don't necessarily see it as a limitation for interoperability:
>> On the logical level we could state that a variable is an array or supports
>> multiple values.
>> On the physical level, we could state whether the variable data is stored
>> - on n data items using code values
>> or
>> - on n data items using 0 and 1 (it might be an abuse to call it binary
>> coded?)
>>
>> Of course having one variable like this in the logical product would give
>> more than one variable in an SPSS dataset, and more than one column in a CSV
>> file.
>>
>> Using this possibility would not be enforced, since it would still be
>> possible to describe things as can be done with the current specification.
>> But having this possibility would be great!
>>
>> Last, I did some additional research on the subject and SSS (triple-s)
>> seems to support what I have just described with a "multiple" type:
>>     <variable ident="1" type="multiple">
>>       <name>language</name>
>>       <label>Language spoken at home</label>
>>       <position start="1"/>
>>       <values>
>>          ...codes...
>>       </values>
>>     </variable>
>> To support the first option, one just needs to add: <spread subfields="2"/>
>>
>> What do you think ?
>>
>> Best regards,
>> Rémi
>>
>> On Thu, Apr 16, 2009 at 17:42, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>
>>> Remi,
>>>
>>> Sorry I didn't respond to the missing value question for numerics.
>>> Currently the abstract Representation Type (the base for all representation
>>> types) allows you to designate the values of missing values as an attribute.
>>> We have noted the need to indicate labels for these but I don't see the bug.
>>> I will file this if I can't find it.
>>>
>>> As for the multiple response variable, I have seen it dealt with in a number
>>> of different ways: individual items; a smaller series of Language1,
>>> Language2, Language3... up to the supposed maximum viable responses (for
>>> example the US Census Ancestry variables in the microdata); and creating a
>>> list of every possible combination (US Census Race combinations). Each of
>>> these serves a different purpose, some giving unique combinations totalling
>>> to a count of persons, others focusing on frequency of individual items.
>>>
>>> To handle a bundled response we would need to create a means of identifying
>>> the use and interpretation of bundled arrays. Storing data in this format
>>> and describing only the bundle limits the interoperability of the data
>>> itself with various analysis systems. It is closest in nature to the
>>> Language1, Language2 etc. example, so any translation to another storage
>>> system or archival format would probably be translated to this format.
>>>
>>> Wendy
>>>
>>>
>>> On Thu, 16 Apr 2009, Rémi Dewitte wrote:
>>>
>>>> Hello,
>>>>
>>>> Here is an example without the question. I have left data collection out of
>>>> the scope of my experiments for now, even though I have read the schema.
>>>>
>>>> The use case is a variable holding the body weight of a respondent. There are
>>>> two codes (-1 and -2) indicating the reason for missing data, and there is
>>>> a topcode value (200). I would like to give labels for -1, -2 and 200. If
>>>> you have time to glance at my test file and point out what I may not have
>>>> understood well, it would surely help.
>>>>
>>>> As for the multi-choice question, I will take a variable which tells what
>>>> all the spoken languages (among a choice of 40) in the household are. More
>>>> than one is allowed. If I want to know the frequencies, I have to compute
>>>> 40 of them. For each of the 40 frequency results, the non-responses are
>>>> redundant. Beyond simple frequencies, if the analysis software supports it,
>>>> I can do a cross tabulation of "language spoken at home" with another
>>>> variable more easily than by specifying 40 variables.
>>>>
>>>> Thanks a lot,
>>>> Rémi
>>>>
>>>>
>>>>
>>>> On Thu, Apr 16, 2009 at 00:09, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>>>
>>>>> Numeric representation and code schemes are two different things. Code
>>>>> schemes can have numeric values but can be treated as either numbers or
>>>>> strings. Changes in CodeSchemes over time are handled by versioning,
>>>>> incorporating the old scheme and adding, subtracting, or changing content.
>>>>> Can you provide a concrete example of a question, variable, storage bundle,
>>>>> and analysis of the contents and I'll see what can be currently handled
>>>>> and what can't. I am pretty clear, I think, regarding the first 3 but not
>>>>> on how you are analyzing an unordered array other than simple frequencies.
>>>>>
>>>>> Wendy
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 15 Apr 2009, Rémi Dewitte wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I now need to figure out how to express that a single variable can have an
>>>>>> array value (which can be treated as a bag if order is irrelevant) :).
>>>>>> There are several advantages to using this kind of variable in the
>>>>>> analysis. When a question has 40 codes in its code scheme, we would like
>>>>>> to avoid creating/deleting/modifying variables when the code scheme
>>>>>> changes. It is also convenient when doing statistics/crosstabs: you don't
>>>>>> have to do as many crosstabs as there are codes, for example.
>>>>>>
>>>>>> What about a numeric representation able to reference a code scheme?
>>>>>>
>>>>>> Thanks a lot,
>>>>>> Rémi
>>>>>>
>>>>>> On Wed, Apr 15, 2009 at 16:24, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>>>>>
>>>>>>> Remi,
>>>>>>>
>>>>>>> Basically the question focuses on how the data was captured while the
>>>>>>> logical product variable captures how it is expressed in the data file. My
>>>>>>> earlier email assumed a series of binary variables resulting from the
>>>>>>> multiple choice question. I suppose alternatively an array could be
>>>>>>> captured and would be treated as a single variable. The processing system
>>>>>>> would need to know how to handle that bundled array.
>>>>>>>
>>>>>>> Wendy
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 15 Apr 2009, Rémi Dewitte wrote:
>>>>>>>
>>>>>>>> Mary,
>>>>>>>>
>>>>>>>> Thanks for your kind reply.
>>>>>>>>
>>>>>>>> I find the pragmatic answer from the Nesstar developer not really
>>>>>>>> satisfactory IMHO, in the sense that DDI is not about "storing variables
>>>>>>>> in a system independent way" but, as I have understood so far, a way to
>>>>>>>> express things.
>>>>>>>>
>>>>>>>> From a question with 8 choices where you can pick 2 of them, I see at
>>>>>>>> least two ways of storing the result:
>>>>>>>> - have 8 slots and code 0 or 1 depending on whether each code is mentioned
>>>>>>>> - have 2 slots holding the codes mentioned (you can also keep some order
>>>>>>>> with this).
>>>>>>>>
>>>>>>>> You have to make an early choice between 2 or 8 variables. I would argue
>>>>>>>> that it is a storage (physical) matter and I don't want this detail to
>>>>>>>> appear at the logical layer. If the CodeScheme changes, it also changes
>>>>>>>> the variable scheme. Dealing with 8 variables is OK, but once you have a
>>>>>>>> few multi-choice questions with between 20 and 40 codes each, it becomes
>>>>>>>> a pain.
>>>>>>>>
>>>>>>>> On the other hand, I also understand that in the PhysicalStructure you
>>>>>>>> would have to find a way to express how multi-choice is handled, which is
>>>>>>>> not the direction taken by DDI so far.
>>>>>>>>
>>>>>>>> I hope my thoughts make sense.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Rémi
>>>>>>>>
>>>>>>>> On Wed, Apr 15, 2009 at 13:24, Mary Vardigan <vardigan at umich.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Remi,
>>>>>>>>>
>>>>>>>>> Regarding the multiple choice issue, I asked one of the original
>>>>>>>>> developers of Nesstar about this because I recalled that it came up. Here
>>>>>>>>> is what he said:
>>>>>>>>>
>>>>>>>>> “I remember that we were looking for an elegant DDI-solution for this,
>>>>>>>>> but we did not find one. It is really tricky as the only system
>>>>>>>>> independent way of storing multiple choice variables is as a group of
>>>>>>>>> individual variables. In fact what we did in Nesstar was to use the
>>>>>>>>> variable group element as a container and one of the alternatives of the
>>>>>>>>> variable group type attribute to indicate that this was a multiple
>>>>>>>>> response group.”
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope this is helpful.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Mary
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *From:* ddi-users-bounces at icpsr.umich.edu [mailto:
>>>>>>>>> ddi-users-bounces at icpsr.umich.edu] *On Behalf Of *Rémi Dewitte
>>>>>>>>> *Sent:* Wednesday, April 15, 2009 5:56 AM
>>>>>>>>> *To:* ddi-users at icpsr.umich.edu
>>>>>>>>> *Subject:* [DDI-users] Variable representation
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am continuing to work with DDI and here are some of the questions that
>>>>>>>>> have come up.
>>>>>>>>>
>>>>>>>>> It is common for a numeric variable, say body height for a respondent, to
>>>>>>>>> also have some non-response codes defined. NumericRepresentationType does
>>>>>>>>> not allow referencing a code scheme. Where should these codes be written?
>>>>>>>>>
>>>>>>>>> In our surveys, we have some "multichoice" questions. Where do we write,
>>>>>>>>> using DDI, that the variable can have more than one value? Sometimes the
>>>>>>>>> values are ordered; is there a place to say this?
>>>>>>>>>
>>>>>>>>> Thanks a lot,
>>>>>>>>>
>>>>>>>>> Rémi
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> DDI-users mailing list
>>>>>>>>> DDI-users at icpsr.umich.edu
>>>>>>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-users
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>> Wendy L. Thomas                          Phone: +1 612.624.4389
>>>>>>> Data Access Core Director                Fax:   +1 612.626.8375
>>>>>>> Minnesota Population Center              Email: wlt at pop.umn.edu
>>>>>>> University of Minnesota
>>>>>>> 50 Willey Hall
>>>>>>> 225 19th Avenue South
>>>>>>> Minneapolis, MN 55455