[DDI-users] Variable representation

Wendy Thomas wlt at pop.umn.edu
Fri Apr 17 11:42:09 EDT 2009


Remi,

Thanks for the clarification. That was my understanding.

Wendy


On Fri, 17 Apr 2009, Rémi Dewitte wrote:

> Wendy,
>
> I have discussed the issue with some colleagues in-house.
> I just want to add, or restate, that we take a completely different view of
> a single question that is asked for a series of categories. In that case we
> believe that a variable should be created for each response, whereas for a
> multi-choice question a single variable is created.
>
> Thanks for considering our issue,
> Rémi
>
> 2009/4/16 Wendy Thomas <wlt at pop.umn.edu>
>
>> I'll put this discussion into Mantis for consideration in the next TIC
>> review. We are currently getting 3.1 ready for Expert Committee Review and
>> vote so as an option it probably would not be available for 6-12 months
>> depending on the amount of review required. Thank you for raising the issue
>> and providing examples.
>>
>>
>> Wendy
>>
>>
>> On Thu, 16 Apr 2009, Rémi Dewitte wrote:
>>
>>> Wendy,
>>>
>>> I don't necessarily see it as a limitation on interoperability:
>>> On the logical level we could state that a variable is an array or supports
>>> multiple values.
>>> On the physical level, we could state whether the variable data is
>>> - on n data items using code values
>>> or
>>> - on n data items using 0 and 1 (it might be an abuse to call it binary
>>> coded ?)
>>>
>>> Of course, one variable like this in the logical product would yield
>>> more than one variable in an SPSS dataset, and more than one column in a
>>> CSV file.
>>>
>>> Using this possibility would not be enforced, since it would still be
>>> possible to describe things the way the current specification allows. But
>>> having this possibility would be great!
>>>
>>> Last, I did some additional research on the subject, and SSS (triple-s)
>>> seems to support what I have just described with a "multiple" type:
>>>     <variable ident="1" type="multiple">
>>>       <name>language</name>
>>>       <label>Language spoken at home</label>
>>>       <position start="1"/>
>>>       <values>
>>>          ...codes...
>>>       </values>
>>>     </variable>
>>> To support the first option, one just needs to add: <spread subfields="2"/>
>>>
>>> What do you think ?
>>>
>>> Best regards,
>>> Rémi
>>>
>>> On Thu, Apr 16, 2009 at 17:42, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>>
>>>> Remi,
>>>>
>>>> Sorry I didn't respond to the missing value question for numerics.
>>>> Currently the abstract Representation Type (the base for all
>>>> representation types) allows you to designate the values of missing
>>>> values as an attribute. We have noted the need to indicate labels for
>>>> these but I don't see the bug. I will file this if I can't find it.
>>>>
>>>> As for the multiple response variable, I have seen it dealt with in a
>>>> number of different ways: individual items; a smaller series of
>>>> Language1, Language2, Language3... up to the supposed maximum viable
>>>> responses (for example, the US Census Ancestry variables in the
>>>> microdata); and a list of every possible combination (US Census Race
>>>> combinations). Each of these serves a different purpose, some giving
>>>> unique combinations totalling to a count of persons, others focusing on
>>>> the frequency of individual items.
>>>>
>>>> To handle a bundled response we would need to create a means of
>>>> identifying the use and interpretation of bundled arrays. Storing data
>>>> in this format and describing only the bundle limits the
>>>> interoperability of the data itself with various analysis systems. It is
>>>> closest in nature to the Language1, Language2, etc. example, so any
>>>> translation to another storage system or archival format would probably
>>>> be to that format.
>>>>
>>>> Wendy
>>>>
>>>>
>>>> On Thu, 16 Apr 2009, Rémi Dewitte wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Here is an example without the question. I have left data collection out
>>>>> of the scope of my experiments for now, even though I have read the
>>>>> schema.
>>>>>
>>>>> The use case is a variable holding the body weight of a respondent. There
>>>>> are two codes (-1 and -2) indicating the reason for missing data, and
>>>>> there is a topcode value (200). I would like to give labels to -1, -2,
>>>>> and 200. If you have time to glance at my test file and point out what I
>>>>> may not have understood well, it would surely help.
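A minimal sketch of the body-weight use case above, written in Python rather than DDI. The codes -1, -2, and 200 come from the message; the label strings, the function names, and the sample values are assumptions made only for illustration.

```python
from typing import Optional

# Codes taken from the message; the label wording is hypothetical because
# the thread does not say what each missing-data reason is.
SPECIAL_CODES = {
    -1: "missing (first reason, assumed label)",
    -2: "missing (second reason, assumed label)",
    200: "200 or more (topcode)",
}
MISSING_CODES = {-1, -2}


def describe_weight(value: int) -> str:
    """Return the label for a special code, or the number itself as text."""
    if value in SPECIAL_CODES:
        return SPECIAL_CODES[value]
    return str(value)


def analytic_value(value: int) -> Optional[int]:
    """Return the value to use in statistics, or None for a missing code."""
    return None if value in MISSING_CODES else value


# Hypothetical stored values for five respondents.
stored = [72, -1, 200, 65, -2]
labels = [describe_weight(v) for v in stored]
valid = [v for v in map(analytic_value, stored) if v is not None]
```

The point of the sketch is that one numeric variable carries both measured values and a small labelled code list, which is what the question about referencing a code scheme from a numeric representation is asking for.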
>>>>>
>>>>> As for the multi-choice question, I will take a variable that records
>>>>> all the languages spoken in the household (from a choice of 40). More
>>>>> than one is allowed. If I want to know the frequencies, I have to run
>>>>> 40 of them (one per variable). For each of the 40 frequency results, the
>>>>> non-responses are redundant. Beyond simple frequencies, if the analysis
>>>>> software supports it, I can cross-tabulate "language spoken at home"
>>>>> with another variable more easily than by specifying 40 variables.
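The analysis described above can be sketched in Python, assuming each respondent's answer is held as a set of language codes. The records, the integer codes, and the "region" variable are hypothetical; the sketch only shows that a single pass gives all frequencies, that non-response is counted once, and that a cross-tabulation needs one call.

```python
from collections import Counter

# Hypothetical records: a set-valued "languages" variable plus one
# ordinary single-valued variable to cross-tabulate against.
records = [
    {"languages": {1, 2}, "region": "north"},
    {"languages": {1}, "region": "south"},
    {"languages": set(), "region": "north"},  # non-response
    {"languages": {2, 3}, "region": "north"},
]


def frequencies(rows):
    """Count each code once per respondent; count non-response once."""
    counts = Counter()
    non_response = 0
    for row in rows:
        if not row["languages"]:
            non_response += 1
        counts.update(row["languages"])
    return counts, non_response


def crosstab(rows, other):
    """Count (language code, other-variable value) pairs."""
    table = Counter()
    for row in rows:
        for code in row["languages"]:
            table[(code, row[other])] += 1
    return table


freq, missing = frequencies(records)
table = crosstab(records, "region")
```

Because respondents can mention several codes, the frequencies sum to the number of mentions, not to the number of respondents.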
>>>>>
>>>>> Thanks a lot,
>>>>> Rémi
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 16, 2009 at 00:09, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>>>>
>>>>>> Numeric representation and code schemes are two different things. Code
>>>>>> schemes can have numeric values but can be treated as either numbers or
>>>>>> strings. Changes in CodeSchemes over time are handled by versioning:
>>>>>> incorporating the old scheme and adding, subtracting, or changing
>>>>>> content. Can you provide a concrete example of a question, variable,
>>>>>> storage bundle, and analysis of the contents? I'll see what can
>>>>>> currently be handled and what can't. I am pretty clear, I think,
>>>>>> regarding the first 3 but not on how you are analyzing an unordered
>>>>>> array other than through simple frequencies.
>>>>>>
>>>>>> Wendy
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, 15 Apr 2009, Rémi Dewitte wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am now trying to figure out how to express that a single variable can
>>>>>>> have an array value (which can be treated as a bag if order is
>>>>>>> irrelevant) :).
>>>>>>> There are several advantages to using this kind of variable in
>>>>>>> analysis. A question can have 40 codes in its code scheme, and we
>>>>>>> would like to avoid creating, deleting, or modifying variables when the
>>>>>>> code scheme changes. It is also convenient when doing
>>>>>>> statistics/crosstabs: you don't have to run as many crosstabs as there
>>>>>>> are codes, for example.
>>>>>>>
>>>>>>> What about a numeric representation being able to reference a code scheme?
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>> Rémi
>>>>>>>
>>>>>>> On Wed, Apr 15, 2009 at 16:24, Wendy Thomas <wlt at pop.umn.edu> wrote:
>>>>>>>
>>>>>>>> Remi,
>>>>>>>>
>>>>>>>> Basically the question focuses on how the data was captured while the
>>>>>>>> logical product variable captures how it is expressed in the data
>>>>>>>> file. My earlier email assumed a series of binary variables resulting
>>>>>>>> from the multiple choice question. I suppose alternatively an array
>>>>>>>> could be captured and would be treated as a single variable. The
>>>>>>>> processing system would need to know how to handle that bundled array.
>>>>>>>>
>>>>>>>> Wendy
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 15 Apr 2009, Rémi Dewitte wrote:
>>>>>>>>
>>>>>>>>> Mary,
>>>>>>>>>
>>>>>>>>> Thanks for your kind reply.
>>>>>>>>>
>>>>>>>>> IMHO the pragmatic answer from the Nesstar developer is not really
>>>>>>>>> satisfactory, in the sense that DDI is not about "storing variables
>>>>>>>>> in a system independent way" but, as I have understood it so far, a
>>>>>>>>> way to express things.
>>>>>>>>>
>>>>>>>>> From a question with 8 choices where you can pick 2 of them, I see at
>>>>>>>>> least two ways of storing the result:
>>>>>>>>>
>>>>>>>>> - have 8 slots, each coded 0 or 1 according to whether its code is
>>>>>>>>>   mentioned
>>>>>>>>> - have 2 slots holding the codes mentioned (this can also preserve
>>>>>>>>>   some order).
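The two storage layouts listed above can be sketched in Python as two encodings of the same logical answer. The function names, the `None` filler for unused slots, and the example answer are assumptions for illustration; nothing here is DDI or Nesstar syntax.

```python
def to_indicator_slots(chosen, codes):
    """One slot per code in the scheme: 1 if mentioned, else 0."""
    return [1 if code in chosen else 0 for code in codes]


def to_code_slots(chosen, max_choices, filler=None):
    """One slot per allowed choice, holding the mentioned codes in order."""
    chosen = list(chosen)
    if len(chosen) > max_choices:
        raise ValueError("more codes mentioned than slots available")
    return chosen + [filler] * (max_choices - len(chosen))


def from_indicator_slots(slots, codes):
    """Recover the mentioned codes; the original order is not recoverable."""
    return [code for code, flag in zip(codes, slots) if flag == 1]


def from_code_slots(slots, filler=None):
    """Recover the mentioned codes in the order they were stored."""
    return [code for code in slots if code != filler]


codes = list(range(1, 9))  # 8 choices, coded 1..8 (hypothetical scheme)
answer = [5, 2]            # respondent picked code 5 first, then code 2

eight_slots = to_indicator_slots(answer, codes)
two_slots = to_code_slots(answer, max_choices=2)
```

Both layouts decode to the same set of codes, but only the 2-slot layout keeps the order of mention, and only the 8-slot layout changes width when the code scheme gains or loses a code.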
>>>>>>>>>
>>>>>>>>> You have to make an early choice between 2 or 8 variables. I would
>>>>>>>>> argue that this is a storage (physical) matter, and I don't want this
>>>>>>>>> detail to appear at the logical layer. If the CodeScheme changes, the
>>>>>>>>> variable scheme changes too. Dealing with 8 variables is OK, but once
>>>>>>>>> you have a few multi-choice questions with 20 to 40 codes each, it
>>>>>>>>> becomes a pain.
>>>>>>>>>
>>>>>>>>> On the other hand, I also understand that the PhysicalStructure would
>>>>>>>>> then need a way to express how multi-choice is handled, which is not
>>>>>>>>> the direction taken by DDI so far.
>>>>>>>>>
>>>>>>>>> I hope my thoughts make sense.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Rémi
>>>>>>>>>
>>>>>>>>> On Wed, Apr 15, 2009 at 13:24, Mary Vardigan <vardigan at umich.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Remi,
>>>>>>>>>>
>>>>>>>>>> Regarding the multiple choice issue, I asked one of the original
>>>>>>>>>> developers of Nesstar about this because I recalled that it came up.
>>>>>>>>>> Here is what he said:
>>>>>>>>>>
>>>>>>>>>> “I remember that we were looking for an elegant DDI-solution for
>>>>>>>>>> this, but we did not find one. It is really tricky as the only
>>>>>>>>>> system independent way of storing multiple choice variables is as a
>>>>>>>>>> group of individual variables. In fact what we did in Nesstar was to
>>>>>>>>>> use the variable group element as a container and one of the
>>>>>>>>>> alternatives of the variable group type attribute to indicate that
>>>>>>>>>> this was a multiple response group.”
>>>>>>>>>>
>>>>>>>>>> Hope this is helpful.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Mary
>>>>>>>>>>
>>>>>>>>>> *From:* ddi-users-bounces at icpsr.umich.edu [mailto:
>>>>>>>>>> ddi-users-bounces at icpsr.umich.edu] *On Behalf Of *Rémi Dewitte
>>>>>>>>>> *Sent:* Wednesday, April 15, 2009 5:56 AM
>>>>>>>>>> *To:* ddi-users at icpsr.umich.edu
>>>>>>>>>> *Subject:* [DDI-users] Variable representation
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am continuing to work with DDI, and here are some of the questions
>>>>>>>>>> that have come up.
>>>>>>>>>>
>>>>>>>>>> It is common for a numeric variable, say body height for a
>>>>>>>>>> respondent, to also have some non-response codes defined.
>>>>>>>>>> NumericRepresentationType does not allow referencing a code scheme.
>>>>>>>>>> Where should these codes be written?
>>>>>>>>>>
>>>>>>>>>> In our surveys, we have some "multichoice" questions. Where do we
>>>>>>>>>> state, using DDI, that the variable can have more than one value?
>>>>>>>>>> Sometimes the values are ordered; is there a place to say this?
>>>>>>>>>>
>>>>>>>>>> Thanks a lot,
>>>>>>>>>>
>>>>>>>>>> Rémi
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> DDI-users mailing list
>>>>>>>>>> DDI-users at icpsr.umich.edu
>>>>>>>>>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-users

Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director                Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455

