[DDI-users] standard missing values via DDI

Adrian Dușa dusa.adrian at unibuc.ro
Wed Jun 25 11:52:30 EDT 2014


Not sure...
It has to be variable-specific, because each variable has missing data
in different cases.

But if the CodeList contains this information for <each> variable with
missing data, then it's OK. I was thinking of embedding the information
inside each variable, but a reference to a CodeList could also work
(provided the above holds).

My previous email needs a slight correction: the numbers 1, 5, 8, 9,
15 and 78 are not meant to be row numbers but unique identifiers
(a sort of primary key) for the cases where the missing values are
found.

IMPORTANT: in this case, we also need to know which variable in the
dataset contains the unique identifiers (e.g. "CaseID").
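To make the point about identifiers concrete, here is a minimal sketch in Python (chosen only as a neutral language; `MissingSpec`, `type_for` and `classify_missing` are hypothetical names, not DDI elements). Each variable records which variable holds the case IDs and, per missing type, the IDs of the affected cases. Because matching is by ID rather than by row position, sorting or subsetting the data does not break it.

```python
from dataclasses import dataclass, field


@dataclass
class MissingSpec:
    """Hypothetical per-variable record: which cases are missing, and why."""

    variable: str  # e.g. "Age"
    id_variable: str  # variable holding the unique case IDs, e.g. "CaseID"
    # missing type -> list of case IDs (not row positions)
    cases_by_type: dict = field(default_factory=dict)

    def type_for(self, case_id):
        """Return the missing type recorded for this case ID, or None."""
        for missing_type, ids in self.cases_by_type.items():
            if case_id in ids:
                return missing_type
        return None


def classify_missing(rows, spec):
    """Map the case ID of every row where spec.variable is missing (None)
    to its recorded missing type. Rows are matched by ID, so their order
    in the data file does not matter."""
    result = {}
    for row in rows:
        if row[spec.variable] is None:
            case_id = row[spec.id_variable]
            result[case_id] = spec.type_for(case_id)
    return result


age = MissingSpec(
    variable="Age",
    id_variable="CaseID",
    cases_by_type={
        "missing at random": [1, 5, 9],
        "missing by design": [8, 15, 78],
    },
)

# Illustrative rows, deliberately out of ID order.
rows = [
    {"CaseID": 9, "Age": None},
    {"CaseID": 2, "Age": 34},
    {"CaseID": 8, "Age": None},
    {"CaseID": 1, "Age": None},
]
print(classify_missing(rows, age))
# {9: 'missing at random', 8: 'missing by design', 1: 'missing at random'}
```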

That actually settles the matter, because I can automatically generate
the necessary commands in the package-specific setup file(s), replacing
the missing values with whichever codes each statistical package
requires.

In SPSS, for a hypothetical variable "Age" it would be something like this:

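* Flag system-missing Age as -1 for the cases missing at random.
DO IF ANY(CaseID, 1, 5, 9).
RECODE Age (SYSMIS = -1).
END IF.
* Declare -1 as user-missing so those cases stay excluded from analyses.
MISSING VALUES Age (-1).
EXECUTE.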

I'm sure SAS and Stata make this even simpler, and in R it is trivial
(checking is.na() so that only missing values are replaced, as in SPSS):
mydata$Age[mydata$CaseID %in% c(1, 5, 9) & is.na(mydata$Age)] <- -1
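The same metadata could drive the setup files for the other packages as well. Below is a rough Python sketch of such a generator. `CODES` and `setup_lines` are hypothetical names; the Stata lines use `inlist()` and `missing()`, the SAS line uses the `in` operator and `missing()` and is meant to sit inside a DATA step, and the SPSS codes -1 and -2 are assumptions. Users of those packages should check the generated syntax.

```python
# Hypothetical codes per missing type and package. The Stata/SAS letters
# follow the .r/.d idea from earlier in this thread; the SPSS values
# (-1, -2) are illustrative, since SPSS user-missing values must be real codes.
CODES = {
    "missing at random": {"spss": "-1", "stata": ".r", "sas": ".R"},
    "missing by design": {"spss": "-2", "stata": ".d", "sas": ".D"},
}


def setup_lines(variable, id_var, cases_by_type, package):
    """Return setup-file commands giving missing cases their package-specific code."""
    lines = []
    for missing_type, ids in cases_by_type.items():
        code = CODES[missing_type][package]
        id_list = ", ".join(str(i) for i in ids)
        if package == "spss":
            lines += [
                f"DO IF ANY({id_var}, {id_list}).",
                f"RECODE {variable} (SYSMIS = {code}).",
                "END IF.",
            ]
        elif package == "stata":
            lines.append(
                f"replace {variable} = {code} if inlist({id_var}, {id_list}) & missing({variable})"
            )
        elif package == "sas":
            # Statement meant to be placed inside a DATA step.
            lines.append(
                f"if {id_var} in ({id_list}) and missing({variable}) then {variable} = {code};"
            )
        else:
            raise ValueError(f"unsupported package: {package}")
    if package == "spss":
        # Declare the new codes as user-missing, then run the transformations.
        codes = ", ".join(CODES[t]["spss"] for t in cases_by_type)
        lines += [f"MISSING VALUES {variable} ({codes}).", "EXECUTE."]
    return lines


age_missing = {"missing at random": [1, 5, 9], "missing by design": [8, 15, 78]}
for pkg in ("spss", "stata", "sas"):
    print("\n".join(setup_lines("Age", "CaseID", age_missing, pkg)))
    print()
```

Keeping the per-package codes in one table means a new missing type only needs a new row in `CODES`, while the case IDs stay in the DDI metadata.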

On Wed, Jun 25, 2014 at 5:31 PM, Wendy Thomas <wlt at umn.edu> wrote:
>
> Does this DDI 3.2 structure do what you need? It can be referenced from any
> variable, and noted as the default missing values for a LogicalRecord and a
> PhysicalInstance.
>
> <r:ManagedMissingValuesRepresentation>
>   <!-- identification and other versionable type information left off -->
>   <r:ManagedMissingValuesRepresentationName>Combined Missing Types</r:ManagedMissingValuesRepresentationName>
>   <r:MissingCodeRepresentation>
>     <r:RecommendedDataType>integer</r:RecommendedDataType>
>     <r:CodeListReference/>  <!-- to a CodeList named "Missing at Random" -->
>   </r:MissingCodeRepresentation>
>   <r:MissingCodeRepresentation>
>     <r:RecommendedDataType>integer</r:RecommendedDataType>
>     <r:CodeListReference/>  <!-- to a CodeList named "Missing by Design" -->
>   </r:MissingCodeRepresentation>
> </r:ManagedMissingValuesRepresentation>
>
>
> On Wed, Jun 25, 2014 at 2:39 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
>>
>> Dear All,
>>
>> Following a private discussion, an idea emerged that I think is worth
>> circulating and discussing.
>>
>> From what I understand, SAS treats special missing values as lower than
>> any valid number, while Stata went the opposite way, treating them as
>> higher than any valid number.
>>
>> These decisions are software-specific, and other software packages are
>> unlikely to follow either convention.
>>
>> There might be a way to meet every package's needs, using DDI as a
>> mediator and, most importantly, using only "normal" values.
>>
>> The main goal is to differentiate between types of missing values. In R
>> (and I'm sure DDI can do this too), each variable can carry a list of
>> attributes. One component of that list could be dedicated to the missing
>> values, further differentiated by type:
>> - "missing at random": 1, 5, 9
>> - "missing by design": 8, 15, 78
>>
>> Here, the (simple integer) numbers 1, 5, 8, 9, 15 and 78 are nothing but
>> the row indexes (i.e. the cases) where the missing values reside in a
>> particular variable.
>>
>> If I had this kind of information in the DDI XML file, I could then
>> instruct my R function to create <specific> setup files for SAS or Stata,
>> using .r and .d for those specific cases, while in R all missing values
>> could remain simple NAs; users could still differentiate between them by
>> looking at the list of attributes.
>>
>> This would also meet the other need, avoiding accidental mistakes, and it
>> is both package-independent and package-specific at the same time, using
>> DDI as an exchange platform.
>>
>> Recoding specific missing values is trivial in R, but I have to confess I
>> don't know whether and how this can be done in other software via setup
>> files. Users of those packages could confirm whether this approach is
>> feasible. All packages would read the raw data from a .csv file in which
>> missing values are system-missing (empty).
>>
>> Best wishes,
>> Adrian
>>
>>
>> --
>> Adrian Dusa
>> University of Bucharest
>> Romanian Social Data Archive
>> 1, Schitu Magureanu Bd.
>> 050025 Bucharest sector 5
>> Romania
>> Tel.:+40 21 3126618 \
>>         +40 21 3120210 / int.101
>> Fax: +40 21 3158391
>>
>>
>> _______________________________________________
>> DDI-users mailing list
>> DDI-users at icpsr.umich.edu
>> http://lists.icpsr.umich.edu/mailman/listinfo/ddi-users
>>
>
>
>
> --
> Wendy L. Thomas                              Phone: +1 612.624.4389
> Data Access Core Director                 Fax:   +1 612.626.8375
> Minnesota Population Center             Email: wlt at umn.edu
> University of Minnesota
> 50 Willey Hall
> 225 19th Avenue South
> Minneapolis, MN 55455
>
>
