[DDI-users] standard missing values via DDI

Wed Jun 25 12:25:11 EDT 2014

Excellent.
Looking forward to seeing a concrete DDI XML example and putting this
procedure to the test...

Adrian

On Wed, Jun 25, 2014 at 7:13 PM, Wendy Thomas <wlt at umn.edu> wrote:
> OK. Each variable declares BOTH its valid value representation and its
> missing value representation. Missing value representations are managed
> structures which can be described by any combination of a code/numeric/text
> representation. In addition, a default missing value can be declared for a
> logical record or for a physical data file.
>
> So in effect, all variables using the same set of missing values would
> reference the same managed missing value description. If a missing value is
> not an option (i.e. the variable must have a valid value) then no
> MissingValuesReference would be included in the
> Variable/VariableRepresentation.
>
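A minimal sketch of this referencing pattern, in Python rather than DDI XML.
The class and field names (ManagedMissingValues, Variable, missing_ref) and
the codes -1 and -2 are made up for illustration; they are not DDI element
names or recommended codes. Two variables share one managed description, and
a third one, which must always hold a valid value, carries no reference.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ManagedMissingValues:
    """Hypothetical stand-in for a managed missing value description."""
    name: str
    codes: Dict[int, str]  # missing code -> label (codes are invented)


@dataclass
class Variable:
    name: str
    # None plays the role of "no MissingValuesReference included".
    missing_ref: Optional[ManagedMissingValues] = None

    def is_missing(self, value: int) -> bool:
        """True only if the variable references a description listing value."""
        return self.missing_ref is not None and value in self.missing_ref.codes


# One shared description, referenced (not copied) by two variables.
shared = ManagedMissingValues(
    "Combined Missing Types",
    {-1: "missing at random", -2: "missing by design"},
)
age = Variable("Age", shared)
income = Variable("Income", shared)
case_id = Variable("CaseID")  # must always be valid: no reference
```

Because both variables point at the same object, a change to the shared
description is seen by every variable that references it.
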
> Regarding identification of CaseID: see
> DataRelationship/LogicalRecord/CaseIdentification/
>
>
> On Wed, Jun 25, 2014 at 10:52 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
>>
>> Not sure...
>> It has to be variable specific, because each variable has different
>> cases with missing data.
>>
>> But if the CodeList contains information for <each> variable which has
>> missing data, then it's ok. I was thinking about embedding this kind
>> of information inside each variable, but a reference to a CodeList
>> might also be an idea (provided the above).
>>
>> My previous email needs a slight correction: the numbers 1, 5, 8, 9,
>> 15 and 78 should not be line numbers but rather unique identifiers
>> (sort of a Primary Key) for the cases where the missing values are
>> found.
>>
>> IMPORTANT: in this case, we also need to know which variable in the
>> dataset contains the unique identifiers (e.g. "CaseID").
>>
>> That actually solves all matters, because I can automatically create
>> the necessary commands in the specific setup file(s) which will
>> replace missing with the specific desired values depending on the
>> statistical package.
>>
>> In SPSS, for a hypothetical variable "Age" it would be something like
>> this:
>>
>> DO IF (CaseID = 1 | CaseID = 5 | CaseID = 9).
>> RECODE Age (SYSMIS = -1).
>> END IF.
>> EXECUTE.
>>
>> I'm sure that SAS and Stata are much easier to work with, and R is just
>> trivial:
>> mydata$Age[mydata$CaseID %in% c(1, 5, 9)] <- -1
>>
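The automatic creation of such commands could look roughly like the sketch
below. It only reproduces the SPSS block quoted above from a variable name,
an identifier variable and a list of case identifiers; the function name
spss_recode and its parameters are hypothetical, not part of any existing
package.

```python
from typing import Iterable


def spss_recode(variable: str, id_variable: str,
                case_ids: Iterable[int], code: int) -> str:
    """Build the SPSS block that recodes system-missing cells to `code`
    for the cases whose identifier is listed in `case_ids`."""
    ids = list(case_ids)
    if not ids:
        # An empty list would produce an invalid "DO IF ()." line.
        raise ValueError("case_ids must contain at least one identifier")
    condition = " | ".join(f"{id_variable} = {i}" for i in ids)
    return "\n".join([
        f"DO IF ({condition}).",
        f"RECODE {variable} (SYSMIS = {code}).",
        "END IF.",
        "EXECUTE.",
    ])


print(spss_recode("Age", "CaseID", [1, 5, 9], -1))
```

For "Age" with cases 1, 5 and 9 this prints the same four lines as the
handwritten example.
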
>> On Wed, Jun 25, 2014 at 5:31 PM, Wendy Thomas <wlt at umn.edu> wrote:
>> >
>> > Does this 3.2 structure do what you need? It can be referenced from any
>> > variable, noted as the default missing values for a LogicalRecord and a
>> > Physical Instance.
>> >
>> > <r:ManagedMissingValuesRepresentation>
>> >   <!-- note: I've left off the identification and other versionable
>> >        type information -->
>> >   <r:ManagedMissingValuesRepresentationName>Combined Missing Types</r:ManagedMissingValuesRepresentationName>
>> >   <r:MissingCodeRepresentation>
>> >     <r:RecommendedDataType>integer</r:RecommendedDataType>
>> >     <!-- to a CodeList with name "Missing at Random" -->
>> >     <r:CodeListReference/>
>> >   </r:MissingCodeRepresentation>
>> >   <r:MissingCodeRepresentation>
>> >     <r:RecommendedDataType>integer</r:RecommendedDataType>
>> >     <!-- to a CodeList with name "Missing by Design" -->
>> >     <r:CodeListReference/>
>> >   </r:MissingCodeRepresentation>
>> > </r:ManagedMissingValuesRepresentation>
>> >
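As a rough illustration of how a script might read such a structure, the
sketch below parses a copy of the fragment with Python's standard library.
The namespace URI "ddi:reusable:3_2" is an assumption added here so that the
r: prefix resolves, and the fragment is the simplified one from this message
(no identification, empty CodeListReference), so it has not been validated
against the DDI 3.2 schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "r" prefix; not stated in the message.
NS = {"r": "ddi:reusable:3_2"}

FRAGMENT = """
<r:ManagedMissingValuesRepresentation xmlns:r="ddi:reusable:3_2">
  <r:ManagedMissingValuesRepresentationName>Combined Missing Types</r:ManagedMissingValuesRepresentationName>
  <r:MissingCodeRepresentation>
    <r:RecommendedDataType>integer</r:RecommendedDataType>
    <r:CodeListReference/>
  </r:MissingCodeRepresentation>
  <r:MissingCodeRepresentation>
    <r:RecommendedDataType>integer</r:RecommendedDataType>
    <r:CodeListReference/>
  </r:MissingCodeRepresentation>
</r:ManagedMissingValuesRepresentation>
"""

root = ET.fromstring(FRAGMENT)

# Name of the managed representation.
name = root.findtext("r:ManagedMissingValuesRepresentationName", namespaces=NS)

# One entry per MissingCodeRepresentation child.
code_reps = root.findall("r:MissingCodeRepresentation", NS)
data_types = [
    rep.findtext("r:RecommendedDataType", namespaces=NS) for rep in code_reps
]

print(name, len(code_reps), data_types)
```

In a real instance each CodeListReference would carry the identification of
the referenced CodeList, which a setup-file generator would then resolve.
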
>> >
>> > On Wed, Jun 25, 2014 at 2:39 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
>> >>
>> >> Dear All,
>> >>
>> >> Following a private discussion, an idea emerged that I think is useful
>> >> to circulate and discuss.
>> >>
>> >> From what I understand, SAS codes special missing values as extremely
>> >> low values, while Stata went the opposite way, coding them as
>> >> extremely large values.
>> >>
>> >> Those are decisions which are software specific, and it is unlikely
>> >> that other software packages will follow one trend or another.
>> >>
>> >> There might be a way to solve all particular needs, using DDI as a
>> >> mediator and most importantly using only "normal" values.
>> >>
>> >> The main challenge is to differentiate between missing values. In R,
>> >> and I'm sure DDI can do that too, a list of attributes can be attached
>> >> to each variable. One component of that list could be dedicated to the
>> >> missing values, and further differentiate among them:
>> >> - "missing at random": 1, 5, 9
>> >> - "missing by design": 8, 15, 78
>> >>
>> >> Here, the (simple integer) numbers 1, 5, 8, 9, 15 and 78 are nothing
>> >> but the line numbers (i.e. the cases) where the missing values reside
>> >> in a particular variable.
>> >>
>> >> If I had this kind of information in the DDI XML file, I could then
>> >> instruct my R function to create <specific> setup files for SAS or
>> >> Stata using .r and .d in those specific cases, while in R all missing
>> >> values could remain as simple NAs but users can still differentiate
>> >> between missings by just looking at the list of attributes.
>> >>
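A sketch of the Stata side of this idea, assuming the numbers are treated as
values of an identifier variable (here the hypothetical "CaseID") rather than
line numbers. The mapping of missing types to .r and .d follows the codes
mentioned above; the function name stata_replace and the dictionary layout
are made up for illustration.

```python
from typing import Dict, List

# Extended missing codes per missing type, as suggested in the message.
STATA_CODES = {"missing at random": ".r", "missing by design": ".d"}


def stata_replace(variable: str, id_variable: str,
                  missing: Dict[str, List[int]]) -> List[str]:
    """Return one Stata `replace` command per missing type."""
    commands = []
    for kind, ids in missing.items():
        id_list = ", ".join(str(i) for i in ids)
        commands.append(
            f"replace {variable} = {STATA_CODES[kind]} "
            f"if inlist({id_variable}, {id_list})"
        )
    return commands


# The per-variable "attribute": missing type -> cases where it occurs.
age_missing = {
    "missing at random": [1, 5, 9],
    "missing by design": [8, 15, 78],
}

for command in stata_replace("Age", "CaseID", age_missing):
    print(command)
```

Stata limits how many arguments inlist() accepts, so long identifier lists
would have to be split over several commands.
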
>> >> This way it would accomplish the other need, to avoid accidental
>> >> mistakes, and it is both package independent and package specific at
>> >> the same time, using DDI as an exchange platform.
>> >>
>> >> Recoding specific missing values is trivial in R, but I have to
>> >> confess I don't know if and how this might be done in other software
>> >> via setup files. People using specific software packages might confirm
>> >> whether this approach is possible or not. Raw data should be read by
>> >> all packages from a .csv file where missing values are system missing
>> >> (empty) values.
>> >>
>> >> Best wishes,
>> >> Adrian
>> >>
>> >>
>> >> --
>> >> Adrian Dusa
>> >> University of Bucharest
>> >> Romanian Social Data Archive
>> >> 1, Schitu Magureanu Bd.
>> >> 050025 Bucharest sector 5
>> >> Romania
>> >> Tel.:+40 21 3126618 \
>> >>         +40 21 3120210 / int.101
>> >> Fax: +40 21 3158391
>> >>
>> >>
>> >> _______________________________________________
>> >> DDI-users mailing list
>> >> DDI-users at icpsr.umich.edu
>> >> http://lists.icpsr.umich.edu/mailman/listinfo/ddi-users
>> >>
>> >
>> >
>> >
>> > --
>> > Wendy L. Thomas                              Phone: +1 612.624.4389
>> > Data Access Core Director                 Fax:   +1 612.626.8375
>> > Minnesota Population Center             Email: wlt at umn.edu
>> > University of Minnesota
>> > 50 Willey Hall
>> > 225 19th Avenue South
>> > Minneapolis, MN 55455
>> >
>> >
>>
>
>

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
        +40 21 3120210 / int.101
Fax: +40 21 3158391