[DDI-users] standard missing values via DDI

Wendy Thomas wlt at umn.edu
Wed Jun 25 12:13:15 EDT 2014


OK. Each variable declares BOTH its valid value representation and its
missing value representation. Missing value representations are managed
structures that can be described by any combination of code, numeric, and
text representations. In addition, a default set of missing values can be
declared for a logical record or for a physical data file.

So, in effect, all variables that use the same set of missing values would
reference the same managed missing value description. If a missing value is
not an option for a variable (i.e. it must have a valid value), then no
MissingValuesReference would be included in its
Variable/VariableRepresentation.
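
As a rough illustration only (this is not schema-valid DDI): the Python
sketch below uses a simplified, namespace-free fragment, with hypothetical
ids and a hypothetical "ref" attribute standing in for the real reference
structure. It shows two variables pointing at one shared missing value
description while a third carries no MissingValuesReference at all.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for a DDI 3.2 fragment (NOT schema-valid).
# The ids "mv-1", "age", "sex", "caseid" and the "ref" attribute are hypothetical.
DDI_FRAGMENT = """
<Fragment>
  <ManagedMissingValuesRepresentation id="mv-1"/>
  <Variable id="age">
    <VariableRepresentation>
      <MissingValuesReference ref="mv-1"/>
    </VariableRepresentation>
  </Variable>
  <Variable id="sex">
    <VariableRepresentation>
      <MissingValuesReference ref="mv-1"/>
    </VariableRepresentation>
  </Variable>
  <Variable id="caseid">
    <VariableRepresentation/>
  </Variable>
</Fragment>
"""


def missing_value_refs(xml_text):
    """Map each variable id to the id of its missing value description, or None."""
    root = ET.fromstring(xml_text)
    refs = {}
    for var in root.iter("Variable"):
        ref = var.find("VariableRepresentation/MissingValuesReference")
        # A variable that must always hold a valid value has no reference.
        refs[var.get("id")] = ref.get("ref") if ref is not None else None
    return refs


refs = missing_value_refs(DDI_FRAGMENT)
for variable_id, ref_id in refs.items():
    print(variable_id, "->", ref_id)
```

Here "age" and "sex" share the description "mv-1", while "caseid" has none.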

Regarding identification of CaseID: see
DataRelationship/LogicalRecord/CaseIdentification/
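
Once the case identifier variable is known, the kind of setup-file
generation Adrian describes below could be sketched in Python roughly as
follows. The function name recode_command and its parameters are
hypothetical; the R and SPSS output mirrors his examples, and the Stata
line is my assumption of an equivalent using inlist() and the extended
missing value .r.

```python
def recode_command(variable, case_id_var, case_ids, package, value):
    """Build one recode command for the given statistical package.

    variable    -- name of the variable to recode, e.g. "Age"
    case_id_var -- name of the unique case identifier, e.g. "CaseID"
    case_ids    -- identifiers of the cases whose missing value is recoded
    package     -- "r", "stata" or "spss"
    value       -- replacement value, e.g. -1 or ".r"
    """
    ids = ", ".join(str(i) for i in case_ids)
    if package == "r":
        return f"mydata${variable}[mydata${case_id_var} %in% c({ids})] <- {value}"
    if package == "stata":
        # Assumed Stata equivalent; not taken from the thread.
        return f"replace {variable} = {value} if inlist({case_id_var}, {ids})"
    if package == "spss":
        condition = " | ".join(f"{case_id_var} = {i}" for i in case_ids)
        return (
            f"DO IF ({condition}).\n"
            f"RECODE {variable} (SYSMIS = {value}).\n"
            "END IF.\n"
            "EXECUTE."
        )
    raise ValueError(f"unsupported package: {package}")


# "Missing at random" cases from the example in the thread.
print(recode_command("Age", "CaseID", [1, 5, 9], "r", -1))
print(recode_command("Age", "CaseID", [1, 5, 9], "stata", ".r"))
print(recode_command("Age", "CaseID", [1, 5, 9], "spss", -1))
```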


On Wed, Jun 25, 2014 at 10:52 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:

> Not sure...
> It has to be variable specific, because each variable has different
> cases with missing data.
>
> But if the CodeList contains information for <each> variable which has
> missing data, then it's ok. I was thinking about embedding this kind
> of information inside each variable, but a reference to a CodeList
> might also be an idea (provided the above).
>
> My previous email needs a slight correction: the numbers 1, 5, 8, 9,
> 15 and 78 should not be line numbers but rather unique identifiers
> (sort of a Primary Key) for the cases where the missing values are
> found.
>
> IMPORTANT: in this case, we also need to know which variable in the
> dataset contains the unique identifiers (ex. "CaseID").
>
> That actually solves all matters, because I can automatically create
> the necessary commands in the specific setup file(s) which will
> replace missing with the specific desired values depending on the
> statistical package.
>
> In SPSS, for a hypothetical variable "Age" it would be something like this:
>
> DO IF (CaseID = 1 | CaseID = 5 | CaseID = 9).
> RECODE Age (SYSMIS = -1).
> END IF.
> EXECUTE.
>
> I'm sure that SAS and Stata are much easier to work with, and R is just
> trivial:
> mydata$Age[mydata$CaseID %in% c(1, 5, 9)] <- -1
>
> On Wed, Jun 25, 2014 at 5:31 PM, Wendy Thomas <wlt at umn.edu> wrote:
> >
> > Does this 3.2 structure do what you need? It can be referenced from any
> > variable, and noted as the default missing values for a LogicalRecord
> > and a PhysicalInstance.
> >
> > <r:ManagedMissingValuesRepresentation>
> >   <!-- note: identification and other versionable type information left off -->
> >   <r:ManagedMissingValuesRepresentationName>Combined Missing Types</r:ManagedMissingValuesRepresentationName>
> >   <r:MissingCodeRepresentation>
> >     <r:RecommendedDataType>integer</r:RecommendedDataType>
> >     <r:CodeListReference/>  <!-- to a CodeList with name Missing at Random -->
> >   </r:MissingCodeRepresentation>
> >   <r:MissingCodeRepresentation>
> >     <r:RecommendedDataType>integer</r:RecommendedDataType>
> >     <r:CodeListReference/>  <!-- to a CodeList with name Missing by Design -->
> >   </r:MissingCodeRepresentation>
> > </r:ManagedMissingValuesRepresentation>
> >
> >
> > On Wed, Jun 25, 2014 at 2:39 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
> >>
> >> Dear All,
> >>
> >> Following a private discussion, an idea emerged that I think is useful
> >> to circulate and discuss.
> >>
> >> From what I understand, SAS codes special missing values as extremely
> >> low values, while Stata went the opposite way, coding them as extremely
> >> large values.
> >>
> >> Those are decisions which are software specific, and it is unlikely that
> >> other software packages will follow one trend or another.
> >>
> >> There might be a way to solve all particular needs, using DDI as a
> >> mediator and most importantly using only "normal" values.
> >>
> >> The main quest is to differentiate between missing values. In R, and I'm
> >> sure DDI can do that too, each variable can carry a list of attributes.
> >> One component of that list could be dedicated to the missing values,
> >> and further differentiate among them:
> >> - "missing at random": 1, 5, 9
> >> - "missing by design": 8, 15, 78
> >>
> >> Here, the (simple integer) numbers 1, 5, 8, 9, 15 and 78 are nothing but
> >> the indexes of the line numbers (ie the cases) where the missing values
> >> reside in a particular variable.
> >>
> >> If I had this kind of information in the DDI XML file, I could then
> >> instruct my R function to create <specific> setup files for SAS or Stata
> >> using .r and .d in those specific cases, while in R all missing values
> >> could remain as simple NAs but users can still differentiate between
> >> missings by just looking at the list of attributes.
> >>
> >> This way it would accomplish the other need to avoid accidental
> >> mistakes, and it is both package independent and package specific at
> >> the same time, using DDI as an exchange platform.
> >>
> >> Recoding specific missing values is trivial in R, but I have to confess
> >> I don't know if and how this might be done in other software via setup
> >> files. People using specific software packages might confirm if this
> >> approach is possible or not. Raw data should be read by all packages
> >> from a .csv file where missing values are system missing (empty) values.
> >>
> >> Best wishes,
> >> Adrian
> >>
> >>
> >> --
> >> Adrian Dusa
> >> University of Bucharest
> >> Romanian Social Data Archive
> >> 1, Schitu Magureanu Bd.
> >> 050025 Bucharest sector 5
> >> Romania
> >> Tel.:+40 21 3126618 \
> >>         +40 21 3120210 / int.101
> >> Fax: +40 21 3158391
> >>
> >>
> >> _______________________________________________
> >> DDI-users mailing list
> >> DDI-users at icpsr.umich.edu
> >> http://lists.icpsr.umich.edu/mailman/listinfo/ddi-users
> >>
> >
> >
> >
> > --
> > Wendy L. Thomas                              Phone: +1 612.624.4389
> > Data Access Core Director                 Fax:   +1 612.626.8375
> > Minnesota Population Center             Email: wlt at umn.edu
> > University of Minnesota
> > 50 Willey Hall
> > 225 19th Avenue South
> > Minneapolis, MN 55455
> >
> >
>
>
>
>



-- 
Wendy L. Thomas                              Phone: +1 612.624.4389
Data Access Core Director                 Fax:   +1 612.626.8375
Minnesota Population Center             Email: wlt at umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455