[DDI-users] DDI-users Digest, Vol 105, Issue 6 (SAS/Stata extended missings)

Adrian Dușa dusa.adrian at unibuc.ro
Tue Jun 24 12:39:49 EDT 2014


I agree with your argument, Larry, that it is important to
differentiate between types of missing data.

All I'm saying is that, with proper care on the part of the researcher,
coding -1 and -2 instead of .r and .d should not matter.
If -1 is missing at random and -2 is missing by design, then I can
write a dedicated function to impute based on -2 and ignore -1.
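
To make that concrete, here is a minimal sketch of such a function. It is
written in Python purely for illustration. The codes -1 and -2 are the ones
from my example; the function name and the mean-substitution rule are
assumptions made for this sketch, not a recommended imputation method.

```python
from statistics import mean

# Codes from the example in this message (a convention, not a standard).
MISSING_AT_RANDOM = -1
MISSING_BY_DESIGN = -2
MISSING_CODES = {MISSING_AT_RANDOM, MISSING_BY_DESIGN}


def impute_by_design(values):
    """Fill values coded -2 (missing by design) and leave -1 as missing.

    Hypothetical helper: mean substitution stands in for whatever
    imputation rule the researcher actually considers appropriate.
    """
    # Only genuinely observed values may inform the imputation.
    valid = [v for v in values if v not in MISSING_CODES]
    if not valid:
        raise ValueError("no valid values to impute from")
    fill = mean(valid)

    result = []
    for v in values:
        if v == MISSING_BY_DESIGN:
            result.append(fill)   # imputed
        elif v == MISSING_AT_RANDOM:
            result.append(None)   # stays missing, never treated as data
        else:
            result.append(v)
    return result


ages = [30, -1, 40, -2, 50]
imputed = impute_by_design(ages)
```

The point is that the two codes are handled explicitly: neither -1 nor -2
ever enters the computation as if it were a real value.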

One of the main reasons I switched from SPSS to R is that SPSS
was / is doing far too many things "by default", and I did not fully
realise that until I had switched to R completely. I suspect the same
may be true, in some cases, of .d and .r in SAS or Stata.

Without advocating too much for R, one big advantage of this
software is that it actually forces me to think about what to do instead
of accepting a ready-made cookbook. If the results are wrong, it should
always be the researcher's responsibility to fix them, rather than
trusting that the software is doing the "right" thing.

From what I can see, DDI is getting more and more complex. As with any
XML format, we can invent special entries for any imaginable special case,
and for any number of different statistical packages. There are
already a lot of such packages, and documenting which special values
are specific to which package is a huge effort for the metadata
creator.
There is actually no guarantee that an XML DDI file produced today
would fit the special cases of a future statistical package, 15 years
from now.

Agreeing on a common standard, however, ensures that special
interpretations are well catered for, and researchers using particular
packages could still carry out their package-specific analysis after
reading the DDI file.
At least this is how I see it...

Adrian



On Tue, Jun 24, 2014 at 7:01 PM, Hoyle, Larry <larryhoyle at ku.edu> wrote:
> I would not minimize the utility of avoiding accidental treatment of missing
> values as non-missing. This is one reason that R has NA. Also, when more than
> one variable is involved, having missing values allows for the possibility of
> doing things other than casewise deletion.
>
> DDI3.2 allows for the specification of a codelist describing missing values
> even for continuous (numeric) variables. Such a codelist can be applied to
> numeric values (like -1 = 'Refused' in your original example) or other
> values (like .r = 'Refused') so this achieves a level of software
> independence.
>
> The differences among software packages in their treatment of missing values
> are deeper than just syntax. Those that allow distinguishing among types of
> missing are allowing for the assignment of semantics to the different types
> of missing, but they are still missing values. Missing values with different
> meanings may even need to be treated differently. For imputation, “missing
> at random” has different implications than “missing by design”.
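
The software-independent code list described in the quoted message could
look roughly like the sketch below (Python, for illustration only). Only the
two entries -1 = 'Refused' and .r = 'Refused' come from the thread. The
convention names and the helper function are hypothetical, and this is not
the actual DDI 3.2 XML structure.

```python
# Hypothetical code lists, one per coding convention.
# Only the 'Refused' entries are taken from the thread.
CODE_LISTS = {
    "numeric": {-1: "Refused"},       # plain numeric coding
    "sas_stata": {".r": "Refused"},   # extended missing value
}


def missing_label(raw, convention):
    """Return the missing-value label for a raw value, or None if the
    value is not listed as missing under the given convention."""
    return CODE_LISTS[convention].get(raw)


label_numeric = missing_label(-1, "numeric")
label_extended = missing_label(".r", "sas_stata")
label_valid = missing_label(42, "numeric")
```

Both conventions resolve to the same label, so the meaning of the missing
value survives regardless of which package produced the file.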


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
        +40 21 3120210 / int.101
Fax: +40 21 3158391


