<div dir="ltr"><div>OK. Each variable declares BOTH its valid value representation and its missing value representation. Missing value representations are managed structures that can be described by any combination of code, numeric, and text representations. In addition, a default missing value representation can be declared for a logical record or for a physical data file.</div>
<div><br></div><div>So, in effect, every variable using the same set of missing values would reference the same managed missing value description. If a missing value is not an option (i.e. the variable must always have a valid value), then no MissingValuesReference would be included in the Variable/VariableRepresentation.</div>
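<div><br></div><div>A minimal Python sketch of this referencing pattern, assuming a simplified in-memory model rather than DDI XML. The class and field names (ManagedMissingValues, Variable, missing_ref) and the missing codes -1 and -2 are hypothetical. Two variables share one managed description, and a variable with no reference treats every value as valid.</div>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManagedMissingValues:
    """Hypothetical stand-in for a managed missing-values description."""
    name: str
    codes: frozenset  # code values declared as missing (e.g. taken from a CodeList)


@dataclass
class Variable:
    name: str
    # None plays the role of "no MissingValuesReference": every value must be valid.
    missing_ref: Optional[ManagedMissingValues] = None

    def is_declared_missing(self, value) -> bool:
        return self.missing_ref is not None and value in self.missing_ref.codes


# One shared description (codes -1 and -2 are illustrative assumptions).
combined = ManagedMissingValues("Combined Missing Types", frozenset({-1, -2}))

age = Variable("Age", missing_ref=combined)
income = Variable("Income", missing_ref=combined)
case_id = Variable("CaseID")  # an identifier: missing is not an option

# Both variables point at the very same managed description object.
print(age.missing_ref is income.missing_ref)
print(age.is_declared_missing(-1))
print(case_id.is_declared_missing(-1))  # no reference, so -1 is just an ordinary value
```

Sharing a single object mirrors the idea that a change to the managed description applies to every variable that references it.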
<div><br></div><div>Regarding identification of CaseID: see DataRelationship/LogicalRecord/CaseIdentification/</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jun 25, 2014 at 10:52 AM, Adrian Dușa <span dir="ltr"><<a href="mailto:dusa.adrian@unibuc.ro" target="_blank">dusa.adrian@unibuc.ro</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Not sure...<br>
It has to be variable-specific, because each variable has missing<br>
data in different cases.<br>
<br>
But if the CodeList contains information for <each> variable which has<br>
missing data, then it's OK. I was thinking about embedding this kind<br>
of information inside each variable, but a reference to a CodeList<br>
might also work (provided the above holds).<br>
<br>
My previous email needs a slight correction: the numbers 1, 5, 8, 9,<br>
15 and 78 should not be line numbers but rather unique identifiers<br>
(sort of a Primary Key) for the cases where the missing values are<br>
found.<br>
<br>
IMPORTANT: in this case, we also need to know which variable in the<br>
dataset contains the unique identifiers (ex. "CaseID").<br>
<br>
That actually solves everything, because I can automatically generate<br>
the necessary commands in the package-specific setup file(s), replacing<br>
the missing values with the desired codes for each statistical<br>
package.<br>
<br>
In SPSS, for a hypothetical variable "Age", it would be something like this:<br>
<br>
DO IF (CaseID = 1 | CaseID = 5 | CaseID = 9).<br>
RECODE Age (SYSMIS = -1).<br>
END IF.<br>
EXECUTE.<br>
<br>
I'm sure SAS and Stata are much easier to work with, and in R it is just trivial:<br>
mydata$Age[mydata$CaseID %in% c(1, 5, 9) & is.na(mydata$Age)] <- -1<br>
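The command generation described above can be sketched in Python, assuming the missing-type metadata has already been read from the DDI file into a dict keyed by missing type. The dict, the numeric SPSS codes (-1, -2), the Stata codes (.r, .d), and the helpers spss_commands and stata_commands are all hypothetical illustrations, not part of any existing tool.

```python
# Hypothetical metadata for one variable: missing type -> CaseIDs where it occurs.
missing_cases = {
    "missing at random": [1, 5, 9],
    "missing by design": [8, 15, 78],
}

# Assumed package-specific codes for each missing type (illustrative choices only).
SPSS_CODES = {"missing at random": -1, "missing by design": -2}
STATA_CODES = {"missing at random": ".r", "missing by design": ".d"}


def spss_commands(var, id_var, cases_by_type):
    """Build SPSS syntax in the DO IF / RECODE style of the example above."""
    lines = []
    for mtype, ids in cases_by_type.items():
        cond = " | ".join(f"{id_var} = {i}" for i in ids)
        lines.append(f"DO IF ({cond}).")
        lines.append(f"RECODE {var} (SYSMIS = {SPSS_CODES[mtype]}).")
        lines.append("END IF.")
    lines.append("EXECUTE.")
    return "\n".join(lines)


def stata_commands(var, id_var, cases_by_type):
    """Build Stata replace commands using extended missing values."""
    lines = []
    for mtype, ids in cases_by_type.items():
        id_list = ", ".join(str(i) for i in ids)
        # Only overwrite cells that are still system missing (.), like SYSMIS in SPSS.
        lines.append(
            f"replace {var} = {STATA_CODES[mtype]} if {var} == . & inlist({id_var}, {id_list})"
        )
    return "\n".join(lines)


print(spss_commands("Age", "CaseID", missing_cases))
print(stata_commands("Age", "CaseID", missing_cases))
```

Both generators only touch cells that are still system missing, so valid values are never overwritten. Stata's inlist() accepts a limited number of arguments, so very long ID lists would need to be split across several replace commands.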
<br>
On Wed, Jun 25, 2014 at 5:31 PM, Wendy Thomas <<a href="mailto:wlt@umn.edu">wlt@umn.edu</a>> wrote:<br>
><br>
> Does this DDI 3.2 structure do what you need? It can be referenced from any<br>
> variable, and noted as the default missing values for a LogicalRecord and<br>
> for a Physical Instance.<br>
><br>
> <r:ManagedMissingValuesRepresentation> (note: I've left off the<br>
> identification and other versionable type information)<br>
> <r:ManagedMissingValuesRepresentationName>Combined Missing Types</r:ManagedMissingValuesRepresentationName><br>
> <r:MissingCodeRepresentation><br>
> <r:RecommendedDataType>integer</r:RecommendedDataType><br>
> <r:CodeListReference/> (a reference to a CodeList named "Missing at Random")<br>
> </r:MissingCodeRepresentation><br>
> <r:MissingCodeRepresentation><br>
> <r:RecommendedDataType>integer</r:RecommendedDataType><br>
> <r:CodeListReference/> (a reference to a CodeList named "Missing by Design")<br>
> </r:MissingCodeRepresentation><br>
> </r:ManagedMissingValuesRepresentation><br>
><br>
><br>
> On Wed, Jun 25, 2014 at 2:39 AM, Adrian Dușa <<a href="mailto:dusa.adrian@unibuc.ro">dusa.adrian@unibuc.ro</a>> wrote:<br>
>><br>
>> Dear All,<br>
>><br>
>> Following a private discussion, an idea emerged that I think is useful<br>
>> to circulate and discuss.<br>
>><br>
>> From what I understand, SAS treats special missing values as smaller than<br>
>> any valid number, while Stata went the opposite way, treating them as<br>
>> larger than any valid number.<br>
>><br>
>> These decisions are software-specific, and it is unlikely that other<br>
>> software packages will follow either convention.<br>
>><br>
>> There might be a way to meet all these particular needs, using DDI as a<br>
>> mediator and, most importantly, using only "normal" values.<br>
>><br>
>> The main goal is to differentiate between types of missing values. In R<br>
>> (and I'm sure DDI can do this too), each variable can carry a list of<br>
>> attributes. One component of this list could be dedicated to the missing<br>
>> values, further differentiated into:<br>
>> - "missing at random": 1, 5, 9<br>
>> - "missing by design": 8, 15, 78<br>
>><br>
>> Here, the (simple integer) numbers 1, 5, 8, 9, 15 and 78 are simply the<br>
>> line numbers (i.e. the cases) where the missing values occur in a<br>
>> particular variable.<br>
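A minimal Python sketch of such an attribute list, assuming a plain dict stands in for R's attributes and that positions are 1-based line numbers, as in the example. The column contents and the names missing_attr, missing_type and unclassified_missing are hypothetical.

```python
# Hypothetical attribute list for one variable: missing type -> 1-based line numbers.
missing_attr = {
    "missing at random": [1, 5, 9],
    "missing by design": [8, 15, 78],
}

# A toy column of 80 cases with placeholder valid values; None marks a missing cell.
age = [40] * 80
for positions in missing_attr.values():
    for pos in positions:
        age[pos - 1] = None  # convert the 1-based line number to a 0-based index


def missing_type(position, attr):
    """Return the missing type recorded for a 1-based position, or None."""
    for mtype, positions in attr.items():
        if position in positions:
            return mtype
    return None


def unclassified_missing(column, attr):
    """Positions that are missing in the data but absent from the attribute list."""
    return [
        i + 1
        for i, value in enumerate(column)
        if value is None and missing_type(i + 1, attr) is None
    ]


print(missing_type(15, missing_attr))
print(unclassified_missing(age, missing_attr))
```

In the data itself every missing cell stays a single "empty" marker, while the attribute list carries the distinction between types.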
>><br>
>> If I had this kind of information in the DDI XML file, I could then<br>
>> instruct my R function to create <specific> setup files for SAS or Stata<br>
>> using .r and .d for those specific cases, while in R all missing values could<br>
>> remain simple NAs and users could still differentiate between types of missing<br>
>> values just by looking at the list of attributes.<br>
>><br>
>> This approach would also meet the other need, avoiding accidental mistakes,<br>
>> and it is both package-independent and package-specific at the same time,<br>
>> using DDI as an exchange platform.<br>
>><br>
>> Recoding specific missing values is trivial in R, but I have to confess I<br>
>> don't know if and how this could be done in other software via setup files.<br>
>> People using those software packages might confirm whether this approach is<br>
>> possible. Raw data should be read by all packages from a .csv file in which<br>
>> missing values are system-missing (empty) values.<br>
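Reading such a .csv file and attaching missing-type labels to the empty cells can be sketched in Python with the standard csv module. The sample data and the age_missing metadata are hypothetical, and the metadata is keyed by CaseID rather than by line number, following the correction elsewhere in this thread.

```python
import csv
import io

# Hypothetical raw data: an empty Age field is a system-missing value.
raw = io.StringIO("CaseID,Age\n1,\n2,34\n5,\n9,\n12,58\n")

rows = []
for rec in csv.DictReader(raw):
    age = rec["Age"]
    rows.append({"CaseID": int(rec["CaseID"]), "Age": int(age) if age else None})

# Assumed missing-type metadata for Age, keyed by CaseID.
age_missing = {"missing at random": {1, 5, 9}}

# Label every empty cell; any empty cell without declared metadata is flagged.
labels = {}
for row in rows:
    if row["Age"] is None:
        labels[row["CaseID"]] = next(
            (mtype for mtype, ids in age_missing.items() if row["CaseID"] in ids),
            "unclassified",
        )

print(labels)
```

Flagging "unclassified" cells makes gaps between the data and the DDI metadata visible instead of silently treating them as one generic missing type.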
>><br>
>> Best wishes,<br>
>> Adrian<br>
>><br>
>><br>
>> --<br>
>> Adrian Dusa<br>
>> University of Bucharest<br>
>> Romanian Social Data Archive<br>
>> 1, Schitu Magureanu Bd.<br>
>> 050025 Bucharest sector 5<br>
>> Romania<br>
>> Tel.:+40 21 3126618 \<br>
>> +40 21 3120210 / int.101<br>
>> Fax: +40 21 3158391<br>
>><br>
>><br>
>> _______________________________________________<br>
>> DDI-users mailing list<br>
>> <a href="mailto:DDI-users@icpsr.umich.edu">DDI-users@icpsr.umich.edu</a><br>
>> <a href="http://lists.icpsr.umich.edu/mailman/listinfo/ddi-users" target="_blank">http://lists.icpsr.umich.edu/mailman/listinfo/ddi-users</a><br>
>><br>
><br>
><br>
><br>
> --<br>
> Wendy L. Thomas Phone: +1 612.624.4389<br>
> Data Access Core Director Fax: +1 612.626.8375<br>
> Minnesota Population Center Email: <a href="mailto:wlt@umn.edu">wlt@umn.edu</a><br>
> University of Minnesota<br>
> 50 Willey Hall<br>
> 225 19th Avenue South<br>
> Minneapolis, MN 55455<br>
><br>
><br>
<br>
<br>
<br>
<br>
</blockquote></div><br><br clear="all"><br>-- <br><div>Wendy L. Thomas Phone: +1 612.624.4389</div><div>Data Access Core Director Fax: +1 612.626.8375</div><div>Minnesota Population Center Email: <a href="mailto:wlt@umn.edu" target="_blank">wlt@umn.edu</a></div>
<div>University of Minnesota</div><div>50 Willey Hall</div><div>225 19th Avenue South</div><div>Minneapolis, MN 55455</div>
</div>