[DDI-users] DDI software that recognizes extended Stata/SAS missing values?

Hoyle, Larry larryhoyle at ku.edu
Sun Jun 22 11:51:44 EDT 2014


I’m guessing that the DDI 3.2 features for handling this are still too new for existing tools to handle complex SAS numeric variables completely, but the structure to do so is now there in DDI 3.2. The SAS code at the end of this message illustrates the issue. A variable (named “value”) is numeric and has labeled values, ranges, and missing values of several types. The low and high ranges might be recoded to missing values, but they might also be used as valid data. The format could be used for the transformation, and it also documents the ranges. There is also a labeled valid value.

It looks like the SAS format would need to be split up in DDI 3.2 into a ManagedNumericRepresentation for the valid values and a ManagedMissingValuesRepresentation for the missing values, since it is not possible to have non-numeric values bounding the NumberRange.


<l:VariableRepresentation>
  <r:NumericRepresentationReference>
    <r:URN>URN:DDI:exampleagency.subagency:7ef2bb24-c3b0-4b50-9195-270e18dbdec0:1.0</r:URN>
    <r:TypeOfObject>ManagedNumericRepresentation</r:TypeOfObject>
  </r:NumericRepresentationReference>
  <l:MissingValuesReference>
    <r:URN>URN:DDI:exampleagency.subagency:MMVR1:1.0</r:URN>
    <r:TypeOfObject>ManagedMissingValuesRepresentation</r:TypeOfObject>
  </l:MissingValuesReference>
</l:VariableRepresentation>
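
The split described above can be sketched in a few lines of Python. This is only an illustration, not part of any existing DDI tool: the tuple layout of FORMAT_ENTRIES and the helper names is_sas_missing and split_format are made up for this example, and the entries are copied by hand from the SAS format at the end of this message.

```python
# Hypothetical in-memory form of the UFMT entries: (start, end, label).
# None stands for the SAS keywords LOW / HIGH (an open-ended bound).
FORMAT_ENTRIES = [
    (None, "-30", "out of bounds low"),
    ("0", "0", "32 calibration value"),
    ("110", None, "out of bounds high"),
    (".", ".", "system missing"),
    (".s", ".s", "scheduled"),
    (".i", ".i", "instrument failure"),
    ("._", "._", "underscored missing"),
]


def is_sas_missing(token):
    """True for '.', '._', or '.a' through '.z' (case-insensitive)."""
    if token is None:
        return False
    token = token.strip().lower()
    if token == ".":
        return True
    return (
        len(token) == 2
        and token[0] == "."
        and (token[1] == "_" or "a" <= token[1] <= "z")
    )


def split_format(entries):
    """Separate entries bounded by numbers from entries keyed on missing codes."""
    valid, missing = [], []
    for start, end, label in entries:
        if is_sas_missing(start) or is_sas_missing(end):
            missing.append((start, label))      # candidate for the missing-values code list
        else:
            valid.append((start, end, label))   # candidate for a NumberRange
    return valid, missing


valid_ranges, missing_codes = split_format(FORMAT_ENTRIES)
print(valid_ranges)
print(missing_codes)
```

The first list would feed the ManagedNumericRepresentation and the second the ManagedMissingValuesRepresentation. Whether a bound is inclusive (for example the "110 <-" entry) is not tracked in this sketch.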

The representation of the valid values of the variable (without the missing values) can be described in DDI 3.2 with a ManagedNumericRepresentation. Note that this includes the possibility of labels for the ranges.


      <r:ManagedNumericRepresentation>
        <r:URN>URN:DDI:exampleagency.subagency:7ef2bb24-c3b0-4b50-9195-270e18dbdec0:1.0</r:URN>
        <r:Label>
          <r:Content>UFMT.N</r:Content>
        </r:Label>
        <r:Description>
          <r:Content>Labeled numeric ranges generated from the SAS format UFMT.N</r:Content>
        </r:Description>
        <r:NumberRange>
          <r:Label>
            <r:Content>32 calibration value</r:Content>
            <r:TypeOfLabel>text</r:TypeOfLabel>
          </r:Label>
          <r:Low isInclusive="true">0</r:Low>
          <r:High isInclusive="true">0</r:High>
        </r:NumberRange>
        <r:NumberRange>
          <r:Label>
            <r:Content>out of bounds high</r:Content>
            <r:TypeOfLabel>text</r:TypeOfLabel>
          </r:Label>
          <r:Low isInclusive="false">110.0000000000</r:Low>
        </r:NumberRange>
        <r:NumberRange>
          <r:Label>
            <r:Content>out of bounds low</r:Content>
            <r:TypeOfLabel>text</r:TypeOfLabel>
          </r:Label>
          <r:High isInclusive="true">-30.0000000000</r:High>
        </r:NumberRange>
      </r:ManagedNumericRepresentation>
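
As a rough sketch of how a tool might emit the NumberRange elements above, the following Python builds them with the standard library's xml.etree.ElementTree. The helper name number_range and its parameters are invented for this example, the namespace URI is assumed to be the DDI 3.2 reusable namespace, and nothing here has been validated against the DDI 3.2 schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "r:" prefix in DDI 3.2.
R = "ddi:reusable:3_2"
ET.register_namespace("r", R)


def number_range(label, low=None, high=None, low_inclusive=True, high_inclusive=True):
    """Build one r:NumberRange; an omitted bound leaves the range open-ended."""
    rng = ET.Element(f"{{{R}}}NumberRange")
    lab = ET.SubElement(rng, f"{{{R}}}Label")
    ET.SubElement(lab, f"{{{R}}}Content").text = label
    ET.SubElement(lab, f"{{{R}}}TypeOfLabel").text = "text"
    if low is not None:
        el = ET.SubElement(rng, f"{{{R}}}Low", isInclusive=str(low_inclusive).lower())
        el.text = str(low)
    if high is not None:
        el = ET.SubElement(rng, f"{{{R}}}High", isInclusive=str(high_inclusive).lower())
        el.text = str(high)
    return rng


# The three ranges from the example, mirroring the SAS format entries.
ranges = [
    number_range("32 calibration value", low=0, high=0),
    number_range("out of bounds high", low=110, low_inclusive=False),  # SAS "110 <- high"
    number_range("out of bounds low", high=-30),                       # SAS "low - -30"
]

for rng in ranges:
    print(ET.tostring(rng, encoding="unicode"))
```

The isInclusive="false" on the 110 bound carries over the SAS "<-" operator, which excludes the starting value from the range.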


The missing values would be described by a ManagedMissingValuesRepresentation pointing to a codelist. A choice could be made here as to whether to use codes like “.a” or “A”. The latter is how the missing values are exported from SAS so that is probably the better option. The Codes could, of course, point to Categories for the labels.

<r:ManagedMissingValuesRepresentation>
  <r:URN>URN:DDI:exampleagency.subagency:MMVR1:1.0</r:URN>
  <r:MissingCodeRepresentation>
    <r:CodeListReference>
      <r:URN>URN:DDI:exampleagency.subagency:CDLST1:1.0</r:URN>
      <r:TypeOfObject>CodeList</r:TypeOfObject>
    </r:CodeListReference>
  </r:MissingCodeRepresentation>
</r:ManagedMissingValuesRepresentation>
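
If the “A” style is chosen for the codes, the mapping from SAS literals is simple enough to sketch in Python. The function name sas_missing_to_code is made up for this example, and keeping the system missing value as a lone period is an assumption rather than something DDI prescribes.

```python
def sas_missing_to_code(token):
    """Map a SAS missing literal such as '.a' to a code value such as 'A'."""
    t = token.strip()
    if t == ".":
        return "."  # assumption: system missing keeps its period
    if len(t) == 2 and t[0] == "." and (
        t[1] == "_" or (t[1].isascii() and t[1].isalpha())
    ):
        return t[1].upper()  # '.s' -> 'S', '._' -> '_'
    raise ValueError(f"not a SAS missing value literal: {token!r}")


# Missing values that occur in the test data at the end of this message.
codes = {tok: sas_missing_to_code(tok) for tok in [".", ".s", ".d", ".i", "._"]}
print(codes)
```

Each resulting value would become one Code in the code list, with the label (where the format supplies one) held in the referenced Category.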











/* Test dataset for SAS missing data */
libname L 'C:\Ddrive\projects\various\DDI\SASMissingData';

/* Format with labeled ranges, one labeled valid value, and labeled missing values */
proc format cntlout=L.formats;
value ufmt
low - -30 = 'out of bounds low'
0 = '32 calibration value'
110 <- high = 'out of bounds high'
. = 'system missing'
.s = 'scheduled'
.i = 'instrument failure'
._ = 'underscored missing'
;
run;

/* Test data; .d appears in the data but has no entry in the format */
data L.testData;
input value sequenceNumber;

datalines;
-40 0
1 1
32 2
78 3
. 4
.s 5
.d 6
.i 7
._ 8
201 9
;
run;

/* Same data with the format attached to the variable */
data L.testDataFormatted;
set L.testData;
format value ufmt.;

run;
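
For readers without SAS at hand, here is a rough Python imitation of what the format does to the test values. It is a sketch only: the function name ufmt_label is invented, the ranges are hard-coded from the format above, and the handling of unmatched values (numbers pass through unchanged, an unlabeled special missing value shows as its uppercase letter) is my assumption about default SAS display behavior rather than output captured from a run.

```python
import re

# Labels for the missing values that the format covers.
MISSING_LABELS = {
    ".": "system missing",
    ".s": "scheduled",
    ".i": "instrument failure",
    "._": "underscored missing",
}


def ufmt_label(token):
    """Return the label the UFMT format would attach to one raw data token."""
    t = token.strip().lower()
    if re.fullmatch(r"\.[a-z_]?", t):
        # Unlabeled special missing values are assumed to show as their letter.
        return MISSING_LABELS.get(t, t[1:].upper() or ".")
    x = float(t)
    if x <= -30:
        return "out of bounds low"      # low - -30
    if x == 0:
        return "32 calibration value"   # 0
    if x > 110:
        return "out of bounds high"     # 110 <- high
    return t                            # no matching range


DATALINES = ["-40", "1", "32", "78", ".", ".s", ".d", ".i", "._", "201"]
labels = [ufmt_label(v) for v in DATALINES]
for raw, lab in zip(DATALINES, labels):
    print(f"{raw:>4} -> {lab}")
```

Values such as 1, 32 and 78 match no range and so come back unchanged, and .d comes back as a bare code with no label.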







--- Larry Hoyle


From: Hoyle, Larry
Sent: Friday, June 20, 2014 1:11 PM
To: 'Data Documentation Initiative Users Group'
Subject: RE: [DDI-users] DDI software that recognizes extended Stata/SAS missing values?

This is a somewhat tricky issue in that “.a” et al. are numeric but “out of band”: they are not valid numbers (they are missing codes), yet they are part of a numeric representation. You asked about tools that handle these.
I’ve been working on a SAS Enterprise Guide tool to generate DDI 3.2. Let me check whether it’s handling these missing codes in the fashion that Achim and I discussed.
I’m also not sure about some of the other tools, but I will take a look at those I have access to as well.

--- Larry Hoyle


From: ddi-users-bounces at icpsr.umich.edu On Behalf Of Wendy Thomas
Sent: Friday, June 20, 2014 11:05 AM
To: Data Documentation Initiative Users Group
Subject: Re: [DDI-users] DDI software that recognizes extended Stata/SAS missing values?

Hi Bob,

In DDI 3.2 there is a specific differentiation between valid and invalid (missing values) so that they are declared separately. In PhysicalDataProduct the PhysicalStructure/DefaultMissingData can be used for this as this is a description of a specific file type (like SAS, etc.). Larry Hoyle or Achim Wackerow can probably provide more details as they were looking at the new structure for Missing Values in this context while it was being developed.

Wendy

On Fri, Jun 20, 2014 at 10:26 AM, Bob McConnaughey <bobmcconn at gmail.com> wrote:
Dear Folks:
    Do any of the DDI documentation/codebook-generating packages handle extended Stata/SAS missing values (".a" through ".z", and ".")? Our analysis datasets (as opposed to the original questionnaires) invariably use multiple sets of special missing codes, and converting everything into faux numeric missings (-1 through -9, for instance) kind of defeats the purpose.
Any way forward will be most appreciated.
thanks
bob


Bob McConnaughey, PhD.
Westat/NIEHS epidemiology branch contractor
919-941-8300 (w)
919-542-5653 (h)

"She is at the brink of never being hurt again
but pauses to say, All of us.  Every blade of grass."
from Kuan Yin, Laura Fargas


_______________________________________________
DDI-users mailing list
DDI-users at icpsr.umich.edu
http://lists.icpsr.umich.edu/mailman/listinfo/ddi-users



--
Wendy L. Thomas                              Phone: +1 612.624.4389
Data Access Core Director                 Fax:   +1 612.626.8375
Minnesota Population Center             Email: wlt at umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455