[DDI-SRG] [disco] issues on missing values for numeric response domain and on conceptual variable

Hilde Orten Hilde.Orten at nsd.no
Sun Mar 17 10:32:51 EDT 2019


Hi Achim and all,

Great that you are working to finalise Disco.

I agree that missing values are less important for searches than the valid representations, and therefore less important in Disco.

I just wanted to remind everyone of the rationale for keeping valid and missing values separate in the QDDT tool.

See:
https://github.com/DASISH/qddt-client/wiki/DDI---Handling-of-missing-values-in-the-QDDT

Hilde
________________________________
From: ddi-srg-bounces at icpsr.umich.edu <ddi-srg-bounces at icpsr.umich.edu> on behalf of Wendy Thomas <wlt at umn.edu>
Sent: Saturday, March 16, 2019 5:56:12 PM
To: Wackerow, Joachim
Cc: DDI Structural Reform Working Group.; Zapilko, Benjamin
Subject: Re: [DDI-SRG] [disco] issues on missing values for numeric response domain and on conceptual variable

Achim,
Your understanding is correct. Note that 3.2 also retains the 3.1 means of identifying missing values. Larry's point is also valid. The extended details are more important for working with and analyzing the data than for searching for data. Essentially, the difference would be that in 2.5 and 3.1 Disco would pull from a single depiction of a value representation, while in 3.2 it should pull from both the value representation and the missing value representation. Values drawn from a missing value representation would then be flagged as missing in whatever way Disco relays that. This would ensure that those using ONLY the new structure could also map missing values to Disco. For the most part, people don't search for missing values when searching for data :-)
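
A minimal sketch of that mapping, for illustration only. It assumes a flat, Disco-style record with one missing flag per code or value; the names `DiscoValue` and `flatten_representation` are hypothetical and do not come from Disco or DDI.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DiscoValue:
    """Hypothetical flat record: one code or value plus a missing flag."""
    value: str
    label: Optional[str]
    is_missing: bool


def flatten_representation(
    valid_codes: Iterable[Tuple[str, str]],
    missing_codes: Iterable[Tuple[str, str]] = (),
) -> List[DiscoValue]:
    """Merge substantive and missing codes into one flagged list.

    valid_codes   -- (value, label) pairs from the value representation
    missing_codes -- (value, label) pairs from the separate missing value
                     representation (left empty when there is none)
    """
    merged = [DiscoValue(v, lbl, False) for v, lbl in valid_codes]
    merged += [DiscoValue(v, lbl, True) for v, lbl in missing_codes]
    return merged


# 3.2-style input: valid and missing codes arrive separately.
values = flatten_representation(
    valid_codes=[("1", "Yes"), ("2", "No")],
    missing_codes=[("8", "Don't know"), ("9", "Refused")],
)
for v in values:
    print(v.value, v.label, "missing" if v.is_missing else "valid")
```

Input that already carries a per-value missing indicator would not need the second argument; only the separate missing value representation does.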

Wendy

On Sat, Mar 16, 2019 at 5:41 AM Wackerow, Joachim <Joachim.Wackerow at gesis.org> wrote:
Wendy,

Yes, this all makes perfect sense to me. Disco can indicate per code or value whether it is missing or not. This seems to be sufficient to represent 2.5 and 3.1.

The question is whether it should represent 3.2 in this regard. Is the more advanced description of missing values really important for a data search?

A related change would require some work.

I am inclined to add a section on limitations to Disco, noting that the ConceptualVariable in 3.2 and the advanced description of missing values in 3.2 cannot be represented.

This is my understanding of 3.2:

Variable has VariableRepresentation 0..1
VariableRepresentation has ValueRepresentation 0..1 (i.e. CodeRepresentation, NumericRepresentation) and MissingValuesReference 0..1

CodeRepresentation has CodeListReference
CodeListReference points to CodeList
CodeList has Code 0..*

NumericRepresentation has NumberRange 0..*

MissingValuesReference points to ManagedMissingValuesRepresentation
ManagedMissingValuesRepresentation has MissingCodeRepresentation 0..*, MissingNumericRepresentation 0..*
MissingCodeRepresentation has CodeListReference 0..1
MissingNumericRepresentation has NumberRange 0..*
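
The containment described above can be sketched as plain Python dataclasses. This is only an illustration of the listed cardinalities, not the DDI 3.2 schema: references are collapsed into direct object links, and attribute names such as `value`, `label`, `low`, `high` and `name` are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Code:
    value: str        # assumed attribute
    label: str = ""   # assumed attribute


@dataclass
class CodeList:
    codes: List[Code] = field(default_factory=list)  # Code 0..*


@dataclass
class NumberRange:
    low: Optional[float] = None    # assumed attribute
    high: Optional[float] = None   # assumed attribute


@dataclass
class CodeRepresentation:
    # CodeListReference, collapsed into a direct link to the CodeList
    code_list: Optional[CodeList] = None


@dataclass
class NumericRepresentation:
    number_ranges: List[NumberRange] = field(default_factory=list)  # 0..*


@dataclass
class MissingCodeRepresentation:
    code_list: Optional[CodeList] = None  # CodeListReference 0..1


@dataclass
class MissingNumericRepresentation:
    number_ranges: List[NumberRange] = field(default_factory=list)  # 0..*


@dataclass
class ManagedMissingValuesRepresentation:
    missing_codes: List[MissingCodeRepresentation] = field(default_factory=list)
    missing_numerics: List[MissingNumericRepresentation] = field(default_factory=list)


@dataclass
class VariableRepresentation:
    # ValueRepresentation 0..1
    value_representation: Optional[
        Union[CodeRepresentation, NumericRepresentation]
    ] = None
    # MissingValuesReference 0..1, collapsed into a direct link
    missing_values: Optional[ManagedMissingValuesRepresentation] = None


@dataclass
class Variable:
    name: str  # assumed attribute
    representation: Optional[VariableRepresentation] = None  # 0..1


# Made-up example: age in years, with 998 and 999 as missing values.
age = Variable(
    name="age",
    representation=VariableRepresentation(
        value_representation=NumericRepresentation([NumberRange(0, 120)]),
        missing_values=ManagedMissingValuesRepresentation(
            missing_numerics=[MissingNumericRepresentation([NumberRange(998, 999)])]
        ),
    ),
)
print(age.representation.missing_values.missing_numerics[0].number_ranges[0])
```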

Achim

From: Wendy Thomas [mailto:wlt at umn.edu]
Sent: Friday, 15 March 2019 15:40
To: Wackerow, Joachim
Cc: DDI Structural Reform Working Group.; Zapilko, Benjamin
Subject: Re: [DDI-SRG] [disco] issues on missing values for numeric response domain and on conceptual variable

RE: missing values
Achim,
As I recall, Disco was created prior to our division of substantive and sentinel values. Earlier DDI-Lifecycle used something similar to Codebook, where you could designate blank as missing and specify which values were missing, either by listing them in an attribute (3.1) or by indicating whether a value in a catVal was determined to be missing. If Disco covers that, it will work with all versions of Codebook and Lifecycle. When we added the ability to show missing values as a separate representation, we did not remove the shorthand approach, so that major surgery on earlier versions was not required.

Wendy

On Fri, Mar 15, 2019 at 7:14 AM Wackerow, Joachim <Joachim.Wackerow at gesis.org> wrote:
Many thanks to Dan, Larry, and Wendy for your thoughts.

First, I would like to restate the frame of this discussion.
Our focus here is: what can we do for Disco as it currently stands?
The whole approach of Disco is to focus on a simple subset of DDI Codebook and DDI Lifecycle for discovery purposes.
It is not a 1:1 representation of DDI Codebook or Lifecycle. It is not related to DDI 4, which is a moving target.
The intention is to finalize Disco, not to make Disco as good as or better than DDI 4.
Furthermore, any changes shouldn’t be extensive; that wouldn’t be affordable.

Re: ConceptualVariable
My thinking here is similar to Wendy's. For the purpose of Disco, i.e. for searches on specific data, does the ConceptualVariable really add substantial value?
I am inclined to leave the current Disco structure unchanged.

Re: missing values for numeric response domain
An actionable item (math expression) seems to be the right way to go. My impression is that Disco has Representation but doesn’t distinguish between categorical and numeric representation.
Would the simple approach be for Representation to have a property (e.g. missingValue) of type skos:Concept?
See variable diagram of Disco: https://raw.githubusercontent.com/linked-statistics/disco-spec/master/diagrams/variable.png.
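
To make the idea concrete, here is a rough sketch of what such a property could yield as triples. It uses plain tuples with prefixed names rather than an RDF library. `disco:missingValue` is only the property name floated above, not a confirmed Disco term; the `ex:` identifiers, the function name and the choice of `skos:notation` and `skos:prefLabel` for the code and its label are assumptions.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def missing_value_triples(representation: str, code: str, label: str) -> List[Triple]:
    """Describe one missing value of a representation as a skos:Concept.

    disco:missingValue is the hypothetical property discussed above.
    """
    concept = f"{representation}-missing-{code}"  # made-up naming scheme
    return [
        (representation, "rdf:type", "disco:Representation"),
        (representation, "disco:missingValue", concept),
        (concept, "rdf:type", "skos:Concept"),
        (concept, "skos:notation", f'"{code}"'),
        (concept, "skos:prefLabel", f'"{label}"@en'),
    ]


triples = missing_value_triples("ex:ageRepresentation", "999", "Refused")
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

Because the missing value is a concept attached to the Representation itself, the same property would serve categorical and numeric representations alike.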

Achim

From: Wendy Thomas [mailto:wlt at umn.edu]
Sent: Thursday, 14 March 2019 15:29
To: Wackerow, Joachim
Cc: DDI Structural Reform Working Group.; Zapilko, Benjamin
Subject: Re: [DDI-SRG] [disco] issues on missing values for numeric response domain and on conceptual variable

I think that it is not as important to have that step in the hierarchy in Disco. The purpose of Disco, at least initially, was to facilitate discovery of data and related metadata in an RDF environment. Since one can locate and link a concept to an existing variable, that is what seems to be important. The value of a Represented Variable in the discovery process is the ability to track variable reuse across iterations of a study, or the reuse of a common variable, such as the U.S. OMB definition of the Race variable, across studies. Unless there is some discovery advantage to exposing a Conceptual Variable, I don't think its expression in Disco is vital.

Wendy

On Thu, Mar 14, 2019 at 7:29 AM Wackerow, Joachim <Joachim.Wackerow at gesis.org> wrote:
Benjamin Zapilko and I are currently reviewing the open issues of Disco. The goal is to resolve the issues and finally prepare Disco for publication.

Now I have questions on two issues:

--
There is an issue on how to describe missing values for a numeric response domain.
For details and my comment, see https://github.com/linked-statistics/disco-spec/issues/130.

My question:
Is something really missing in Disco? I don’t have that impression, but maybe I have misunderstood something.

--
The other issue is that the conceptual variable of DDI 3.2 does not exist in Disco.
The hierarchy is only Variable, RepresentedVariable, skos:Concept.
For details and my comment, see https://github.com/linked-statistics/disco-spec/issues/226.

The whole approach of Disco is to focus on a simple subset of DDI Codebook and DDI Lifecycle for Discovery purposes. It is not a 1:1 representation.

My question:
Is it really important to be able to search for the ConceptualVariable in addition to Variable, RepresentedVariable, and Concept?
Is this addition really worth it? It might result in some work for Disco.
Any thoughts would be helpful.

Thanks
Achim


_______________________________________________
DDI-SRG mailing list
DDI-SRG at icpsr.umich.edu
http://lists.icpsr.umich.edu/mailman/listinfo/ddi-srg


--
Wendy L. Thomas                              Phone: +1 612.624.4389
Data Access Core Director                 Fax:   +1 612.626.8375
Minnesota Population Center             Email: wlt at umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455


