[DDI-SRG] [disco] issues on missing values for numeric response domain and on conceptual variable

Hoyle, Larry larryhoyle at ku.edu
Sat Mar 16 10:36:12 EDT 2019


Given the focus on a concise discovery vocabulary, having all the details of complex missing-value structures shouldn't be necessary. One would assume that once data is discovered, more detailed machine-actionable metadata would be part of what is discovered.

From: ddi-srg-bounces at icpsr.umich.edu <ddi-srg-bounces at icpsr.umich.edu> On Behalf Of Wackerow, Joachim
Sent: Saturday, March 16, 2019 5:41 AM
To: Wendy Thomas <wlt at umn.edu>
Cc: DDI Structural Reform Working Group. <ddi-srg at icpsr.umich.edu>; Zapilko, Benjamin <Benjamin.Zapilko at gesis.org>
Subject: Re: [DDI-SRG] [disco] issues on missing values for numeric response domain and on conceptual variable

Wendy,

Yes, this all makes perfect sense to me. Disco can indicate for each code or value whether it is missing. This seems to be sufficient to represent 2.5 and 3.1.

The question is whether it should represent 3.2 in this regard. Is the more advanced description of missing values really important for a data search?

The corresponding change would require some work.

I am inclined to add a section on limitations to Disco, noting that the ConceptualVariable of 3.2 and the advanced description of missing values in 3.2 cannot be represented.

This is my understanding of 3.2:

Variable has VariableRepresentation 0..1
VariableRepresentation has ValueRepresentation 0..1 (e.g. CodeRepresentation, NumericRepresentation) and MissingValuesReference 0..1

CodeRepresentation has CodeListReference
CodeListReference points to CodeList
CodeList has Code 0..*

NumericRepresentation has NumberRange 0..*

MissingValuesReference points to ManagedMissingValuesRepresentation
ManagedMissingValuesRepresentation has MissingCodeRepresentation 0..*, MissingNumericRepresentation 0..*
MissingCodeRepresentation has CodeListReference 0..1
MissingNumericRepresentation has NumberRange 0..*
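As a rough illustration only, the 3.2 structure summarized above can be mirrored in Python dataclasses. The class names follow the element names listed; the field names, the simplified NumberRange with inclusive bounds, the is_missing helper and the income example are hypothetical and are not part of the DDI 3.2 schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class NumberRange:
    # Assumed simplification: optional low/high bounds with inclusive flags.
    low: Optional[float] = None
    high: Optional[float] = None
    low_inclusive: bool = True
    high_inclusive: bool = True

    def contains(self, x: float) -> bool:
        if self.low is not None:
            if x < self.low or (x == self.low and not self.low_inclusive):
                return False
        if self.high is not None:
            if x > self.high or (x == self.high and not self.high_inclusive):
                return False
        return True

@dataclass
class Code:
    value: str

@dataclass
class CodeList:
    codes: List[Code] = field(default_factory=list)  # Code 0..*

@dataclass
class CodeRepresentation:
    code_list: Optional[CodeList] = None  # resolved CodeListReference

@dataclass
class NumericRepresentation:
    ranges: List[NumberRange] = field(default_factory=list)  # NumberRange 0..*

@dataclass
class MissingCodeRepresentation:
    code_list: Optional[CodeList] = None  # CodeListReference 0..1

@dataclass
class MissingNumericRepresentation:
    ranges: List[NumberRange] = field(default_factory=list)  # NumberRange 0..*

@dataclass
class ManagedMissingValuesRepresentation:
    missing_codes: List[MissingCodeRepresentation] = field(default_factory=list)
    missing_numeric: List[MissingNumericRepresentation] = field(default_factory=list)

@dataclass
class VariableRepresentation:
    value_representation: Optional[Union[CodeRepresentation, NumericRepresentation]] = None
    missing_values: Optional[ManagedMissingValuesRepresentation] = None  # resolved MissingValuesReference

@dataclass
class Variable:
    representation: Optional[VariableRepresentation] = None  # 0..1

def is_missing(var: Variable, value: str) -> bool:
    """Hypothetical helper: True if the raw value is declared missing for var."""
    rep = var.representation
    if rep is None or rep.missing_values is None:
        return False
    mv = rep.missing_values
    # Missing codes are matched by exact code value.
    for mcr in mv.missing_codes:
        if mcr.code_list and any(c.value == value for c in mcr.code_list.codes):
            return True
    # Missing numeric values are matched against the declared ranges.
    try:
        x = float(value)
    except ValueError:
        return False
    return any(r.contains(x) for mnr in mv.missing_numeric for r in mnr.ranges)

# Hypothetical example: an income variable with sentinel codes -9..-1 and "DK".
income = Variable(VariableRepresentation(
    value_representation=NumericRepresentation([NumberRange(0, 999999)]),
    missing_values=ManagedMissingValuesRepresentation(
        missing_codes=[MissingCodeRepresentation(CodeList([Code("DK")]))],
        missing_numeric=[MissingNumericRepresentation([NumberRange(low=-9, high=-1)])],
    ),
))
print(is_missing(income, "-5"))    # True
print(is_missing(income, "1200"))  # False
print(is_missing(income, "DK"))    # True
```

The range-based MissingNumericRepresentation is the part that a flat per-code flag cannot express directly.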

Achim

From: Wendy Thomas [mailto:wlt at umn.edu]
Sent: Friday, March 15, 2019 15:40
To: Wackerow, Joachim
Cc: DDI Structural Reform Working Group.; Zapilko, Benjamin
Subject: Re: [DDI-SRG] [disco] issues on missing values for numeric response domain and on conceptual variable

RE: missing values
Achim,
As I recall, Disco was created before we separated substantive and sentinel values. Earlier versions of DDI-Lifecycle used an approach similar to Codebook's, in which you could designate blank as missing and specify which values were missing, either by listing them in an attribute (3.1) or by indicating whether a value in a catVal was missing. If Disco covers that, it will work with all versions of Codebook and Lifecycle. When we added the ability to describe missing values as a separate representation, we did not remove the shorthand approach, so that major surgery on earlier versions was not required.

Wendy

On Fri, Mar 15, 2019 at 7:14 AM Wackerow, Joachim <Joachim.Wackerow at gesis.org> wrote:
Many thanks to Dan, Larry, and Wendy for your thoughts.

First, I would like to restate the scope of this discussion.
Our focus here is: what can we do for Disco as it currently stands?
The whole approach of Disco is to focus on a simple subset of DDI Codebook and DDI Lifecycle for Discovery purposes.
It is not a 1:1 representation of DDI Codebook or Lifecycle. It is not related to DDI 4, which is a moving target.
The intention is to finalize Disco, not to make it as good as or better than DDI 4.
Furthermore, any changes should not be extensive; that would not be affordable.

Re: ConceptualVariable
My thinking here is similar to Wendy's. For the purpose of Disco, i.e. for searches on specific data, does the ConceptualVariable really add substantial value?
I am inclined to leave the current Disco structure unchanged.

Re: missing values for numeric response domain
A machine-actionable item (a math expression) seems to be the right way to go. My impression is that Disco has Representation but does not distinguish between categorical and numeric representation.
Would a simple approach be for Representation to have a property (e.g. missingValue) of type skos:Concept?
See the Disco variable diagram: https://raw.githubusercontent.com/linked-statistics/disco-spec/master/diagrams/variable.png
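A minimal sketch of that simple approach, assuming each sentinel value of a numeric variable (such as -9) is published as a skos:Concept whose notation equals the stored value. The Concept and Representation classes and the missing_value field are hypothetical Python stand-ins for RDF resources, not existing Disco terms.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Concept:
    # Stand-in for a skos:Concept: a notation plus a preferred label.
    notation: str
    pref_label: str

@dataclass
class Representation:
    # Hypothetical missingValue property: a flat list of concepts.
    missing_value: List[Concept] = field(default_factory=list)

    def is_missing(self, raw: str) -> bool:
        # A raw value is missing if it matches the notation of a listed concept.
        return any(c.notation == raw for c in self.missing_value)

rep = Representation(missing_value=[Concept("-9", "Refused"), Concept("-8", "Don't know")])
print(rep.is_missing("-9"))  # True
print(rep.is_missing("-5"))  # False
```

Because the lookup is a flat list, a range such as -9 to -1 would have to be enumerated value by value; this is the kind of 3.2 feature a limitations section could mention.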

Achim

From: Wendy Thomas [mailto:wlt at umn.edu]
Sent: Thursday, March 14, 2019 15:29
To: Wackerow, Joachim
Cc: DDI Structural Reform Working Group.; Zapilko, Benjamin
Subject: Re: [DDI-SRG] [disco] issues on missing values for numeric response domain and on conceptual variable

I think that it is not as important to have that step in the hierarchy in Disco. The purpose of Disco, at least initially, was to facilitate discovery of data and related metadata in an RDF environment. Being able to locate a concept and link it to an existing variable is what seems important. The value of a Represented Variable in the discovery process is the ability to track variable reuse across iterations of a study, or reuse of a common variable across studies, such as the U.S. OMB definition of the Race variable. Unless there is some discovery advantage to exposing a Conceptual Variable, I don't think its expression in Disco is vital.

Wendy

On Thu, Mar 14, 2019 at 7:29 AM Wackerow, Joachim <Joachim.Wackerow at gesis.org> wrote:
Benjamin Zapilko and I are currently reviewing the open issues of Disco. The goal is to resolve the issues and to finally prepare Disco for publication.

Now I have questions on two issues:

--
There is an issue on how to describe missing values for a numeric response domain.
For details and my comment, see https://github.com/linked-statistics/disco-spec/issues/130.

My question:
Is something really missing from Disco? I don't have that impression. But maybe I have misunderstood something.

--
The other issue is that the conceptual variable of DDI 3.2 does not exist in Disco.
The hierarchy consists only of Variable, RepresentedVariable, and skos:Concept.
For details and my comment, see https://github.com/linked-statistics/disco-spec/issues/226.

The whole approach of Disco is to focus on a simple subset of DDI Codebook and DDI Lifecycle for Discovery purposes. It is not a 1:1 representation.

My question:
Is it really important to be able to search for the ConceptualVariable in addition to Variable, RepresentedVariable, and Concept?
Is this addition really worth it? It might require some work on Disco.
Any thoughts would be helpful.

Thanks
Achim


_______________________________________________
DDI-SRG mailing list
DDI-SRG at icpsr.umich.edu
http://lists.icpsr.umich.edu/mailman/listinfo/ddi-srg


--
Wendy L. Thomas                              Phone: +1 612.624.4389
Data Access Core Director                 Fax:   +1 612.626.8375
Minnesota Population Center             Email: wlt at umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455