[DDI-users] data dissemination..how "raw" should lab data be?

Hoyle, Larry larryhoyle at ku.edu
Thu Feb 17 09:51:43 EST 2011


A variant of this question came up in the discussions about longitudinal data at Dagstuhl. When data are used for something published, it is good practice to preserve a copy of the data as used at that stage of the project. In your example, if publications were based on analyses using changed or imputed data, then those data are of interest. For someone interested in the SAS code used to make those changes, and possibly in other imputation methods, the raw data used by that code are of interest as well.
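
As a rough illustration of why the raw replicates are worth keeping next to the derived values, here is a minimal Python sketch (not the study's SAS code). It recomputes a geometric mean and an arithmetic mean from two of the hCG rows quoted below. The function names and the optional `floor` argument are hypothetical. The floor is only one assumed way that all-zero replicates could end up with a reported geometric mean of 0.01; the original message does not say how those rows were handled.

```python
import math

def parse_replicates(fields):
    """Convert raw text fields to floats, dropping SAS-style '.' missing values."""
    return [float(f) for f in fields if f != "."]

def geometric_mean(values, floor=None):
    """Geometric mean of the replicates.

    `floor` is an ASSUMPTION for illustration: values below it are raised
    to it before averaging. Without a floor, non-positive values make the
    geometric mean undefined and None is returned.
    """
    if floor is not None:
        values = [max(v, floor) for v in values]
    if not values or any(v <= 0 for v in values):
        return None
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    """Plain arithmetic mean, or None when there are no replicates."""
    return sum(values) / len(values) if values else None

# hcG1..hcG4 for two days, copied from the sample rows in the quoted message.
raw_rows = {
    "03/20/83": ["0.03", "0.02", ".", "."],
    "03/21/83": ["0.044", "0.053", ".", "."],
}

# Keep the raw replicates and both derived summaries side by side.
summary = {}
for date, fields in raw_rows.items():
    reps = parse_replicates(fields)
    summary[date] = {
        "replicates": reps,
        "geo_mean": geometric_mean(reps),
        "arith_mean": arithmetic_mean(reps),
    }

for date, info in summary.items():
    print(date, info)
```

With the replicates preserved, anyone can swap in a different averaging rule or a different treatment of zeros and compare the result against the published geometric means.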

 

Larry Hoyle
Senior Scientist
Institute for Policy & Social Research, University of Kansas
1541 Lilac Lane, Blake 607
Lawrence, KS  66045-3129

http://www.ipsr.ku.edu


> -----Original Message-----
> From: ddi-users-bounces at icpsr.umich.edu [mailto:ddi-users-
> bounces at icpsr.umich.edu] On Behalf Of Bob McConnaughey
> Sent: Tuesday, February 15, 2011 1:56 PM
> To: ddi-users at icpsr.umich.edu
> Cc: Wilcox, Allen (NIH/NIEHS) [E]
> Subject: [DDI-users] data dissemination..how "raw" should lab data be?
> 
> Dear folks:
>   We're working on making public the key data files for one of our major
> long-term studies at the EPI branch of the NIEHS, and since I and the
> original researchers are getting older, sooner rather than later.  The Early
> Pregnancy Study, not atypically, has multiple components: several sets of
> interview instruments and also a good deal of lab data, as our subjects were
> willing to provide not only detailed reproductive diaries but also daily urine
> samples.  Initially the urines were assayed for LH (luteinizing hormone), the
> gold standard for ovulation, but one that's easy to miss.  Soon thereafter,
> the urines were assayed for other hormonal metabolites (estradiol,
> progesterone, hCG); now the urines are being sent to CDC for BPA and
> phthalate assays.  Over time, the basic statistical handling of the variables
> in our analyses has changed.  In general our analyses switched, quite early
> on, to using geometric means rather than arithmetic means.  But my
> inclination is to present the data "as is."
> 
> For hCG, for instance, for any given day, there might be 2, 3, or 4 replicates.
> So someone else wanting to use the data could, if they so desire, use other
> averaging techniques.
> 
> id  week  day  date      hcG1   hcG2   hcG3  pilot  hcG4  geo_mean hCG
> xx  7     4    03/17/83  0      0      .     1      .     0.01
> xx  7     5    03/18/83  0      0      .     1      .     0.01
> xx  7     6    03/19/83  0      0      .     1      .     0.01
> xx  7     7    03/20/83  0.03   0.02   .     1      .     0.0244948974 *
> xx  8     1    03/21/83  0.044  0.053  .     1      .     0.0482907859
> xx  8     2    03/22/83  0.162  0.19   .     1      .     0.1754422982
> xx  8     3    03/23/83  0.145  0.175  .     1      .     0.1592953232
>
> * just fyi...indicative of conception
> 
> 
> We have 27,000+ days of urine samples, with multiple assays and with
> replicates for most days.
> 
> 
> For our purposes, we've created composite files that put all the different
> assays together; in a few instances the final analysis files use imputed
> values, or have lab values that are clearly wack set to missing.  In general,
> how would people be inclined to make this type of data public?  One file for
> hCG, another for PdG, E1G and CRT, and another for LH?  Or merge them all
> together into one synthetic file?  There are also subsets of women who've
> had other assay suites done on their pee.  The original lab data arrived in a
> variety of ways: lab forms that were double keyed, files transmitted over
> modem, FAR too many Lotus and Quattro Pro spreadsheets, and *.prn files, and
> there seems to be no compelling reason I can imagine to make this morass of
> "ur-data" public?
> 
> 
> There are also files, really long snippets of SAS code, that were used when
> researchers determined that changes/imputes were appropriate (i.e.,
> recalibrating assay thresholds), and these could be made available for others
> to use or not use as they wish.
> 
> 
> 
> 
> Our composite files also contain demographic, cycle-specific and study
> outcome information; my inclination, again, is to disentangle these data and
> make them available in separate files.
> 
> Thanks for any and all suggestions
> 
> http://www.niehs.nih.gov/research/atniehs/labs/epi/studies/eps/index.cfm
> 
> Bob McConn....
> 
> "She is at the brink of never being hurt again but pauses to say, All of us.
> Every blade of grass."
> from Kuan Yin, Laura Fargas



