[DDI-users] data dissemination: how "raw" should lab data be?

Bob McConnaughey bobmcconn at gmail.com
Tue Feb 15 14:55:47 EST 2011


Dear folks:
  We're working on making the key data files from one of our major long-term
studies at the Epidemiology Branch of NIEHS public, and, since the original
researchers and I are getting older, sooner rather than later. The Early
Pregnancy Study, as is not unusual, has multiple components: several sets of
interview instruments, but also a good deal of lab data, because our subjects
were willing to provide not only detailed reproductive diaries but also daily
urine samples. Initially the urines were assayed for LH (luteinizing
hormone), the gold standard for detecting ovulation, but one that's easy to
miss. Soon thereafter, the urines were assayed for other hormonal metabolites
(estradiol, progesterone, hCG); now they are being sent to CDC for BPA and
phthalate assays. Over time, the basic statistical handling of these
variables in our analyses has changed; quite early on, for example, we
switched from arithmetic to geometric means. But my inclination is to present
the data "as is." For hCG, for instance, there might be 2, 3, or 4 replicates
for any given day, so someone else using the data could apply other averaging
techniques if they wished.

id  week  day  date      hcG1   hcG2   hcG3  pilot  hcG4  geo_mean hCG
xx  7     4    03/17/83  0      0      .     1      .     0.01
xx  7     5    03/18/83  0      0      .     1      .     0.01
xx  7     6    03/19/83  0      0      .     1      .     0.01
xx  7     7    03/20/83  0.03   0.02   .     1      .     0.0244948974  (just FYI: indicative of conception)
xx  8     1    03/21/83  0.044  0.053  .     1      .     0.0482907859
xx  8     2    03/22/83  0.162  0.19   .     1      .     0.1754422982
xx  8     3    03/23/83  0.145  0.175  .     1      .     0.1592953232
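As an illustration of why publishing the replicates leaves the averaging choice to the user, here is a minimal Python sketch applied to a few days from the listing above. It assumes missing replicates ("." in the listing) are read in as None, and it uses a hypothetical floor of 0.01: values below it are raised to it before taking logs. That floor happens to reproduce the 0.01 shown for the all-zero days, but the listing doesn't say how those values were actually derived. `geo_mean`, `arith_mean`, and `FLOOR` are illustrative names, not part of the study's code.

```python
import math
from typing import List, Optional

# Hypothetical detection floor: replicates below it are raised to it before
# taking logs (log(0) is undefined). Not necessarily the study's rule.
FLOOR = 0.01

def geo_mean(replicates: List[Optional[float]], floor: float = FLOOR) -> Optional[float]:
    """Geometric mean of the non-missing replicates (None = missing)."""
    values = [max(v, floor) for v in replicates if v is not None]
    if not values:
        return None
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arith_mean(replicates: List[Optional[float]]) -> Optional[float]:
    """Arithmetic mean of the non-missing replicates, with no floor applied."""
    values = [v for v in replicates if v is not None]
    if not values:
        return None
    return sum(values) / len(values)

# hcG1..hcG4 for three days from the listing above; "." becomes None.
days = {
    "03/19/83": [0, 0, None, None],
    "03/20/83": [0.03, 0.02, None, None],
    "03/22/83": [0.162, 0.19, None, None],
}

for date, reps in days.items():
    print(date, round(geo_mean(reps), 10), round(arith_mean(reps), 4))
```

For 03/20/83 the geometric mean (about 0.0245) matches the listing, while the arithmetic mean is about 0.025; with the raw replicates in hand, a data user could also pick a different floor or drop it.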

We have 27,000+ days of urine samples, with multiple assays and replicates
for most days.

For our purposes, we've created composite files that put all the different
assays together; in a few instances the final analysis files use imputed
values, or have clearly aberrant lab values set to missing. In general, how
would people be inclined to make this type of data public? One file for hCG,
another for PdG, E1G, and CRT, and another for LH? Or merge them all together
into one synthetic file? There are also subsets of women who've had other
assay suites run on their pee. The original lab data arrived in a variety of
ways: lab forms that were double-keyed, data transmitted over modem, and FAR
too many Lotus and Quattro Pro spreadsheets and *.prn files. I can't imagine
a compelling reason to make this morass of "ur-data" public, can you?
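To make the file-layout question concrete, here is a minimal sketch assuming the assay results can be laid out in a hypothetical long format, one record per subject, date, assay, and replicate. From that single source, `split_by_assay` yields one table per assay (the "one file for hCG, another for LH" option), and `merge_wide` yields one merged row per subject-day with a column per assay-replicate. The record layout, the function names, and the LH value are placeholders, not the study's variables or data.

```python
import csv
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical long-format record: (id, date, assay, replicate, value).
Record = Tuple[str, str, str, int, float]

records: List[Record] = [
    ("xx", "03/20/83", "hCG", 1, 0.03),
    ("xx", "03/20/83", "hCG", 2, 0.02),
    ("xx", "03/20/83", "LH", 1, 1.0),   # placeholder value, not a real LH result
    ("xx", "03/21/83", "hCG", 1, 0.044),
    ("xx", "03/21/83", "hCG", 2, 0.053),
]

def split_by_assay(recs: List[Record]) -> Dict[str, List[dict]]:
    """Option 1: one table per assay, one row per replicate."""
    tables: Dict[str, List[dict]] = defaultdict(list)
    for subj, date, assay, rep, value in recs:
        tables[assay].append({"id": subj, "date": date, "replicate": rep, "value": value})
    return dict(tables)

def merge_wide(recs: List[Record]) -> Tuple[List[str], List[list]]:
    """Option 2: one merged table, one row per subject-day and one column per
    assay-replicate (e.g. hCG_1, hCG_2, LH_1). Absent cells are left as ""."""
    cells: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(dict)
    columns = set()
    for subj, date, assay, rep, value in recs:
        col = f"{assay}_{rep}"
        cells[(subj, date)][col] = value
        columns.add(col)
    value_cols = sorted(columns)
    header = ["id", "date"] + value_cols
    # Note: MM/DD/YY dates sort as text here; ISO dates would sort chronologically.
    rows = [[subj, date] + [cells[(subj, date)].get(c, "") for c in value_cols]
            for subj, date in sorted(cells)]
    return header, rows

def write_csv(path: str, header: List[str], rows: List[list]) -> None:
    """Write one table to a CSV file; "" marks a missing cell."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
```

Keeping the long format as the canonical release and deriving either layout from it is one way to avoid choosing. The per-assay split also avoids mostly-empty columns for the subsets of women who had only some assay suites run.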

There are also files, really long snippets of SAS code, that were used when
researchers determined that changes or imputations were appropriate (e.g.,
recalibrating assay thresholds); these could be made available for others to
use or not, as they wish.
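One way to publish both the analysis values and a record of what was done to them is to keep the raw lab value next to the final value, with a status flag and a short reason. The minimal sketch below assumes that convention; the status codes, the plausibility bounds, and the `clean` helper are hypothetical. In practice, the reason field could point at the relevant SAS step.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical status codes; the study's own conventions may differ.
OBSERVED = "observed"        # raw value used as-is
SET_MISSING = "set_missing"  # raw value judged implausible and blanked
IMPUTED = "imputed"          # raw value missing or implausible; replaced
MISSING = "missing"          # nothing received and nothing imputed

@dataclass
class CleanedValue:
    raw: Optional[float]    # value as received from the lab
    final: Optional[float]  # value carried into the analysis file
    status: str             # one of the codes above
    reason: str = ""        # short note, e.g. which SAS step made the change

def clean(raw: Optional[float], lower: float, upper: float,
          imputed: Optional[float] = None, reason: str = "") -> CleanedValue:
    """Record the final value alongside the raw one, plus why they differ.

    `lower` and `upper` are hypothetical plausibility bounds for the assay.
    """
    if raw is not None and lower <= raw <= upper:
        return CleanedValue(raw, raw, OBSERVED)
    if imputed is not None:
        return CleanedValue(raw, imputed, IMPUTED, reason)
    if raw is None:
        return CleanedValue(None, None, MISSING, reason)
    return CleanedValue(raw, None, SET_MISSING, reason or "outside plausible range")

print(clean(0.03, 0.0, 100.0))
print(clean(-5.0, 0.0, 100.0))
```

Because the raw value is never overwritten, a user who disagrees with a particular change can simply use `raw` instead of `final`.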


Our composite files also contain demographic, cycle-specific, and study
outcome information; my inclination, again, is to disentangle these data and
make them available in separate files.
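Disentangling the composite files could look like the minimal sketch below. Each group of columns (demographic, cycle-specific, outcome) becomes its own table carrying the subject `id`, so users can re-join them, and rows repeated across sample days are collapsed. The column names in `GROUPS` and the sample rows are hypothetical stand-ins for the study's actual variables and data.

```python
from typing import Dict, List

# Hypothetical column groups; the real composite files' variable names differ.
GROUPS: Dict[str, List[str]] = {
    "demographics": ["age", "parity"],
    "cycles": ["cycle", "cycle_length"],
    "outcomes": ["pregnant", "outcome"],
}

def disentangle(composite_rows: List[dict], groups: Dict[str, List[str]] = GROUPS,
                key: str = "id") -> Dict[str, List[dict]]:
    """Split wide composite rows into one table per column group.

    Every table keeps `key` so the tables can be re-joined. Identical rows
    within a group (e.g. demographics repeated on every sample day) are kept
    once, in first-seen order. Values must be hashable.
    """
    tables = {}
    for name, cols in groups.items():
        fields = [key] + cols
        seen = set()
        rows = []
        for row in composite_rows:
            record = tuple(row.get(f) for f in fields)
            if record not in seen:
                seen.add(record)
                rows.append(dict(zip(fields, record)))
        tables[name] = rows
    return tables

# Made-up placeholder rows: one subject, two days in cycle 1, one day in cycle 2.
composite = [
    {"id": "xx", "date": "03/01/83", "age": 30, "parity": 1,
     "cycle": 1, "cycle_length": 28, "pregnant": 1, "outcome": "placeholder"},
    {"id": "xx", "date": "03/02/83", "age": 30, "parity": 1,
     "cycle": 1, "cycle_length": 28, "pregnant": 1, "outcome": "placeholder"},
    {"id": "xx", "date": "03/20/83", "age": 30, "parity": 1,
     "cycle": 2, "cycle_length": 30, "pregnant": 1, "outcome": "placeholder"},
]

for name, rows in disentangle(composite).items():
    print(name, rows)
```

The daily lab values would then live in their own tables keyed by `id` and date, with no demographic columns repeated on every row.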

Thanks for any and all suggestions.

http://www.niehs.nih.gov/research/atniehs/labs/epi/studies/eps/index.cfm

Bob McConn....

"She is at the brink of never being hurt again
but pauses to say, All of us.  Every blade of grass."
from *Kuan Yin*, Laura Fargas