[DDI-users] data dissemination..how "raw" should lab data be?
Bob McConnaughey
bobmcconn at gmail.com
Tue Feb 15 14:55:47 EST 2011
Dear folks:
We're working on making public the key data files for one of our major
long-term studies at the EPI branch of the NIEHS, and since the original
researchers and I are getting older, sooner rather than later. The Early
Pregnancy Study, not atypically, has multiple components: several sets of
interview instruments and also a good deal of lab data, since our subjects
were willing to provide not only detailed reproductive diaries but also
daily urine samples. Initially the urines were assayed for LH
(luteinizing hormone), the gold standard for ovulation, but one that's easy
to miss. Soon thereafter, the urines were assayed for other hormonal
metabolites (estradiol, progesterone, hCG); now the urines are being sent to
CDC for BPA and phthalate assays. In analyses, over time, the basic
statistical handling of the variables has changed. In general our analyses
switched, quite early on, to using geometric means rather than arithmetic
means. But my inclination is to present the data "as is."
For hCG, for instance, there might be 2, 3 or 4 replicates for any given
day. With the replicates provided, someone else wanting to use the data
could, if they so desire, use other averaging techniques.
id  week  day  date      hcG1   hcG2   hcG3  pilot  hcG4  geo_mean hCG
xx  7     4    03/17/83  0      0      .     1      .     0.01
xx  7     5    03/18/83  0      0      .     1      .     0.01
xx  7     6    03/19/83  0      0      .     1      .     0.01
xx  7     7    03/20/83  0.03   0.02   .     1      .     0.0244948974
xx  8     1    03/21/83  0.044  0.053  .     1      .     0.0482907859
xx  8     2    03/22/83  0.162  0.19   .     1      .     0.1754422982
xx  8     3    03/23/83  0.145  0.175  .     1      .     0.1592953232
(Just fyi: the 03/20/83 row, week 7 day 7, is indicative of conception.)
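As an illustration of what a secondary user could do with the raw replicates, here is a minimal Python sketch that recomputes a per-day summary from the hcG1..hcG4 values. The function names, the use of None for the "." missing marker, and the optional floor parameter are all assumptions made for this example; they are not taken from the study's own SAS code. The table above lists 0.01 for days where both replicates are 0, which would be consistent with some floor value being substituted, but that rule is a guess and is labeled as such in the code.

```python
import math

def geometric_mean(replicates, floor=None):
    """Geometric mean of the non-missing replicates for one sample day.

    replicates: list of floats, with None standing in for the "." marker.
    floor: hypothetical minimum value; readings below it are raised to it
           before averaging (assumption, not a documented study rule).
    """
    values = [v for v in replicates if v is not None]
    if not values:
        return None  # no assay results for this day
    if floor is not None:
        values = [max(v, floor) for v in values]
    if any(v <= 0 for v in values):
        return 0.0  # log of zero is undefined; a zero reading forces the mean to 0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(replicates):
    """Plain average of the non-missing replicates, for comparison."""
    values = [v for v in replicates if v is not None]
    return sum(values) / len(values) if values else None

day = [0.03, 0.02, None, None]  # hcG1..hcG4 for 03/20/83 in the table above
print(geometric_mean(day))      # close to the 0.0244948974 listed in the table
print(arithmetic_mean(day))     # the alternative a user might prefer
print(geometric_mean([0, 0, None, None], floor=0.01))  # floor value is assumed
```

Because the replicates travel with the file, the choice between these summaries (or any other) stays with whoever analyzes the data.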
We have 27,000+ days of urine samples, with multiple assays (with
replicates) for most days.
For our purposes, we've created composite files that put all the different
assays together; in a few instances the final analysis files use imputed
values, or have lab values that are clearly wack set to missing. In general,
how would people be inclined to make this type of data public? One file for
hCG, another for PdG, E1G and CRT, and another for LH? Or merge them all
together into one synthetic file? There are also subsets of women who've
had other assay suites done on their pee. The original lab data arrived in
a variety of ways: lab forms that were double keyed, transmitted over modem,
and FAR too many Lotus and Quattro Pro spreadsheets and *.prn files. There
seems to be no compelling reason that I can imagine to make this morass of
"ur-data" public?
There are also files, really long snippets of SAS code, that were used when
researchers determined that changes/imputations were appropriate
(i.e., recalibrating assay thresholds), and these could be made available for
others to use or not use as they wish.
Our composite files also contain demographic, cycle-specific and study
outcome information; my inclination, again, is to disentangle these data and
make them available in separate files.
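One way to picture the disentangling is a small Python sketch that splits composite records into separate tables, each carrying only its own key columns. Everything here is hypothetical: the column names (age, hcg, lh), the placeholder ids and dates, and the values are made up for illustration and do not come from the study files.

```python
def split_composite(rows, groups):
    """Split composite rows into one table per group.

    groups maps a table name to (key_columns, value_columns).
    One record is kept per distinct key, so the sketch assumes the
    value columns are constant within a key (e.g. one age per woman).
    """
    tables = {}
    for name, (keys, cols) in groups.items():
        seen = set()
        table = []
        for row in rows:
            key = tuple(row[k] for k in keys)
            if key in seen:
                continue  # this key already has a record in this table
            seen.add(key)
            table.append({c: row[c] for c in keys + cols})
        tables[name] = table
    return tables

# Made-up composite records: one row per woman per sample day.
rows = [
    {"id": "A", "date": "d1", "age": 30, "hcg": 0.01, "lh": 1.2},
    {"id": "A", "date": "d2", "age": 30, "hcg": 0.02, "lh": 1.5},
    {"id": "B", "date": "d1", "age": 27, "hcg": 0.01, "lh": 0.9},
]

# Hypothetical grouping: demographics keyed by woman, assays keyed by woman and day.
groups = {
    "demographics": (["id"], ["age"]),
    "hcg": (["id", "date"], ["hcg"]),
    "lh": (["id", "date"], ["lh"]),
}

tables = split_composite(rows, groups)
for name, table in tables.items():
    print(name, table)
```

Since every table repeats its key columns, a user who prefers a single merged file can rebuild one by joining on id (and date, for the assay tables).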
Thanks for any and all suggestions.
http://www.niehs.nih.gov/research/atniehs/labs/epi/studies/eps/index.cfm
Bob McConn....
"She is at the brink of never being hurt again
but pauses to say, All of us. Every blade of grass."
from *Kuan Yin*, Laura Fargas