[DDI-users] date structures/formats for DDI

Mary Vardigan vardigan at umich.edu
Thu Apr 21 10:29:32 EDT 2011


Bob,

Regarding your last question about the form in which to release the data
publicly, I checked with data managers here at ICPSR and they think you
are exactly right to publish the cleaned, raw data first.

Regards,

Mary

From: ddi-users-bounces at icpsr.umich.edu
[mailto:ddi-users-bounces at icpsr.umich.edu] On Behalf Of Bob McConnaughey
Sent: Wednesday, April 20, 2011 3:37 PM
To: ddi-users at icpsr.umich.edu
Subject: [DDI-users] date structures/formats for DDI

When one hopes to make available multiple sets of assays carried out
over the course of a study, what is the preferred approach to handling
dates?  In the Early Pregnancy Study each participant provided a daily
diary over the course of her participation (up to 6 months if she didn't
conceive whilst in the study), as well as daily first-morning urine
samples that have since been assayed for everything from
hCG/E1G/progesterone/luteinizing hormone/creatinine to BPA (well, the
urines are being sent off to the CDC for BPA and phthalate assays as I
write).  For most of our analyses the datasets have been created by
merging on id and date: the date of the urine sample and the date of the
diary entry.  When I looked at the MIDUS documentation, it appeared that
the dates are broken down into separate m/d/y variables.  So if one
wanted to extract the data and later match on subject and date, would
the user (probably) have to create new date variables in whatever
package they use?
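A minimal Python sketch of the rebuild step described above, assuming a hypothetical layout in which each date is stored as three numeric variables (sample_m, sample_d and sample_y are illustrative names, not actual MIDUS variable names). Once combined, the date can be written out as an ISO 8601 string for matching.

```python
from datetime import date

# Hypothetical layout: each date is split into separate month, day and year
# variables. The variable names below are illustrative, not taken from MIDUS.
records = [
    {"id": 101, "sample_m": 3, "sample_d": 14, "sample_y": 2011},
    {"id": 102, "sample_m": 12, "sample_d": 1, "sample_y": 2010},
]


def mdy_to_date(month: int, day: int, year: int) -> date:
    """Combine separate month/day/year values into one date.

    datetime.date raises ValueError for impossible dates (e.g. 2/30/2011),
    which doubles as a basic validity check on the split variables.
    """
    return date(year, month, day)


for rec in records:
    rec["sample_date"] = mdy_to_date(rec["sample_m"], rec["sample_d"], rec["sample_y"])

# ISO 8601 text form, suitable as one part of an (id, date) matching key.
print([(rec["id"], rec["sample_date"].isoformat()) for rec in records])
# [(101, '2011-03-14'), (102, '2010-12-01')]
```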

If one wants (as we do, eventually) to make these data public, I
initially thought that encoding the date as a character string in
ISO 8601 yyyy-mm-dd format would be the way to uniquely identify id/day
pairs.  Our ids are numeric, and the date string would obviously be
character.  While I'm used to being able to merge by any and all types
of variables in SAS, is it the case that in DDI usage identifier "dyads"
need to be of the same variable type, all numeric or all character?  Or
is this a quirk of the Nesstar server?  At the moment I'm playing with
the Nesstar server as a potential mechanism for allowing people access
to subsets of our data, so that's the only "dissemination" mechanism
with which I'm the least bit familiar.  I'm certainly open to other
options (as long as they allow us to "keep" the data; I'm sure there's
no way in he** that the PIs would want their data hosted, say, at the
SDA site at Berkeley).
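Independent of any particular dissemination server, a composite key whose parts have different types is not a problem in itself for a generic merge. The Python sketch below is a left join over hypothetical day-level records (the field names bleeding and hcg and all values are made up for illustration). It keys diary and assay records on a numeric id plus an ISO 8601 date string, and also shows the all-character alternative of zero-padding the id. It cannot settle whether Nesstar or a given DDI tool requires same-typed key variables.

```python
from datetime import date

# Hypothetical day-level records; field names and values are illustrative only.
diary = [
    {"id": 7, "day": "2011-04-01", "bleeding": 0},
    {"id": 7, "day": "2011-04-02", "bleeding": 1},
    {"id": 8, "day": "2011-04-01", "bleeding": 0},
]
assays = [
    {"id": 7, "day": "2011-04-02", "hcg": 1.2},
    {"id": 8, "day": "2011-04-01", "hcg": 0.4},
]


def mixed_key(rec):
    """(numeric id, ISO 8601 date string): the two parts need not share a type."""
    date.fromisoformat(rec["day"])  # raises ValueError if not a valid yyyy-mm-dd date
    return (rec["id"], rec["day"])


def char_key(rec, width=4):
    """All-character alternative: zero-pad the id (assumes at most `width` digits)."""
    return (f"{rec['id']:0{width}d}", rec["day"])


def left_join(left, right, key):
    """Keep every left record; attach matching right-hand fields (None if absent)."""
    lookup = {key(r): r for r in right}
    extra = {k for r in right for k in r} - {"id", "day"}
    out = []
    for rec in left:
        match = lookup.get(key(rec), {})
        out.append({**rec, **{k: match.get(k) for k in extra}})
    return out


merged = left_join(diary, assays, mixed_key)
for row in merged:
    print(row)
```

Either key function produces the same matches here. A side benefit of the fixed-width yyyy-mm-dd form is that sorting the strings as text also sorts them chronologically.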

I could make all dates available as SAS "raw" date numbers as well as
formatted values, and the unformatted date values could be used for
merges etc.  Or I could go back and create M/D/Y variables for each day,
or create a "character" copy of the ID variables.  Is there a
"preferred" mode of presentation for this sort of data?
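The options listed above carry the same information and can be converted mechanically into one another. A SAS date value is the number of days since 1 January 1960; the Python sketch below (function names are illustrative) converts between that count, an ISO 8601 string, and separate M/D/Y components.

```python
from datetime import date, timedelta

# SAS stores a date as the number of days since 1 January 1960 (day 0).
SAS_EPOCH = date(1960, 1, 1)


def sas_to_date(days: int) -> date:
    """Convert a SAS date value (days since 1960-01-01) to a date."""
    return SAS_EPOCH + timedelta(days=days)


def date_to_sas(d: date) -> int:
    """Convert a date back to a SAS date value; earlier dates are negative."""
    return (d - SAS_EPOCH).days


def representations(sas_value: int) -> dict:
    """Show one day in the three forms discussed: SAS number, ISO string, M/D/Y."""
    d = sas_to_date(sas_value)
    return {
        "sas": sas_value,
        "iso": d.isoformat(),
        "mdy": (d.month, d.day, d.year),
    }


print(representations(18737))
# {'sas': 18737, 'iso': '2011-04-20', 'mdy': (4, 20, 2011)}
```

Because the conversions are lossless in both directions, publishing one form and documenting it leaves users free to derive the others in their own package.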

And a more general question.  What I'm doing at the moment is
converting all the various original questionnaire data files and the
original lab files into separate data files.  Of course I have some
massive files built up from merging much of the original data with new
variables that we've created, and these synthetic files are set up in a
variety of different structures.  While none of the original data was
organized around menstrual cycles, in fact, most of the time, the basic
"unit" of analysis was a menstrual cycle.  So we have some files that
are ~28,000 records long that include all the assay and date
information, variables specific to each cycle, etc.; other analysis
files that have 740 records (the number of menstrual cycles observed in
the course of the study); and others (i.e., the questionnaire files)
that have 231 or fewer records (the N of women who provided the intake,
and then those who "qualified" for the various in-study and post-study
questionnaires).  At some point I'd assume some of the major "synthetic"
files will be made part of the data that others can access, but my basic
"feeling" is that what should be made available, at least initially, is
the cleaned, "raw" data that we started with.  Again, is this the more
or less kosher approach to letting data/studies go out into the wild?
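The three record structures described above nest inside one another: day-level rows roll up to one row per (woman, cycle), and cycle rows roll up to one row per woman. A minimal Python sketch of that roll-up, assuming each day-level row already carries a cycle number; the field names (cycle, e1g) and the values are hypothetical, chosen only to show which keys link the levels.

```python
from collections import defaultdict

# Hypothetical day-level rows: one per woman per day, tagged with a cycle number.
daily = [
    {"id": 7, "cycle": 1, "day": "2011-04-01", "e1g": 10.0},
    {"id": 7, "cycle": 1, "day": "2011-04-02", "e1g": 14.0},
    {"id": 7, "cycle": 2, "day": "2011-04-29", "e1g": 9.0},
    {"id": 8, "cycle": 1, "day": "2011-04-01", "e1g": 12.0},
]


def to_cycle_level(rows):
    """Collapse day-level rows to one row per (id, cycle)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["id"], row["cycle"])].append(row)
    cycles = []
    for (pid, cyc), members in sorted(groups.items()):
        days = sorted(m["day"] for m in members)  # ISO strings sort chronologically
        cycles.append({
            "id": pid,
            "cycle": cyc,
            "first_day": days[0],
            "last_day": days[-1],
            "n_days": len(members),
            "max_e1g": max(m["e1g"] for m in members),
        })
    return cycles


def to_woman_level(cycles):
    """Collapse cycle-level rows to a count of observed cycles per woman."""
    counts = defaultdict(int)
    for c in cycles:
        counts[c["id"]] += 1
    return dict(counts)


cycles = to_cycle_level(daily)
women = to_woman_level(cycles)
print(len(daily), len(cycles), len(women))  # 4 3 2
```

Since the derived levels can be rebuilt from the day-level rows and their keys, releasing the cleaned source files first does not prevent users from reconstructing cycle- or woman-level files.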

As usual, thanks to all.

Bob McConnaughey PhD
Westat/NIEHS epidemiology branch contractor
919-941-8300 (w)
919-542-5653 (h)

"Well, I too would be capable of killing for a book."
"I wouldn't recommend it.  That's how it starts. Murder doesn't seem
like a big deal, but then you end up lying, voting in elections, things
like that"
Perez-Reverte - The Club Dumas
