[DDI-users] new DDI related package in R

Wed Jul 30 07:34:12 EDT 2014

Dear All,

I would like to announce that I have just submitted a new package to
CRAN, called "DDIwR" (from DDI with R).

In this very early version, it contains only two functions:
- getMetadata() - to read DDI codebooks from .xml files
- setupfile() - to write specific setup files for SPSS, Stata, SAS and R.

Given my complete lack of knowledge of Stata and SAS, the generated
files for these two are likely to be improved in subsequent versions.
User feedback is, naturally, welcome.

For the R setup files, I have implemented what I believe is the
"universally" acceptable way of treating missing values. When the
setup file is run, it reads the .csv file to create an R data frame
and applies four additional attributes:
- "variable labels"
- "value labels"
- "unique id"
- "missing types"

The first two are self-explanatory, and the attribute "unique id"
holds the name of the variable which contains the unique case
identifiers.
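
As a minimal sketch of how attributes like these can be attached and
read back with base R's attr(): the data, labels and variable names
below are made up for illustration, and the code generated by
setupfile() may well do this differently.

```r
# Hypothetical data; in a real setup file this would come from read.csv()
dat <- data.frame(ID = c(101, 102, 103), V1 = c(1, 2, 1))

# Attach the first three attributes (all labels are invented examples)
attr(dat, "variable labels") <- list(ID = "Case identifier", V1 = "Gender")
attr(dat, "value labels") <- list(V1 = c(Male = 1, Female = 2))
attr(dat, "unique id") <- "ID"

# Retrieve the name of the identifier variable, then the identifiers
idvar <- attr(dat, "unique id")
ids <- dat[[idvar]]
```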

The last attribute, "missing types", is a list with the following
structure (arbitrary values):

$V1
$V1$types
$V1$types$`Not applicable`
[1] -7
$V1$types$`Not answered`
[1] -9
$V1$types$`Missing by design`
[1] 97 98 99

$V1$cases
$V1$cases$`Not applicable`
  [1]    23   272   281   349   352
$V1$cases$`Not answered`
  [1]    58   176   312   388   412
$V1$cases$`Missing by design`
  [1]    12   213   215   293   531

The first component contains the values (or ranges of values) used in
the raw data for each missing "type". The second component contains,
for each type, the cases in the variable V1 which were recoded to the
standard missing value in R (namely NA). More precisely, it holds the
case identifiers found in the variable named in the attribute
"unique id".
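
To illustrate the idea, here is a small sketch that builds a list of
this shape by hand. The data frame, the ID variable and the missing
codes are hypothetical, and this is only one possible way to do it,
not necessarily what the generated setup file contains.

```r
# Hypothetical raw data: an identifier variable and one variable that
# uses several codes for different kinds of missing values
dat <- data.frame(
    ID = c(12, 23, 58, 64, 70),
    V1 = c(97, -7, -9, 3, 5)
)

# Missing "types" for V1, as coded in the raw data (arbitrary values)
types <- list(
    "Not applicable"    = -7,
    "Not answered"      = -9,
    "Missing by design" = c(97, 98, 99)
)

# For each type, record the identifiers of the affected cases;
# this must happen before the codes are overwritten with NA
cases <- lapply(types, function(codes) dat$ID[dat$V1 %in% codes])

# Recode every missing code to the standard missing value NA
dat$V1[dat$V1 %in% unlist(types)] <- NA

# Store both components under the variable's name
attr(dat, "missing types") <- list(V1 = list(types = types, cases = cases))
```

Because the case identifiers are kept for each type, the original
kind of missingness can still be looked up after the recoding.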

I have also written (but not yet released) functions to write and
update an XML file. They are very early drafts, though already
functional. In the case of updating an XML file, I would also need
suggestions as to what an "update" is:
- two different XML files to be combined
and / or
- add statistical information to an existing XML file
and / or
- modify information based on a later version of the data, etc.

For the moment, I am able to add statistical information if the raw
data is provided. All existing information is preserved, and specific
tags are added. Once I get this function to a more mature version, it
will be easier to produce a Lifecycle version.

The package will appear on CRAN in a couple of days, and it will need
some time to reach your closest mirror.

The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under
grant agreement no. 262608 (DwB - Data without Boundaries).

Best wishes,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania