[DDI-users] When the Author of a DDI isn't the Archive

Mark R. Diggory mdiggory at latte.harvard.edu
Thu Sep 1 10:07:06 EDT 2005


Thanks Reto,

Reto Hadorn wrote:
> 
> 
> 
> Hi Mark,
> 
> 
> docDscr expressed the conception of the codebook as a document, which 
> was created by an identifiable service and/or person. The document based 
> XML expression of the metadata model could only confirm that approach.

Absolutely

> In the new life-cycle approach, the metadata may pass through several 
> systems, be modified by still more actors, and be published in several 
> forms. The metadata model is no longer just the basis for a document 
> which makes visible the work of the publishing data archive, but a 
> resource for many people and services along the life cycle of the data.

This is very true. However, our challenge on the VDC project is to keep 
maintaining production systems based on the existing DDI 2.0 while still 
capturing the provenance trail within our federation. Until we have the 
3.0 metamodel and corresponding XML schemas, we are restricted to the 
2.x capabilities to accomplish this, which we find somewhat limiting.

> 
> Hence we need a new concept, which could be 'action', defined by an 
> author, an action, a process etc. (to be further analysed!) and attached 
> to various objects in the metadata model depending on the type of action 
> (research project, study design, dataset, variable, questionnaire etc. 
> etc.). 'Publication' could be one of those actions, where the kind of 
> publication should be more precisely defined (codebook, question base, 
> Nesstar server, VDC etc.). The creation of the XML file would not be the 
> founding act of the metadata publication, since the latter can take many 
> forms, based on various implementations of the metadata model (database, 
> XML file). Metadata entry, data processing, consistency controls, import 
> of external variables, construction of variables may be other possible 
> actions.

Yes, again I agree. Having this approach clearly defined would allow a 
strong recording of the provenance trail and of any changes that occurred 
during the study's existence. In 2.0, I hypothesize that this is what 
"Link" was vaguely meant to capture.

For instance, say my archive ingests your study (metadata + data). In the 
process, we change the format of your original data files to work within 
our system. In theory, we would include a new docDscr for our archive 
(preserving any old ones). If this docDscr had an ID, we could then 
record in the fileDscr that we made the change, either via fields like 
"notes" or by using the "Link" IDREF field to point back to our docDscr.

A point of uncertainty is that we would actually be "replacing" fileDscr 
content with our own. It's important to fully document this change, 
because it removes existing metadata as much as it adds our own.
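
To make the ingest scenario above concrete, here is a minimal Python 
sketch of how a system might follow such a reference from a fileDscr 
back to the docDscr of the archive that changed it. Everything in it is 
an assumption for illustration: the ID values are invented, the fragment 
is not schema-valid DDI, the helper name resolve_links is hypothetical, 
and I am assuming the IDREF attribute on "Link" is called "refs".

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: IDs are invented and required DDI
# content (citations, file text, etc.) is left out for brevity.
DDI = """<codeBook>
  <docDscr ID="doc-producer" source="producer"/>
  <docDscr ID="doc-archive" source="archive"/>
  <stdyDscr/>
  <fileDscr ID="file-1" source="archive">
    <notes>Reformatted on ingest; see <Link refs="doc-archive"/>.</notes>
  </fileDscr>
</codeBook>"""


def resolve_links(root):
    """Map each fileDscr ID to the `source` of every docDscr its Links reference."""
    # Index the docDscr elements by their ID attribute.
    by_id = {d.get("ID"): d for d in root.findall("docDscr")}
    result = {}
    for file_dscr in root.findall("fileDscr"):
        sources = []
        for link in file_dscr.iter("Link"):
            # "refs" is assumed to hold whitespace-separated IDREFs.
            for ref in link.get("refs", "").split():
                target = by_id.get(ref)
                if target is not None:
                    sources.append(target.get("source"))
        result[file_dscr.get("ID")] = sources
    return result


root = ET.fromstring(DDI)
print(resolve_links(root))  # {'file-1': ['archive']}
```

Note that this only tells us which docDscr is responsible for the 
current fileDscr content. It does not recover the metadata that was 
replaced, which is exactly the gap described above.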

> 
> Perhaps this approach can be combined with a general solution for 
> various roles, in which persons and services are mentioned in the 
> metadata model ('author', funding agency, distributor). There is still 
> some analysis work ahead...

Yes, it's difficult because the current DDI is both restrictive in its 
definition of what a "source" is and very undefined in how to 
reference this "source" throughout the rest of the model. This "cries 
out" for a more clearly defined and capable mechanism.

Cheers,
Mark

> 
> 
> Best wishes
> 
> Reto
> 
> 
> 
> 
> 
> At 31.08.2005, you wrote:
> 
>> Hello DDI community,
>>
>> I'm working on a project for the VDC right now on which I could use 
>> some advice.
>>
>> In a DDI the "docDscr" section is reserved for both the Author to 
>> describe bibliographic information about the DDI-compliant document 
>> itself as a whole (producer) and for the archive to describe the 
>> document in their system (archive).
>>
>> I see often that tools use this to propagate information about the 
>> creation of the DDI (for instance Nesstar Publisher inserts info about 
>> the software used to author the document). In a VDC we do likewise, 
>> injecting a docDscr into any imported DDI content describing the 
>> Instance within our system.
>>
>> My question is the following: When the Author of a DDI isn't the 
>> Archive, for instance, because it was produced outside the archive 
>> but imported into it, should the Archive insert a docDscr of 
>> its own as well? Should it preserve the original docDscrs? These may 
>> seem mundane or obvious questions, but there isn't any real policy 
>> covering them.
>>
>> In the case that we simply maintain a docDscr for each, they can be 
>> separated conceptually by their source="archive|producer". Is this 
>> enough to keep them distinct? Is this enough to maintain 
>> provenance? What if there are other archival docDscrs present, 
>> is there a policy for their preservation? How do we reuse them in our 
>> system to maintain the provenance chain?
>>
>> Finally, when attempting to document a chain of provenance on the 
>> study, such information is very important. I am referring not only to 
>> the path this instance took to get into my archive, but also to the 
>> important Acknowledgment, Access and Usage policies which the 
>> originators would require my archive to enforce. While it's 
>> interesting to preserve these things, it's extremely difficult to use 
>> them for machine actionable purposes. My challenge to the group is how 
>> we support such "machine actionable" capabilities in such a case. In 
>> my opinion, this is an area in which the DDI is very weak. What are 
>> your experiences with using the docDscr to really maintain archival 
>> information that is usable by our systems?
>>
>> Looking forward to your opinion on the subject,
>> -Mark
>> _______________________________________________
>> DDI-users mailing list
>> DDI-users at icpsr.umich.edu
>> http://www.icpsr.umich.edu/mailman/listinfo/ddi-users