[DDI-users] Re: [DDI-SRG] When the Author of a DDI isn't the
Archive
Mark R. Diggory
mdiggory at latte.harvard.edu
Fri Sep 9 08:49:44 EDT 2005
Wendy, we have 3 cases to deal with here so I'll try to respond for each
case:
Case 1 : We are importing an existing DDI into a VDC Server.
Case 2 : We are importing a MARC record into a VDC Server
(which generates a new DDI in the VDC Server)
Case 3 : A VDC Server in the Federation is Harvesting an existing OAI
service. (caching a copy of the DDI within its Index/Repository)
Wendy Thomas wrote:
> Mark,
>
> There are currently 3 sections that identify the source, intellectual
> content and current XML instance in the DDI. Section 1.1 describes the
> current document. If all your system is doing is importing an extant XML
> instance verbatim then generally the only thing that would change in this
> section is the holdings and deposit information.
I think the question is "Are we putting this 'original' in our archive,
or are we creating a 'derivative' of the 'original' in our archive?". In
some of our cases it's the former (3), in others it's the latter (2), and
in others it's very unclear which it is (1).
(1) In this case, it seems the Server would at least generate a new
docDscr describing the entire import event; this includes, as you
state, holdings and deposit information. It is important to preserve the
original information for provenance and archival reasons, so I'm not
suggesting appending these to the original docDscr. There is also the
issue that there's more than holdings and deposit info, and there's more
than one archive to be tracked, so all this information needs to be
organized per archive.
(2) This is much easier: a docDscr is generated for the study, as
above, but the curator is the producer of the instance. The docSrc can
be used to document the original MARC record.
(3) This is the case of harvesting metadata from one VDC to another for
search and discovery purposes. Again, there is a requirement to
preserve the original while documenting its inclusion in the
new server. In our "Harvesting" case, all the original holdings still
persist and are accessible, and all the fileDscr/otherMat URIs still point
to the original resources; this is just a mirroring of that information
on the new system. This is where altering the original
really becomes problematic to me. If we insert a docDscr representing
the harvest into our archive, we've in a sense altered the "instance"
simply to document that it was added to our archive. I'm unsure if this
is necessary.
> If you are creating an
> XML instance from another source this is the section you would use to
> record the "authorship" of the XML version. Your source documents would go
> in section 1.2. This would include an XML version where you made changes
> or incorporated additional materials when importing an XML. The original
> XML would be cited in section 1.2.
Do you mean 1.3 (docSrc), not 1.2 (guide)?
The issue is that when we are actually using another DDI to generate
the DDI in the system, a docDscr/docSrc may already be present.
It seems weird to alter the original existing docDscr by moving its
citation contents into a docSrc section: some information resides
outside of docDscr/citation, and the place to preserve that metadata
(guide, status, notes, other docSrcs) is unclear.
It seems to me that preservation of the original structure is of the
utmost importance, and that the unit of administrative metadata needs to
be as encapsulated as possible. The docDscr appears to be such a
unit of encapsulation, whereas, as I stated above, it is unclear what to
do with the rest of the content if "citation" is the unit of encapsulation.
If this docDscr structure is preserved and new archives insert new
docDscrs, then the origin of the document can be
consistently tracked through heterogeneous archive systems.
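To make the idea concrete, here is a minimal Python sketch of an archive adding its own docDscr while leaving the existing ones byte-for-byte untouched. The function name, the `id` attribute, and the use of a bare `holdings` element are assumptions made for illustration; the result is not meant to be a schema-valid DDI.

```python
import xml.etree.ElementTree as ET

def add_archive_docdscr(codebook, archive_id, holdings_note):
    """Insert a new docDscr after the existing ones without modifying them."""
    doc_positions = [i for i, child in enumerate(codebook) if child.tag == "docDscr"]
    insert_at = doc_positions[-1] + 1 if doc_positions else 0
    new_dscr = ET.Element("docDscr", {"id": archive_id})  # "id" is an assumed attribute name
    citation = ET.SubElement(new_dscr, "citation")
    ET.SubElement(citation, "holdings").text = holdings_note  # illustrative content only
    codebook.insert(insert_at, new_dscr)
    return new_dscr

# Hypothetical, heavily abbreviated instance
original = ET.fromstring(
    '<codeBook><docDscr id="producer"><citation/></docDscr><stdyDscr/></codeBook>'
)
before = ET.tostring(original.find("docDscr"))
add_archive_docdscr(original, "archive1", "Harvested copy held by archive 1")
after = ET.tostring(original.find("docDscr"))
ids = [d.get("id") for d in original.findall("docDscr")]
```

Comparing `before` and `after` checks that the producer's docDscr was preserved, and `ids` lists the chain of editors in insertion order.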
>
> As Reto suggests this seems inadequate for documenting the modular and
> sub-modular level changes occurring in a life cycle model.
This is more a discussion for 3.x than a discussion about current usage.
I do agree. At first I thought it would be possible to combine this with
the usage of Link elements to attain some degree of change documentation,
but that is very poor form.
> It is the
> primary reason for providing the reusable class "citation" to allow for
> tracking the provenance and source of discrete pieces of information.
As Reto points out (and I've discovered through trying to use it) this
"citation" thing is just not enough.
> There is a need to track the path the XML followed to get to its present
> state. This should be kept in the "archive" module as it is dynamic in
> nature and varies archive by archive. Right now this module is not well
> defined. It is definitely something that users will need to hammer on a bit
> when testing out the proposed 3.0 structure.
One idea I have for 2.x is that we could redefine "@source" to be an
IDREF (or a "reference" in 3.0) which points at a
"docDscr". So instead of having @source="archive|producer", there is
a docDscr for each and every "editor" of the document, and
modifications to content can at least be linked back to the editor.
For example:
<codeBook>
<docDscr id="producer">...</docDscr>
<docDscr id="archive1">...</docDscr>
<docDscr id="archive2">...</docDscr>
...
<fileDscr source="producer">...</fileDscr>
<fileDscr source="archive1">...</fileDscr>
<fileDscr source="archive2">...</fileDscr>
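An application could then resolve each @source back to its editor. The following Python sketch assumes the proposed (not current) semantics, where `source` holds the `id` of a docDscr; the sample document, the `resolve_sources` helper, and the dangling `archive3` reference are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance using the proposed IDREF-style @source
DDI = """<codeBook>
  <docDscr id="producer"/>
  <docDscr id="archive1"/>
  <docDscr id="archive2"/>
  <fileDscr source="producer"/>
  <fileDscr source="archive1"/>
  <otherMat source="archive3"/>
</codeBook>"""

def resolve_sources(codebook):
    """Split (tag, source) pairs into those that point at a docDscr and those that do not."""
    editors = {d.get("id") for d in codebook.findall("docDscr")}
    resolved, dangling = [], []
    for elem in codebook.iter():
        ref = elem.get("source")
        if ref is None:
            continue
        (resolved if ref in editors else dangling).append((elem.tag, ref))
    return resolved, dangling

resolved, dangling = resolve_sources(ET.fromstring(DDI))
```

Here `dangling` collects references with no matching docDscr, which is the kind of check a validating parser would perform if @source were declared as a true IDREF.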
Another, even better idea I have for 3.x is something like the change
tracking found in Open Office. Its format has a "Change Tracking"
strategy which may be of interest to the SRG group.
It contains the following elements:
<!-- elements for change tracking -->
<!ELEMENT text:change EMPTY>
<!ATTLIST text:change text:change-id CDATA #REQUIRED>
<!ELEMENT text:change-start EMPTY>
<!ATTLIST text:change-start text:change-id CDATA #REQUIRED>
<!ELEMENT text:change-end EMPTY>
<!ATTLIST text:change-end text:change-id CDATA #REQUIRED>
<!ELEMENT text:tracked-changes (text:changed-region)*>
<!ATTLIST text:tracked-changes text:track-changes %boolean; "true">
<!ATTLIST text:tracked-changes text:protection-key CDATA #IMPLIED>
<!ELEMENT text:changed-region (text:insertion |
(text:deletion, text:insertion?) |
text:format-change) >
<!ATTLIST text:changed-region text:id ID #REQUIRED>
<!ATTLIST text:changed-region text:merge-last-paragraph %boolean; "true">
<!ELEMENT text:insertion (office:change-info, %sectionText;)>
<!ELEMENT text:deletion (office:change-info, %sectionText;)>
<!ELEMENT text:format-change (office:change-info)>
A reference can be found here:
http://xml.openoffice.org/source/browse/xml/xmloff/dtd/text.mod?rev=1.57.220.1&content-type=text/vnd.viewcvs-markup
This, used in conjunction with identification of the change authors in
the archive section, provides a means of tracking changes made to the DDI
over time. However, this is clearly something an application would
need to support, as it would be allowed almost everywhere
within the DDI.
Having changes documented in an alternate namespace does provide easy
identification for adding and stripping them out during
rendering/processing if necessary.
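As a rough illustration of the stripping step, the Python sketch below removes every element in a change-tracking namespace while keeping the surrounding text. The namespace URI `urn:example:change-tracking`, the `ct:` prefix, and the marker names borrowed from the Open Office DTD are assumptions; no such namespace is defined for DDI. It only removes the markup and does not accept or reject the changes themselves.

```python
import xml.etree.ElementTree as ET

CT_NS = "urn:example:change-tracking"  # hypothetical namespace, not part of DDI

def strip_change_markup(root):
    """Remove all elements in the change-tracking namespace, preserving trailing text."""
    prefix = "{" + CT_NS + "}"
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            if child.tag.startswith(prefix):
                # Re-attach the text that followed the marker so it is not lost
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return root

doc = ET.fromstring(
    '<codeBook xmlns:ct="urn:example:change-tracking">'
    '<stdyDscr><titl>Old <ct:change-start ct:change-id="c1"/>new'
    '<ct:change-end ct:change-id="c1"/> title</titl></stdyDscr>'
    '</codeBook>'
)
strip_change_markup(doc)
title = doc.find("stdyDscr/titl").text
```

Because the markers live in their own namespace, a single prefix test on the tag is enough to find them anywhere in the document.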
-Mark