[DDI-users] A "home" for the DDI

Mark R. Diggory ddi-users@icpsr.umich.edu
Wed, 16 Jul 2003 18:48:43 -0400


I-Lin,

Thank you for your reply. I'll intermix my comments within the body of 
your response.

ikuo@icpsr.umich.edu wrote:
> Hi Mark,
> 
> The following is my own opinion and does not necessarily reflect the opinion of 
> my employer.
> 
> While I agree with the overall goal of greater transparency in the DDI 
> development process, I question whether Sourceforge and/or CVS are the means to 
> achieve it.
> 
> CVS is a version control system. It solves the problem of managing and tracking 
> changes when  multiple parties are working on a single project. That is a 
> common problem of large software development projects, and that is a problem 
> that SourceForge has tackled successfully.
> 
> SourceForge & CVS are tools that have been used to successfully manage software 
> development. However, to say that these tools may therefore be used to 
> successfully manage DDI development is to make an erroneous analogy. The DDI is 
> a standard, not software, and standards development has different kinds of 
> problems from software development. 
> 

Noting that I developed the w3c Schema translation of the DDI DTD for
the current versions now published on the ICPSR site, I feel, from my
technical experience with this subject, that I can strongly disagree
with your conclusion above on a number of points:

1.) I think it's important to separate the "technical implementations"
of a standard from the "conceptual definition" of the standard itself.

While it is wise to have committees decide on the "conceptual content"
of the specification, it is unwise to hand the various technical
representations of the specification that can come into existence (w3c
Schema, RelaxNG, DTD, etc.) entirely over to a "conceptual committee's"
decision-making process. One could easily write any number of w3c
Schema implementations that adhere to the current DDI specification;
all would be valid, all would validate DDI documents correctly, and all
could be of drastically different structure.

Until a "technical committee" establishes that a technical 
implementation of the standard is the "be all, end all" for that medium, 
there is going to be much room for debate, discussion and 
decision-making. I would not begin to suggest that my current w3c Schema 
implementations meet any ideal "technical criteria" beyond being able to 
correctly validate the same content as their corresponding DTD. They 
certainly fall short in the area of taking full advantage of the 
capabilities that differentiate xsd from dtd, such as type restriction 
on attribute values.
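
To give a concrete illustration of the kind of capability I mean (a
purely hypothetical fragment -- the attribute name and pattern below
are not taken from the actual DDI schemas), an xsd can constrain an
attribute value to a lexical pattern or data type, where a DTD can do
little more than declare it CDATA or an enumerated token list:

   <xs:attribute name="date" use="optional">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <!-- constrain the value to YYYY, YYYY-MM or YYYY-MM-DD -->
         <xs:pattern value="[0-9]{4}(-[0-9]{2}){0,2}"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:attribute>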

2.) Considering the above situation, versioning is critical to managing
the technical implementations of the DDI above and beyond its conceptual
versioning. While the currently generated w3c schemas on the DDI site do
meet the DDI specification, I can tell you right now, there are a number
of errors which I have corrected in my own versions of the w3c schema
that are not reflected in the current versions released on the ICPSR DDI
site. These were only discovered through discussion and interaction
between Matthew Richardson, Sanda Ionescu and myself. We have had a few
discussions concerning where to track these changes and where to house
these w3c schema copies/versions. Clearly there is a technical,
development-related versioning issue here that sits above that of
conceptual versioning. I would contend that this should be properly
tracked, and in my opinion CVS is the best tool to manage it.
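
As a rough sketch of what I mean by technical versioning layered over
conceptual versioning (the version string and wording below are made up
purely for illustration), the schema itself could carry a technical
revision number that CVS tracks independently of the DDI version it
implements:

   <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
              version="2.0-xsd-r3">
     <xs:annotation>
       <xs:documentation>
         Third technical revision of the w3c Schema for DDI Version 2.0;
         validates the same content as the Version 2.0 DTD and corrects
         errors found in the earlier generated revisions.
       </xs:documentation>
     </xs:annotation>
     ...
   </xs:schema>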

3.) As someone who has worked with CVS extensively, I cannot say enough
about the capability of tracking all changes, both in committee
decision-making and in technical revision. I think the large amount of
verbose commenting about changes between versions at the head of the
DTDs could be managed from within such a system, logged and published
on a site appropriate for such content, instead of bloating the DTD
files over time with an excess of documentation. Both Matthew and I
have worked on documentation-free versions of these files for exchange
over the internet. There is little real-world need to download these
documentation comments over and over where the specs are used strictly
for machine validation; in fact, such bloat only consumes bandwidth on
the hosting institution's gateway/network.
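
As a sketch (the file name here is hypothetical), the whole change
history at the head of each file could be reduced to a single CVS
keyword line, with the full log kept in the repository and published
wherever the group sees fit:

   <!-- $Id$ : expanded by CVS on checkout to file, revision, date, author -->
   <!-- full change history available via: cvs log Version2-0.xsd -->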

4.) I needed to develop a w3c Schema for several specific reasons, all
related to the limitations of integrating DTD-based XML content. OAI's
Protocol for Metadata Harvesting was the largest driver, requiring
xsd-based validation and a central location for the DDI w3c schemas. It
would be a false statement to suggest that these w3c Schema
implementations had any Council involvement beyond my direct
interaction with the DDI group. Unfortunately, I was not able to attend
the Conference last month, so I do not know whether any discussions
occurred around this subject at the meeting.

I suppose we at Harvard could have set up our own host and isolated our
own namespace for the xsd versions I developed here, but I personally
think that would have been counterproductive and rather anti-community,
which is why I donated them back to the DDI group and am treating this
subject as a group issue.
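
Just to illustrate why a central, persistent location matters (the
namespace and schema location URLs below are only placeholders for
whatever the group settles on), every harvested DDI record ends up
pointing back at wherever the schema is housed:

   <codeBook xmlns="http://www.icpsr.umich.edu/DDI"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.icpsr.umich.edu/DDI
                                 http://www.icpsr.umich.edu/DDI/Version2-0.xsd">
     ...
   </codeBook>

If that location moves or disappears, every instance document and every
OAI harvester pointing at it breaks.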

> The DDI development process suffers from a number of problems, but it does not 
> suffer from having too many versions of the DDI being worked on by too many 
> parties. There are only a total of 7 versions of the DTD and all changes have 
> to go through the DDI Council -- this is not the kind of problem which CVS is 
> good at solving. If you look at major standards organizations such as W3C, 
> IEEE, or JCP, I don't think you'll find any of their standards projects on 
> SourceForge because it's just a vastly different kind of process from software 
> development. 

However, I would suspect that these organizations do maintain some
version control for tracking and documenting changes, and for archiving
and managing the working versions of the specifications they publish;
version control does have applications beyond software development. My
point is just that SourceForge provides a low-maintenance, no-cost
service with these capabilities. Certainly, if an institution could
offer such services, and manage them internally for the group, they
would be just as useful, but why not take advantage of a free service
and spend those funds elsewhere?

If any of these standards organizations did provide such services for
the standards they publish, that would be a viable solution as well.

Just to make another small point, searches for "DDI" on the w3c and
xml.org sites result in the following:

we get one fairly negative hit on the w3c lists:
http://lists.w3.org/Archives/Public/www-forms/2000May/0004.html

and one broken link on www.xml.org:
http://www.xml.org/xml/registry_searchresults.jsp?industry=62&keyword=&update_date=7200&schema_type=0#641

I'm concerned that, without centralization and persistent locations for
the specs, such references on the sites of these standards
organizations are of little promotional value, if not an outright
negative.

> 
> Nor do I think that inclusion within SourceForge automatically guarantees 
> transparency to the development process. For example, taking the VDC project at 
> SourceForge that you mentioned, the last feature request was in August of 2002, 
> the only closed bug reports occurred in 2001, and the download packages RPMS 
> and Saxon Exchanges contain no README files to explain what they are for. While 
> the CVS source tree for the project does contain much source code, the project 
> itself seems pretty opaque to an outsider like me.
> 

Yes, unfortunately, exposing your source under an open-source license
does not in itself constitute a "community" in any way. At the VDC, we
are still working to establish a community; currently all of our
development is in-house and, as you point out, the project site is not
active beyond its CVS tree (the very core of a SourceForge site).
Without an extensive inter-institutional developer community yet, we
need the involvement of other groups that could help to improve these
aspects of our own project. My personal opinion is that open-source
projects need developers throughout a community to truly be called
such. Directors of different groups may come to grand agreements, but
without a solid developer community base, I'm afraid these agreements
are often somewhat fragile in nature.

I would like to point out that there is a large difference between
trying to turn an in-house project out to the community and trying to
focus an already existing community project into a centralized
location.

I would look at any number of other SourceForge sites to see that there
is a large range of variability in community involvement. You're right
that just being on SourceForge doesn't necessarily create community,
but if that community already exists, there are good, free tools there
that the community can easily take advantage of.

I can look across my own involvement on SourceForge and Apache and see
dramatic variance in community involvement on those sites; this simply
points out that the above holds true even for one individual working on
several very different projects:

http://jakarta.apache.org/commons/sandbox/math (high)
http://repast.sourceforge.net (high)
http://repast-jellytag.sourceforge.net (low)
http://thedata.sourceforge.net (low)


> In conclusion, it just seems to me that CVS/SourceForge is the right tool for 
> the wrong problem.
> 
> P.S. In the middle of writing this, I came to the realization that what may 
> have prompted your message was that the older versions of the DDI DTD do not 
> appear on the revised DDI site. After asking around, I found out that the older 
> versions are available, only not obviously so. On the Users Information >> 
> DDI/Schema >> page, there is a link "Archival DTDs" that takes you to the older 
> versions of the DTD. I've asked them to make that link more prominent.
> 
> I-Lin Kuo
> Programmer/Analyst, ICPSR

No, I assume that the currently public versions are quite enough to
base future development upon. Of course, having stable, resolvable
archival versions is integral to retaining backward compatibility with
older software that may still depend upon them.

Thank you, I-Lin, for your opinion on the subject; you point out a
number of important issues related to specification development vs.
software development. I enjoy the discussion and the sharing of
opinions. :-)

-Mark Diggory
Software Developer
Harvard MIT Data Center