[DDI-users] A "home" for the DDI

Matthew Richardson ddi-users@icpsr.umich.edu
Thu, 17 Jul 2003 09:45:10 -0400


I think one good thing to keep in mind is that a lot of the problems Mark 
and I-Lin are discussing stem from the fact that the DDI hasn't yet made 
the leap from DTD to XML Schema. Currently the DTD *is* the specification, 
and the Schema versions are treated as something of an afterthought. As 
such, version control of the schema just isn't being given much attention.

Not that it really settles this discussion, but I do think the DDI needs 
to give serious consideration to moving to a schema-centered approach.
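
To make that concrete, here is a rough sketch of what a schema-centered 
declaration looks like next to its DTD counterpart. The element and 
attribute names below are simplified placeholders, not taken verbatim from 
the DDI:

    <!-- DTD: a text-only element with an optional language attribute -->
    <!ELEMENT stdyTitle (#PCDATA)>
    <!ATTLIST stdyTitle lang CDATA #IMPLIED>

    <!-- W3C Schema equivalent, assuming the usual xs: prefix is bound to
         http://www.w3.org/2001/XMLSchema -->
    <xs:element name="stdyTitle">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:string">
            <xs:attribute name="lang" type="xs:string"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

In a schema-centered process the second form would be the normative one, 
with the DTD (if kept at all) derived from it, rather than the other way 
around.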

--On Wednesday, July 16, 2003 06:48 PM -0400 "Mark R. Diggory" 
<mdiggory@latte.harvard.edu> wrote:

> I-Lin,
>
> Thank you for your reply. I'll intermix my comments within the body of
> your response.
>
> ikuo@icpsr.umich.edu wrote:
>> Hi Mark,
>>
>> The following is my own opinion and does not necessarily reflect the
>> opinion of my employer.
>>
>> While I agree with the overall goal of greater transparency in the DDI
>> development process, I question whether Sourceforge and/or CVS are the
>> means to achieve it.
>>
>> CVS is a version control system. It solves the problem of managing and
>> tracking changes when multiple parties are working on a single project.
>> That is a common problem of large software development projects, and
>> that is a problem that SourceForge has tackled successfully.
>>
>> SourceForge & CVS are tools that have been used to successfully manage
>> software development. However, to say that these tools may therefore be
>> used to successfully manage DDI development is to make an erroneous
>> analogy. The DDI is a standard, not software, and standards development
>> has different kinds of problems from software development.
>>
>
> Noting that I developed the w3c Schema translation of the DDI DTD for the
> current versions now published on the ICPSR site, I feel, from my
> technical experience with this subject, that I can strongly disagree
> with your above conclusion on a number of points:
>
> 1.) I think it's important to separate the "technical implementations" of
> a standard from the "conceptual definition" of the standard itself.
>
> While it is wise to have committees decide on the "conceptual content" of
> the specification, it is unwise to "forfeit" the various technical
> representations of the specification that can come into existence (w3c
> Schema, RelaxNG, DTD, etc.) entirely to a "conceptual committee's"
> decision-making process. One can easily write "n" different w3c schema
> implementations that adhere to the current DDI specification; all would
> be valid, all would validate DDI documents correctly, and all could be of
> drastically different structure.
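>
> To illustrate (the element names below are generic placeholders, not the
> actual DDI elements, and both fragments assume an enclosing xs:schema
> element with the usual xs: namespace declaration), here are two designs
> that would both accept the same instance,
> <var name="age"><labl>Age</labl></var>:
>
>   <!-- Design A: everything inline, anonymous types -->
>   <xs:element name="var">
>     <xs:complexType>
>       <xs:sequence>
>         <xs:element name="labl" type="xs:string"/>
>       </xs:sequence>
>       <xs:attribute name="name" type="xs:string"/>
>     </xs:complexType>
>   </xs:element>
>
>   <!-- Design B: global elements and a named, reusable type -->
>   <xs:element name="labl" type="xs:string"/>
>   <xs:complexType name="varType">
>     <xs:sequence>
>       <xs:element ref="labl"/>
>     </xs:sequence>
>     <xs:attribute name="name" type="xs:string"/>
>   </xs:complexType>
>   <xs:element name="var" type="varType"/>
>
> Both are legal w3c Schema, both validate the same documents, yet they
> differ completely in structure and in how easily other schemas can reuse
> their pieces.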
>
> Until a "technical committee" establishes that a technical implementation
> of the standard is the "be all, end all" for that medium, there is going
> to be much room for debate, discussion and decision-making. I would not
> begin to suggest that my current w3c Schema implementations meet any
> ideal "technical criteria" beyond being able to correctly validate the
> same content as their corresponding DTD. They certainly fall short of
> taking full advantage of the capabilities that differentiate XSD from
> DTD, such as type restriction on attribute values.
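>
> A small example of what I mean by that (the attribute name here is
> invented for illustration, not the real DDI one): a DTD can only declare
>
>   <!ATTLIST weight wgtVal CDATA #IMPLIED>
>
> whereas a schema can actually constrain the value space,
>
>   <xs:attribute name="wgtVal">
>     <xs:simpleType>
>       <xs:restriction base="xs:decimal">
>         <xs:minInclusive value="0"/>
>       </xs:restriction>
>     </xs:simpleType>
>   </xs:attribute>
>
> so that a validator rejects non-numeric or negative values instead of
> silently accepting them.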
>
> 2.) Considering the above situation, versioning is critical to managing
> the technical implementations of the DDI above and beyond its conceptual
> versioning. While the currently generated w3c schemas on the DDI site do
> meet the DDI specification, I can tell you right now that there are a number
> of errors which I have corrected in my own versions of the w3c schema
> that are not reflected in the current versions released on the ICPSR DDI
> site. These were only discovered through discussion and interaction
> between Matthew Richardson, Sanda Ionescu and myself. We have had a few
> discussions concerning where to track these changes and where to house
> these w3c schema copies/versions. Clearly there is a technical,
> development-related versioning issue here that is above that of
> conceptual versioning. I would contend that this should be appropriately
> tracked, and in my opinion CVS is the best tool to manage it.
>
> 3.) As someone who has worked with CVS extensively, I cannot say enough
> about the value of being able to track all changes, both in committee
> decision-making and in technical revisioning. I think the large amount of
> verbose commenting about changes between versions at the head of the DTDs
> could be managed from within such a system, logged and published on a
> site appropriate for such content, instead of bloating the DTD files over
> time with such an excess of documentation. Both Matthew and I have worked
> on documentation-free versions of these files for exchange over the
> internet. Downloading these documentation comments over and over serves
> little purpose in the real world, where these specs are used strictly for
> machine validation. In fact, such bloat will only consume bandwidth on
> the hosting institution's gateway/network.
>
> 4.) I required the development of a w3c Schema for several specific
> reasons, all stemming from the limitations of integrating DTD-based XML
> content, with OAI's Harvesting Protocol being the largest driver for
> XSD-based validation and for a central location for the DDI w3c schemas
> (see the sketch below). It would be a false statement to suggest that
> these w3c Schema implementations had any Council involvement beyond my
> direct interaction with the DDI group. Unfortunately, I was not able to
> attend the Conference last month, so I do not know if any discussions
> occurred around this subject at the meeting.
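>
> To be concrete about the OAI requirement mentioned above: an OAI-PMH
> repository has to identify each metadata format by an XML schema that
> harvesters can fetch and validate against, so every DDI record we expose
> needs something along the lines of the following (the namespace URI and
> schema URL here are placeholders for whatever ICPSR ultimately publishes,
> not definitive values):
>
>   <codeBook xmlns="http://www.icpsr.umich.edu/DDI"
>             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>             xsi:schemaLocation="http://www.icpsr.umich.edu/DDI
>                                 http://www.icpsr.umich.edu/DDI/ddi-codebook.xsd">
>     ...
>   </codeBook>
>
> Without a stable, centrally hosted location for that .xsd, every
> harvester and every record has to point somewhere ad hoc.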
>
> I suppose we at Harvard could have set up our own host and isolated our
> own namespace for the xsd versions I developed here, but I personally
> think that would be quite counterproductive and anti-community at large,
> hence my donating them back to the DDI group and dealing with this
> subject as a group issue.
>
>> The DDI development process suffers from a number of problems, but it
>> does not suffer from having too many versions of the DDI being worked on
>> by too many parties. There are only a total of 7 versions of the DTD and
>> all changes have to go through the DDI Council -- this is not the kind
>> of problem which CVS is good at solving. If you look at major standards
>> organizations such as W3C, IEEE, or JCP, I don't think you'll find any
>> of their standards projects on SourceForge because it's just a vastly
>> different kind of process from software development.
>
> However, I would suspect that these organizations do maintain some
> version control for tracking and documenting changes, and for archiving
> and managing the working versions of the specifications they publish;
> version control does have application beyond software development. My
> point is just that Sourceforge provides a low-maintenance, no-cost
> service with these capabilities. Certainly, if an institution could offer
> such services, and manage them internally for the group, they would be
> just as useful, but why not take advantage of a free service and spend
> those funds elsewhere?
>
> If any of these standards organizations provided such services for the
> standards they publish, that would be a viable solution as well.
>
> Just to make another small point, searches for DDI on w3c and xml.org
> result in the following:
>
> we get one fairly negative hit on w3c:
> http://lists.w3.org/Archives/Public/www-forms/2000May/0004.html
>
> and one broken link on www.xml.org:
> http://www.xml.org/xml/registry_searchresults.jsp?industry=62&keyword=&update_date=7200&schema_type=0#641
>
> I'm concerned that without centralization and persistent locations for
> specs, such references on the sites of these standards organizations are
> of little promotional value, if not outright negative.
>
>>
>> Nor do I think inclusion within SourceForge automatically guarantees
>> transparency in the development process. For example, taking the VDC
>> project at SourceForge that you mentioned, the last feature request was
>> in August of 2002, the only closed bug reports occurred in 2001, and the
>> download packages RPMS and Saxon Exchanges contain no README files to
>> explain what they are for. While the CVS source tree for the project
>> does contain much source code, the project itself seems pretty opaque to
>> an outsider like me.
>>
>
> Yes, unfortunately, exposing your source under an OpenSource license
> does not necessarily constitute a "community" in any way. At the VDC, we
> are still working to establish a community; currently all of our
> development is in-house and, as you point out, the cvs project site is
> not active beyond its CVS tree (the very core of a sourceforge site).
> Without an extensive inter-institutional developer community yet, we are
> in need of the involvement of other groups which could help to improve
> these aspects of our own project. My personal opinion is that OpenSource
> projects need developers throughout a community for them to truly be
> called such. Directors of different groups may come to grand agreements,
> but without a solid developer community base, I'm afraid these are often
> somewhat fragile in nature.
>
> I would like to point out that there is a large difference between trying
> to turn an in-house project out to the community and trying to focus an
> already existing community project into a centralized location.
>
> I would look at any number of other sourceforge sites to see that there
> is a large range of variability in community involvement. You're right
> that just being on Sourceforge doesn't necessarily create community, but
> if that community already exists, there are good and free tools there
> that a community can easily take advantage of.
>
> I can look across my own involvement on Sourceforge and Apache to see
> dramatic variance in community involvement on these sites, simply to
> point out that the above is true even for one individual working on
> several very different projects:
>
> http://jakarta.apache.org/commons/sandbox/math (high)
> http://repast.sourceforge.net (high)
> http://repast-jellytag.sourceforge.net (low)
> http://thedata.sourceforge.net (low)
>
>
>> In conclusion, it just seems to me that CVS/SourceForge is the right
>> tool for  the wrong problem.
>>
>> P.S. In the middle of writing this, I came to the realization that what
>> may have prompted your message was that the older versions of the DDI
>> DTD do not appear on the revised DDI site. After asking around, I found
>> out that the older versions are available, only not obviously so. On the
>> "Users Information >> DDI/Schema" page, there is a link "Archival DTDs"
>> that takes you to the older versions of the DTD. I've asked them to make
>> that link more prominent.
>>
>> I-Lin Kuo
>> Programmer/Analyst, ICPSR
>
> No, I assume that the currently public versions are quite enough to base
> future development upon. Of course, having stable, resolvable archival
> versions is integral to retaining any backward compatibility with older
> software that may still depend upon them.
>
> Thank you, I-Lin, for your opinion on the subject; you do point out a
> number of important issues related to specification development vs.
> software development. I enjoy the discussion and sharing of opinion. :-)
>
> -Mark Diggory
> Software Developer
> Harvard MIT Data Center
>
> _______________________________________________
> DDI-users mailing list
> DDI-users@icpsr.umich.edu
> http://www.icpsr.umich.edu/mailman/listinfo/ddi-users



Matthew A. Richardson
Inter-university Consortium for Political and Social Research
Phone: 734.615.7901
Email: matvey@umich.edu
"Everything tires with time, and starts to seek some opposition,
to save it from itself." --Clive Barker, The Hellbound Heart