[DDI-users] A "home" for the DDI

ddi-users@icpsr.umich.edu
Thu, 17 Jul 2003 11:23:25 -0400


Quoting "Mark R. Diggory" <mdiggory@latte.harvard.edu>:

> 1.) I think its important to separate the "technical implementations" of
> a standard from the "conceptual definition" of the standard itself.

I am in total agreement.

> While it is wise to have committees decide on the "conceptual content" 
> of the specification it is unwise to "forfeit" the various technical
> representations of the specification that can come into existence (w3c
> Schema, RelaxNG, DTD, etc) entirely over to a "conceptual committee's"
> decision making process. 

Yes, and I believe there are two reasons for this:
  1. Generally, a "conceptual committee" does not have the technical expertise to 
make good technical specification decisions.
  2. A conceptual committee has entirely different goals from a technical 
committee. The former is concerned with fulfilling the end users' needs, while 
the latter is concerned with technical correctness and ease of implementation 
of the specification. It's too much of a burden for one committee to balance 
both types of goals.

It seems to me that the DDI Council is slowly beginning to recognize this with 
the establishment of the steering committee and the expert committee, but it 
remains to be seen how much real political power the new committees will have...

> One can easily write "n" number of w3c schema
> implementations that adhere to the current DDI specification, all would
> be valid, and would validate DDI documents correctly, and all could be 
> of drastically different structure.

I don't find anything inherently bad in this situation, as long as the 
implementations have the same functionality. However, of the technologies you 
mentioned, both Relax NG and XSD offer far more functionality than DTDs. In my 
opinion, DTDs are obsolete and should be phased out; because this has not 
happened, the DDI specification is technically about two years behind where 
it should be.

> Until a "technical committee" establishes that a technical 
> implementation of the standard is the "be all, end all" for that medium, 
> there is going to be much room for debate, discussion and 
> decision-making. I would not begin to suggest that my current w3c Schema 
> implementations meet any ideal "technical criteria" beyond being able to 
> correctly validate the same content as their corresponding DTD. 

While that is a problem, I think it is a far lesser problem than the type 
restriction problem you mention later, and the problem of overall inconsistency 
in the design of the specifications. A simple example of the latter is the 
inconsistency in naming reference attributes -- some are named ___ref, while 
others are named without a "ref" suffix (such as "qstn"). While these cause 
difficulty at the technical implementation level, they are not technical 
problems per se, but are really symptoms of a lack of a TECHNICAL design 
philosophy/guidelines.

> They certainly fall short in the area of taking full advantage of the 
> capabilities that differentiate xsd from dtd, such as type restriction 
> on attribute values.

Because of type restriction and namespace handling, I've been trying to argue 
for XSDs ever since I started working here a year ago, but there's political 
resistance. One source of the resistance seems to be the belief that type 
restriction reduces the flexibility of the markup. I contend that 
this "flexibility" comes at too high a cost -- if the type of an attribute 
is not restricted, then that attribute cannot be reliably processed by a 
machine. The other source of resistance is the heavy investment in the older 
DTD technology. I hope that changes soon.
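To make the trade-off concrete, here is a hypothetical sketch (the attribute 
name is illustrative, not taken from the actual DDI DTD): a DTD can at best 
declare an attribute as untyped text, while an XSD can restrict it to a 
machine-checkable type.

```xml
<!-- DTD: the attribute is just text; a value like "abc" would validate -->
<!ATTLIST var nCases CDATA #IMPLIED>

<!-- XSD: the same attribute restricted to a non-negative integer,
     so a validator rejects non-numeric values outright -->
<xs:attribute name="nCases" type="xs:nonNegativeInteger"/>
```

The "flexible" DTD version forces every consuming application to re-implement 
its own checking of the value, while the XSD version lets the validator do it 
once.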

> 2.) Considering the above situation, versioning is critical to managing
> the technical implementations of the DDI above and beyond its conceptual
> versioning. While the currently generated w3c schemas on the DDI site do
> meet the DDI specification, I can tell you right now, there are a number
> of errors which I have corrected in my own versions of the w3c schema
> that are not reflected in the current versions released on the ICPSR DDI
> site. These were only discovered through discussion and interaction
> between Matthew Richardson, Sanda Ionescu and myself. We have had a few
> discussions concerning where to track these changes and where to house
> these w3c schema copies/versions. Clearly there is a technical and
> development related versioning issue here that is above that of
> conceptual versioning. I would contest that this should be appropriately
> tracked, in my opinion CVS is the best tool to manage this.

This would, I think, be an appropriate use of CVS, because you have multiple 
people working on the same set of documents. All right, so you've convinced me. 

Now, if you were just to set up a private CVS, I don't think there would be any 
objection to it. But as you say, it would be far easier to set this up via 
SourceForge. However, whether this process should be placed on SourceForge with 
an open license is something that would require the blessing of the DDI 
Council. I'd certainly like to see this made an item on the next Council agenda.

> 4.) I required the development of a w3c Schema for many specific reasons 
> to deal with the limitations in the integration of DTD based XML 
> content. OAI's Harvesting Protocol being the largest requirement for xsd 
> based validation and a central location for DDI w3c schema's. It would 
> be a false statement to suggest that these w3c Schema implementations 
> had any Council involvement beyond my direct interaction with the DDI 
> group. Unfortunately, I was not able to attend the Conference last 
> month, so I do not know if any discussions occurred around this subject 
> at the meeting.

I'm not familiar with the OAI Harvesting Protocol, so I'm not sure how it 
mandates the use of a schema, but your problem seems to be the general problem 
of ensuring that the markup meets certain standards. You've chosen (or OAI 
mandates that you choose) to use XSD to do this.

I have a similar problem in that my application accepts DDI documents to be 
input into the database so that searches can be performed at the variable 
level. However, I've found that documents that validate against the DTD do not 
necessarily provide markup of high enough quality for a search to be 
effective. My solution, which I've started working on, is an XSLT 
quality-checker stylesheet to supplement DTD validation.

The advantages of this XSLT approach are:

  - I can restrict attribute type even if the specifications do not.
  - I can check validity, type, and number of ID references. Even XSD cannot do 
this, and there's a lot more I can do via XSLT that XSD will never be able to 
do.
  - While this "XSLT validation" overlaps with DDI specification development, 
it does not actually conflict with it. I will have no problems if and when the 
DDI becomes an XSD. Using your approach, because a single XML document cannot 
validate against two XSDs (unless namespaces are used), you would have to 
abandon your XSD when an official DDI XSD came out which did not adopt all your 
suggested changes.
  - Because I'm not writing a DTD or XSD, I don't need pre-approval from the 
Council; I can go ahead and start working.
  - I'm effectively separating validation into mostly "routine validation" to 
be handled by DTD/XSD, and a little bit of "custom validation" to be handled 
by XSLT.
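To illustrate the kind of custom rule involved, here is a minimal sketch 
(the element and attribute names are illustrative assumptions, not the exact 
DDI markup): it reports any "qstn" reference that does not resolve to an ID 
elsewhere in the document. Because the document has already passed DTD 
validation, the XSLT id() function can rely on the DTD's declared ID 
attributes.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Custom validation rule: every qstn reference must resolve to an ID -->
  <xsl:template match="var[@qstn]">
    <xsl:if test="not(id(@qstn))">
      <xsl:message>var '<xsl:value-of select="@ID"/>' refers to
        missing question '<xsl:value-of select="@qstn"/>'</xsl:message>
    </xsl:if>
  </xsl:template>

  <!-- Suppress default text output; only the messages matter -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

A DTD cannot express this at all, and XSD's key/keyref would struggle with 
counting or cross-checking references; in XSLT the rule is a few lines.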

> I suppose we at Harvard could have setup our own host and isolated our
> own namespace for the xsd versions I developed here, but I personally
> think that would be quite counter productive and quite anti-community at 
> large, thus the reason for donating them back to the DDI group and 
> dealing with this subject as a group issue.

See previous paragraph.

The problem with a community is getting the community to agree that you're 
right ;)

For my XSLT quality-checker stylesheet, I'm designing it in a modular way so 
that each custom validation rule can be turned on or off. Some rules are 
internal to my application, while others are general improvements. For those 
general-improvement rules which can be validated by an XSD but not a DTD, I'm 
planning to translate the rules to XSD and then submit them as suggestions to 
the DDI Group. For the general rules which cannot be validated by an XSD, I'd 
release the XSLT itself to the group...
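The on/off switching can be done with one top-level parameter per rule, so a 
rule can be disabled from the command line without editing the stylesheet 
(again a sketch; the rule and attribute names are hypothetical):

```xml
<!-- Guard each rule with its own parameter; e.g. with xsltproc:
     xsltproc --param check.qstn.refs "false()" checker.xsl study.xml -->
<xsl:param name="check.qstn.refs" select="true()"/>

<xsl:template match="var[@qstn]">
  <xsl:if test="$check.qstn.refs and not(id(@qstn))">
    <xsl:message>dangling qstn reference on var
      '<xsl:value-of select="@ID"/>'</xsl:message>
  </xsl:if>
</xsl:template>
```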
 
> However, I would suspect that these organizations do maintain some 
> versioning control for tracking and documenting changes, archiving and 
> managing the working versions of these specifications published through 
> these organizations, Version Control does have application beyond 
> Software Development. My point is just that Sourceforge really provides 
> a low maintenance and no cost service with these capabilities. 
> Certainly, if an institution could offer such services and the 
> management of such services internally for the group they would be just 
> as useful, but why not take advantage of such a service and spend such 
> funds elsewhere.
> 
> If it were the case that any of these standards organizations provided 
> such services for the standards they publish, this would be as well, a 
> viable solution.

That's an interesting suggestion.

> Just to make another small point, searches for DDI on w3c and xml.org 
> result in the following:
> 
> we get one fairly negative hit on w3c
> http://lists.w3.org/Archives/Public/www-forms/2000May/0004.html

Yes, I too think the DDI is bloated because it's trying to be too many things 
to too many people. 271 tags....

> I'm concerned that without centralization and persistent locations for 
> specs, such references on the sites of these Standards Organizations are 
> of little value if not negative in promotional nature.

I'm concerned too, but I don't think these are problems that SourceForge 
solves. In my opinion, these are process problems which manifest as technical 
problems and bloat problems. SourceForge can track all the changes, but it 
cannot correct the fact that there is no overall design philosophy to prevent 
ad hoc changes and feature bloat.

> Yes, unfortunately, exposing your source under an OpenSource license
> does not necessarily constitute a "community" in any way. At the VDC, we
> are still working to establish a community, currently all of our
> development is in-house and as you point out, the cvs project site is 
> not active beyond its CVS tree (the very core of a sourceforge site). 
> Without an extensive inter-institutional developer community yet, we are 
> in need of the involvement of other groups which could help to improve 
> these aspects of our own project. My personal opinion is that Opensource 
> projects need developers throughout a community for it to truly be 
> called such. Directors of different groups may come to grand agreements, 
> but without a solid developer community base, I'm afraid these are often 
> somewhat fragile in nature.

I would love to have a developer community associated with the DDI project. 
However, I think a serious obstacle to that is the orientation of the DDI 
Group. From my point of view, a lot of the changes have been made without 
taking into account how difficult the changes would be to implement. It 
seems to me that to the DDI Group, "difficult to implement" means "difficult to 
mark up a study", whereas to me, a developer, it should also mean "difficult to 
get a machine to process the resultant markup".

The newly added specification for aggregate data in the 2.0 DTD is an example 
of this. While the conceptual cube model is sound, the actual technical 
specification is not (in my opinion). The markup is difficult to produce, and 
even more difficult to process. It seems that the review process focused on 
whether the specification included all the desired attributes while neglecting 
the question "can this specification be altered so that it is easy to 
implement?" There is currently no technical review process, a gap which I hope 
will be addressed by the two new committees.

In any case, without an orientation change to align the needs of the group with 
the needs of a developer community, I doubt that the latter will ever come to 
fruition.

> I would like to point out that there is a large difference between 
> trying to turn an in house project out to the community and trying to 
> focus an already existing community project into a centralized location.
> 
> I would look at any number of other sourceforge sites to see that there
> is a large range of variability in community involvement. You're right
> that just being on Sourceforge doesn't necessarily create community, but 
> if that community already exists, there are good and free tools there that 
> a community can easily take advantage of.
> 
> I can look across my own involvement on Sourceforge and Apache to see 
> dramatic variance in community involvement on these sites, simply to 
> point out the above is true even for the one individual working on 
> several very different projects:
> 
> http://jakarta.apache.org/commons/sandbox/math (high)
> http://repast.sourceforge.net (high)
> http://repast-jellytag.sourceforge.net (low)
> http://thedata.sourceforge.net (low)

Building a community isn't easy. I run the Ann Arbor Java Users Group and it 
takes a lot to keep up the participation level....

>...
> Thank you I-Lin for your opinion on the subject, you do point out a 
> number of important issues related to Specification development vs. 
> Software Development. I enjoy the discussion and sharing of opinion. :-)

Thank you as well, Mark. Your message has started a vigorous discussion here 
within the ICPSR, and it's quite possible that you'll very soon see some of the 
results ...