[DDI-users] A "home" for the DDI
ddi-users@icpsr.umich.edu
ddi-users@icpsr.umich.edu
Thu, 17 Jul 2003 11:23:25 -0400
Quoting "Mark R. Diggory" <mdiggory@latte.harvard.edu>:
> 1.) I think its important to separate the "technical implementations" of
> a standard from the "conceptual definition" of the standard itself.
I am in total agreement.
> While it is wise to have committees decide on the "conceptual content"
> of the specification it is unwise to "forfeit" the various technical
> representations of the specification that can come into existence (w3c
> Schema, RelaxNG, DTD, etc) entirely over to a "conceptual committee's"
> decision making process.
Yes, and I believe there are two reasons for this:
1. Generally, a "conceptual committee" does not have the technical expertise to
make good technical specification decisions.
2. A conceptual committee has entirely different goals from a technical
committee. The former is concerned with fulfilling the end users' needs, while
the latter is concerned with technical correctness and ease of implementation
of the specification. It's too much of a burden for one committee to balance
both types of goals.
It seems to me that the DDI Council is slowly beginning to recognize this with
the establishment of the steering committee and the expert committee, but it
remains to be seen how much real political power the new committees will have...
> One can easily write "n" number of w3c schema
> implementations that adhere to the current DDI specification, all would
> be valid, and would validate DDI documents correctly, and all could be
> of drastically different structure.
I don't find anything inherently bad in this situation, as long as the
implementations have the same functionality. That said, of the technologies you
mentioned, both Relax NG and XSD offer far more functionality than DTDs. In my
opinion, DTDs are obsolete and should be phased out; because this has not
happened, the DDI specification is technically about two years behind where it
should be.
> Until a "technical committee" establishes that a technical
> implementation of the standard is the "be all, end all" for that medium,
> there is going to be much room for debate, discussion and
> decision-making. I would not begin to suggest that my current w3c Schema
> implementations meet any ideal "technical criteria" beyond being able to
> correctly validate the same content as their corresponding DTD.
While that is a problem, I think it is a far lesser problem than the type
restriction problem you mention later, and the problem of overall inconsistency
in the design of the specifications. A simple example of the latter is the
inconsistency in naming reference attributes -- some are named ___ref, while
others are named without a "ref" suffix (such as "qstn"). While these cause
difficulty at the technical implementation level, they are not technical
problems per se, but are really symptoms of a lack of a TECHNICAL design
philosophy/guidelines.
> They certainly fall short in the area of taking full advantage of the
> capabilities that differentiate xsd from dtd, such as type restriction
> on attribute values.
Because of type restriction and namespace handling, I've been trying to argue
for XSDs ever since I started working here a year ago, but there's political
resistance. One source of the resistance seems to be the belief that type
restriction reduces the flexibility of the markup. I contend that
this "flexibility" comes at too high a cost -- if an attribute's type is not
restricted, then that attribute cannot be reliably processed by a machine. The
other source of resistance is the heavy investment in the older DTD
technology. I hope that changes soon.
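To make the trade-off concrete, here is a sketch of the kind of restriction I
mean. The attribute name is hypothetical, not taken from the DDI; the point is
that a DTD could declare it only as CDATA or NMTOKEN, so a value like "abc"
would still pass DTD validation:

```xml
<!-- Hypothetical attribute: a response rate that must be a decimal
     between 0 and 100. A DTD can only declare this as CDATA. -->
<xs:attribute name="respRate">
  <xs:simpleType>
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>
```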
> 2.) Considering the above situation, versioning is critical to managing
> the technical implementations of the DDI above and beyond its conceptual
> versioning. While the currently generated w3c schemas on the DDI site do
> meet the DDI specification, I can tell you right now, there are a number
> of errors which I have corrected in my own versions of the w3c schema
> that are not reflected in the current versions released on the ICPSR DDI
> site. These were only discovered through discussion and interaction
> between Matthew Richardson, Sanda Ionescu and myself. We have had a few
> discussions concerning where to track these changes and where to house
> these w3c schema copies/versions. Clearly there is a technical and
> development related versioning issue here that is above that of
> conceptual versioning. I would contend that this should be appropriately
> tracked; in my opinion CVS is the best tool to manage this.
This would, I think, be an appropriate use of CVS, because you have multiple
people working on the same set of documents. All right, so you've convinced me.
Now, if you were just to set up a private CVS, I don't think there would be any
objection to it. But as you say, it would be far easier to set this up via
SourceForge. However, whether this process should be placed on SourceForge with
an open license is something that would require the blessing of the DDI
Council. I'd certainly like to see this made an item on the next Council agenda.
> 4.) I required the development of a w3c Schema for many specific reasons
> to deal with the limitations in the integration of DTD based XML
> content. OAI's Harvesting Protocol being the largest requirement for xsd
> based validation and a central location for DDI w3c schemas. It would
> be a false statement to suggest that these w3c Schema implementations
> had any Council involvement beyond my direct interaction with the DDI
> group. Unfortunately, I was not able to attend the Conference last
> month, so I do not know if any discussions occurred around this subject
> at the meeting.
I'm not familiar with the OAI Harvesting Protocol, so I'm not sure how it
mandates the use of a schema, but your problem seems to be the general problem
of ensuring that the markup meets certain standards. You've chosen (or OAI
mandates that you choose) to use XSD to do this.
I have a similar problem in that my application accepts DDI documents as input
to a database so that searches can be performed at the variable level.
However, I've found that documents that validate against the DTD do not
necessarily provide markup of high enough quality for a search to be effective.
My solution, which I've started working on, is an XSLT quality-checker
stylesheet to supplement DTD validation.
The advantages of this XSLT approach are:
- I can restrict attribute types even if the specifications do not.
- I can check validity, type, and number of ID references. Even XSD cannot do
this, and there's a lot more I can do via XSLT that XSD will never be able to
do.
- While this "XSLT validation" overlaps with DDI specification development,
it does not actually conflict with it. I will have no problems if and when the
DDI becomes an XSD. Using your approach, because a single XML document cannot
validate against two XSDs (unless namespaces are used), you would have to
abandon your XSD when an official DDI XSD came out which did not adopt all your
suggested changes.
- Because I'm not writing a DTD or XSD, I don't need pre-approval from the
Council; I can go ahead and start working.
- I'm effectively separating validation into mostly "routine validation" to
be handled by DTD/XSD, and a little bit of "custom validation" to be handled
by XSLT.
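As a sketch of what such custom validation can look like (the attribute names
here are stand-ins, not necessarily the real DDI ones), an XSLT 1.0 rule that
flags dangling "qstn" references might read:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Index every element carrying an ID attribute. -->
  <xsl:key name="ids" match="*[@ID]" use="@ID"/>
  <!-- Report any qstn reference that matches no ID in the document. -->
  <xsl:template match="*[@qstn]">
    <xsl:if test="not(key('ids', @qstn))">
      <xsl:text>Dangling reference: </xsl:text>
      <xsl:value-of select="@qstn"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Suppress the default copying of text nodes. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Neither a DTD nor an XSD can express this check; IDREF gets you existence but
not, say, the type or number of the referenced elements.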
> I suppose we at Harvard could have setup our own host and isolated our
> own namespace for the xsd versions I developed here, but I personally
> think that would be quite counter productive and quite anti-community at
> large, thus the reason for donating them back to the DDI group and
> dealing with this subject as a group issue.
See previous paragraph.
The problem with a community is getting the community to agree that you're
right ;)
For my XSLT quality-checker stylesheet, I'm designing it in a modular way so
that each custom validation rule can be turned on or off. Some rules are
internal to my application, while others are general improvements. For those
general-improvement rules which can be validated by an XSD but not a DTD, I'm
planning to translate the rules into XSD and then submit them as suggestions to
the DDI Group. For the general rules which cannot be validated by an XSD, I'd
release the XSLT itself to the group...
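The on/off switching I have in mind could be as simple as one top-level
parameter per rule (again, just a sketch; the rule itself is a made-up
example):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Callers can disable the rule, e.g. via xsltproc's stringparam
       option: check-notes=off -->
  <xsl:param name="check-notes" select="'on'"/>
  <!-- Hypothetical rule: warn about notes elements with no content. -->
  <xsl:template match="notes[not(normalize-space())]">
    <xsl:if test="$check-notes = 'on'">
      <xsl:text>Empty notes element&#10;</xsl:text>
    </xsl:if>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
```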
> However, I would suspect that these organizations do maintain some
> versioning control for tracking and documenting changes, archiving and
> managing the working versions of these specifications published through
> these organizations, Version Control does have application beyond
> Software Development. My point is just that Sourceforge really provides
> a low maintenance and no cost service with these capabilities.
> Certainly, if an institution could offer such services and the
> management of such services internally for the group they would be just
> as useful, but why not take advantage of such a service and spend such
> funds elsewhere.
>
> If it were the case that any of these standards organizations provided
> such services for the standards they publish, this would be as well, a
> viable solution.
That's an interesting suggestion.
> Just to make another small point, searches for DDI on w3c and xml.org
> result in the following:
>
> we get one fairly negative hit on w3c
> http://lists.w3.org/Archives/Public/www-forms/2000May/0004.html
Yes, I too think the DDI is bloated because it's trying to be too many things
to too many people. 271 tags....
> I'm concerned that without centralization and persistent locations for
> specs, such references on the sites of these Standards Organizations are
> of little value if not negative in promotional nature.
I'm concerned too, but I don't think these are problems that SourceForge
solves. In my opinion, these are process problems which manifest as technical
problems and bloat problems. SourceForge can track all the changes, but it
cannot correct the fact that there is no overall design philosophy to prevent
ad hoc changes and feature bloat.
> Yes, unfortunately, exposing your source under an OpenSource license
> does not necessarily constitute a "community" in any way. At the VDC, we
> are still working to establish a community, currently all of our
> development is in-house and as you point out, the cvs project site is
> not active beyond its CVS tree (the very core of a sourceforge site).
> Without an extensive inter-institutional developer community yet, we are
> in need of the involvement of other groups which could help to improve
> these aspects of our own project. My personal opinion is that Opensource
> projects need developers throughout a community for it to truly be
> called such. Directors of different groups may come to grand agreements,
> but without a solid developer community base, I'm afraid these are often
> somewhat fragile in nature.
I would love to have a developer community associated with the DDI project.
However, I think a serious obstacle to that is the orientation of the DDI
Group. From my point of view, a lot of the changes have been made without
taking into account how difficult the changes would be to implement. It
seems to me that to the DDI Group, "difficult to implement" means "difficult to
mark up a study", whereas to me, a developer, it should also mean "difficult to
get a machine to process the resultant markup".
The newly added specification for aggregate data in the 2.0 DTD is an example
of this. While the conceptual cube model is sound, the actual technical
specification is not (in my opinion). The markup is difficult to produce, and
even more difficult to process. It seems that the review process focused on
whether this specification included all the desired attributes while
neglecting the question, "Can this specification be altered so that it is
easy to implement?" There is currently no technical review process, a gap
which I hope will be addressed by the two new committees.
In any case, without an orientation change to align the needs of the group with
the needs of a developer community, I doubt that the latter will ever come to
fruition.
> I would like to point out that there is a large difference between
> trying to turn an in house project out to the community and trying to
> focus an already existing community project into a centralized location.
>
> I would look at any number of other sourceforge sites to see that there
> is a large range of variability in community involvement. You're right
> that just being on Sourceforge doesn't necessarily create community, but
> if that community already exists, there are good and free tools there that
> a community can easily take advantage of.
>
> I can look across my own involvement on Sourceforge and Apache to see
> dramatic variance in community involvement on these sites, simply to
> point out the above is true even for the one individual working on
> several very different projects:
>
> http://jakarta.apache.org/commons/sandbox/math (high)
> http://repast.sourceforge.net (high)
> http://repast-jellytag.sourceforge.net (low)
> http://thedata.sourceforge.net (low)
Building a community isn't easy. I run the Ann Arbor Java Users Group and it
takes a lot to keep up the participation level....
>...
> Thank you I-Lin for your opinion on the subject, you do point out a
> number of important issues related to Specification development vs.
> Software Development. I enjoy the discussion and sharing of opinion. :-)
Thank you as well, Mark. Your message has started a vigorous discussion here
within the ICPSR, and it's quite possible that you'll very soon see some of the
results ...