From ddi-users@icpsr.umich.edu Fri Nov 1 17:25:17 2002
From: ddi-users@icpsr.umich.edu (Karen Harker)
Date: Fri, 01 Nov 2002 11:25:17 -0600
Subject: [DDI-users] Usage of DDI for biomedical research
Message-ID:

Thank you all for your input. I've been reading and saving these emails to digest them later. It may be a while before we can actually start planning such a project, but I hope to bring this information with me when we do.
 
Thanks, again.
 
 
Karen R. Harker, MLS
UT Southwestern Medical Library
5323 Harry Hines Blvd.
Dallas, TX  75390-9049
214-648-1698
http://www.swmed.edu/library/

>>> ikuo@icpsr.umich.edu 10/31/02 8:57:09 AM >>>

The DDI is the closest I've been able to find, but I am not sure of its viability to describe the kind of research being conducted here. For instance, I imagine there will be a need to include the species of subjects (e.g. humans, drosophila, mouse, etc.), as well as some extension for microbiological information (e.g. use of microarrays, types of proteins, etc.).

The species of the subject for the entire study can be placed in the <subject> tag under <stdyinfo>

<stdyinfo>
        <subject>drosophila</subject>
</stdyinfo>

or, as someone suggested, in the <universe> tag under <var> or <varGrp>

I would probably use <universe> under <varGrp>
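
A minimal sketch of that second option, assuming Python's standard xml.etree.ElementTree module. Only the element names (varGrp, universe) come from this message; the group name "flies" is hypothetical, and the exact casing and required attributes should be checked against the DDI DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable group; only the element names come from the message.
var_grp = ET.Element("varGrp", {"name": "flies"})

# Record the species as the universe of this variable group.
universe = ET.SubElement(var_grp, "universe")
universe.text = "drosophila"

xml_text = ET.tostring(var_grp, encoding="unicode")
print(xml_text)
```

Putting the species at the group level would let one study describe several species, one per variable group.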

Is anybody aware of an application of DDI to biological or biomedical research?
 
Thank you very much for your assistance.

Before you apply the DDI to bio research, you should be aware of a few of its shortcomings:
        A) DDI is heavy. There are a lot more tags and attributes in DDI than you are probably ever going to use.
        B) Despite attempts at making the language neutral, the DDI tag names and attribute names and tag structure still have a bias towards social science survey data. The unfamiliar terminology will make it uncomfortable for bio researchers to work with.
        C) The current implementation of the DDI is not extensible. You cannot add your own tags and attributes to the DDI to adapt it for biological or biomedical research.

What are the advantages of using DDI?
        1) DDI is already designed. You don't have to come up with your own tagging system.
        2) DDI is a standard.
        3) There are already a number of (free) tools available to work with DDI, such as NESSTAR Publisher and NESSTAR Explorer (www.nesstar.org). It is likely that more tools will be developed as DDI becomes more widely adopted.

If I were doing this project, I'd construct a lean XML schema using bio-science vocabulary but with a structure based on DDI. Then, if I wanted to use DDI tools like NESSTAR's, I would write an XSLT style sheet to convert the lean bio-science XML into DDI. In order to make writing the XSLT style sheet straightforward, in the design of the bio-science XML I would keep the structure of the DDI but just excise all the unnecessary tags and attributes, change some tag and attribute names to something more appropriate to bio-science, and add just a limited number of extension tags.
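
The renaming step can be sketched as follows. This is not XSLT: it is a stand-in written with Python's standard xml.etree.ElementTree, which has no XSLT processor, and it only illustrates why keeping the DDI structure makes the conversion a simple tag-for-tag mapping. The bio-science tag names (experiment, species) are hypothetical, and the DDI-side names are spelled as in this message.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from lean bio-science tag names to DDI-style names.
BIO_TO_DDI = {"experiment": "stdyinfo", "species": "subject"}

def to_ddi(elem):
    """Return a copy of elem with tags renamed; unmapped tags are kept as-is."""
    out = ET.Element(BIO_TO_DDI.get(elem.tag, elem.tag), dict(elem.attrib))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_ddi(child))  # same nesting, only the names change
    return out

bio = ET.fromstring("<experiment><species>drosophila</species></experiment>")
ddi_text = ET.tostring(to_ddi(bio), encoding="unicode")
print(ddi_text)
```

Because the two vocabularies share one structure, the mapping table is the only part that has to be maintained; extension tags with no DDI counterpart would need an explicit rule of their own.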


From ddi-users@icpsr.umich.edu Mon Nov 4 14:20:16 2002
From: ddi-users@icpsr.umich.edu (Andrew Dzhigo)
Date: Mon, 04 Nov 2002 09:20:16 -0500
Subject: [DDI-users] Storing XML
Message-ID: <1578a81534b9.1534b91578a8@Princeton.EDU>

Hi, all

I am an applications programmer for the Cultural Policy and the Arts National Data Archive (CPANDA). Here at CPANDA we use DDI DTD to create XML codebooks for the datasets we archive.

With the number of datasets growing we are now facing the problem of storing and managing our XML files. At this stage of development we are investigating the possibilities of using some sort of XML-aware database management system. Currently, I am considering Xindice system (formerly dbXML) developed by Apache Software Foundation as a possible tool for our needs. However, I would very much like to know how other developers address the similar problem. What tools, database management systems and languages do they use?

Any help would be greatly appreciated.

With regards,
Andrew Dzhigo
Applications Programmer
Cultural Policy and the Arts National Data Archive (CPANDA),
Princeton University
adzhigo@princeton.edu
1-609-258-7561

From ddi-users@icpsr.umich.edu Mon Nov 4 16:44:51 2002
From: ddi-users@icpsr.umich.edu (Mark R. Diggory)
Date: Mon, 04 Nov 2002 11:44:51 -0500
Subject: [DDI-users] Storing XML
References: <1578a81534b9.1534b91578a8@Princeton.EDU>
Message-ID: <3DC6A403.40306@latte.harvard.edu>

Hi Andrew,

This is a shameless plug for a system we have been developing.

We are working on an Opensource Digital Library System that utilizes the DDI as its primary storage XML format. This system is primarily for Social Science Research data such as that published at the ICPSR. It provides Indexing/Searching capabilities and server side data analysis tools for subsetting and manipulating the datasets. We are preparing for a software release towards the end of this year.

Feel free to find out more about this project at the following sites:

http://www.hmdc.harvard.edu
http://www.thedata.org
http://thedata.sourceforge.net

We have a couple of production systems running at Harvard. This is an older version of the "Virtual Data Center", but it will be upgraded to our release version once it is completed.

http://vdc-prod.hmdc.harvard.edu

-Mark Diggory
Project Manager / Software Engineer
Harvard MIT Data Center
http://www.hmdc.harvard.edu

p.s. We currently archive the DDIs in a custom repository backed by PostgreSQL. I have been researching eXist (http://exist.sourceforge.net) and Xindice as possible future repository implementations and have some rudimentary tests/implementations in the works.

Andrew Dzhigo wrote:

>Hi, all
>
>I am an applications programmer for the Cultural Policy and the Arts National Data Archive (CPANDA). Here at CPANDA we use DDI DTD to create XML codebooks for the datasets we archive.
>
>With the number of datasets growing we are now facing the problem of storing and managing our XML files. At this stage of development we are investigating the possibilities of using some sort of XML-aware database management system. Currently, I am considering Xindice system (formerly dbXML) developed by Apache Software Foundation as a possible tool for our needs. However, I would very much like to know how other developers address the similar problem. What tools, database management systems and languages do they use?
>
>Any help would be greatly appreciated.
>With regards,
>Andrew Dzhigo
>Applications Programmer
>Cultural Policy and the Arts
>National Data Archive (CPANDA),
>Princeton University
>adzhigo@princeton.edu
>1-609-258-7561
>
>_______________________________________________
>DDI-users mailing list
>DDI-users@icpsr.umich.edu
>http://www.icpsr.umich.edu/mailman/listinfo/ddi-users

From ddi-users@icpsr.umich.edu Mon Nov 4 18:17:31 2002
From: ddi-users@icpsr.umich.edu (I-Lin Kuo)
Date: Mon, 04 Nov 2002 13:17:31 -0500
Subject: [DDI-users] Storing XML
In-Reply-To: <1578a81534b9.1534b91578a8@Princeton.EDU>
Message-ID: <5.1.0.14.0.20021104130646.00a13ec0@icpsr.umich.edu>

Hi Andrew,

Here at the Inter-University Consortium for Political and Social Research (ICPSR), we're converting all our existing codebooks into DDI format and developing a database to enable us to search for variables across all the different survey codebook DDIs. We are using Oracle 9i.

We chose Oracle 9i because we have a site license, but it also has pretty good XML capabilities (Oracle 8i and SQL Server are far behind Oracle 9i in this respect, and would not have been suitable for our needs), including:

- limited XPath capability
- storing XML fragments in an Oracle native XML format (XMLType)
- full-text indexing within a field of XMLType
- output directly into XML or XSLT via Oracle's XSQL
- ability to store XML directly using object technology without having to map tags and attributes to database fields

I haven't looked at Xindice or Tamino, because 9i was good enough for us.

From ddi-users@icpsr.umich.edu Mon Nov 4 21:38:13 2002
From: ddi-users@icpsr.umich.edu (Andrew Dzhigo)
Date: Mon, 04 Nov 2002 16:38:13 -0500
Subject: [DDI-users] Storing XML
Message-ID: <1a9a741ab450.1ab4501a9a74@Princeton.EDU>

Thank you for your reply. I will definitely take a look at the Virtual Data Center web site. Since you have already done some research on eXist and Xindice, could you please share with me your opinion of these two systems?

With regards,
Andrew Dzhigo
Applications Programmer
Cultural Policy and the Arts National Data Archive (CPANDA),
Princeton University
adzhigo@princeton.edu
1-609-258-7561

From ddi-users@icpsr.umich.edu Tue Nov 5 20:26:46 2002
From: ddi-users@icpsr.umich.edu (Adrian Dusa)
Date: Tue, 5 Nov 2002 22:26:46 +0200
Subject: [DDI-users] Storing XML
In-Reply-To: <1578a81534b9.1534b91578a8@Princeton.EDU>
References: <1578a81534b9.1534b91578a8@Princeton.EDU>
Message-ID: <1036528006.3dc82986f145e@ns.roda.ro>

Hi Andrew,

If you use DDI DTD for your codebooks, I suppose you know about the Nesstar system. If not, here is the link: www.nesstar.org

Regards,
Adrian

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Adrian Dusa (adi@roda.ro)
Romanian Social Data Archive (www.roda.ro)
1, Schitu Magureanu Bd.
76625 Bucharest sector 5
Romania
Tel./Fax: +40 (21) 312.66.18

From ddi-users@icpsr.umich.edu Wed Nov 6 14:41:20 2002
From: ddi-users@icpsr.umich.edu (Joachim Wackerow)
Date: Wed, 06 Nov 2002 15:41:20 +0100
Subject: [DDI-users] Storing XML
References: <5.1.0.14.0.20021104130646.00a13ec0@icpsr.umich.edu>
Message-ID: <3DC92A10.5020902@zuma-mannheim.de>

Hi I-Lin,

I am interested in how you use Oracle for storing the DDIs:

- Do you store the whole DDI file of a study in one XMLType field?
- Is it possible to update only some DDI fields inside the XMLType?
- Is the XPath capability of Oracle enough for your needs?
- How is the performance doing queries to the XMLType?

Together with colleagues of the German Microdata Lab (ZUMA) I am working on an information system on microdata (MISSY). The system should provide information (not the data) on the German Microcensus (a yearly census of 1% of the population from the Federal Statistics Bureau) for interested social science researchers.

The first plan was to store the DDI-compatible information in a relational database. In a first prototype we made a mapping of the database schema to some DDI tags. But a mapping of the whole DDI to the database schema seems to be very complicated, and very inefficient in transferring the information between the both data models. So we are considering storing the DDI XML files in an XML-enabled database or in a native XML database. At ZUMA we also have Oracle.

Generally the XML native databases are still in a fast development process.
XQuery and XUpdate implementations are still rare. As far as I know, Oracle doesn't have XQuery or XUpdate capabilities.

Regards,
Achim

From ddi-users@icpsr.umich.edu Wed Nov 6 16:27:34 2002
From: ddi-users@icpsr.umich.edu (I-Lin Kuo)
Date: Wed, 06 Nov 2002 11:27:34 -0500
Subject: [DDI-users] Storing XML
In-Reply-To: <3DC92A10.5020902@zuma-mannheim.de>
References: <5.1.0.14.0.20021104130646.00a13ec0@icpsr.umich.edu>
Message-ID: <5.1.0.14.0.20021106111222.00a0f060@icpsr.umich.edu>

Hi Joachim,

Keep in mind that all this is still in the design/development phase and that the approach I'm outlining will only work on Oracle 9i. Neither Oracle 8i nor SQL Server has these capabilities. I don't know whether the native XML databases have this ability, however.

>- Do you store the whole DDI file of a study in one XMLType field?
        The entire DDI will be stored in files in a special area outside the DB and a link to the file will be stored in the DB. While the entire DDI can be stored in one XMLType field, that type of storage implies that a search result would return the entire DDI Document -- the study. For our project which searches for variables between various studies, we want a search to return the variable within the study. Therefore, that type of storage would be just plain wrong, and so instead, we store the variable information <var> in an XMLType.
        While Oracle Text can do indexing within an XMLType via an XPath-like syntax, I'm uncertain about its performance, so I'm also pre-extracting certain data from the var and placing it in fields -- but I'm only doing this for a very few fields which I think will be stable as the DDI evolves. This design has the advantage that I retain all the information in the XMLType field without having to explicitly remap everything if the schema evolves. The other nice consequence is that I don't have to worry so much about not allocating wide enough fields....
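
A minimal sketch of this layout, using Python's built-in sqlite3 and a plain TEXT column as a stand-in for Oracle 9i and XMLType (it is not the ICPSR schema). The table name, column names, file path and sample codebook are all hypothetical; only the idea of one row per var, holding a few pre-extracted fields plus the whole unmapped fragment, comes from the description above.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variable ("
    " study_file TEXT,"   # link to the full DDI file kept outside the DB
    " var_name TEXT,"     # pre-extracted field
    " label TEXT,"        # pre-extracted field
    " var_xml TEXT)"      # complete <var> fragment, stored without remapping
)

def load_study(conn, study_file, ddi_text):
    """Store one row per <var>: a few stable fields plus the whole fragment."""
    root = ET.fromstring(ddi_text)
    for var in root.iter("var"):
        conn.execute(
            "INSERT INTO variable VALUES (?, ?, ?, ?)",
            (study_file, var.get("name"), var.findtext("labl"),
             ET.tostring(var, encoding="unicode")),
        )

# Hypothetical two-variable codebook, trimmed to the parts used here.
sample = ('<codeBook><dataDscr>'
          '<var name="AGE"><labl>Age of respondent</labl></var>'
          '<var name="SEX"><labl>Sex of respondent</labl></var>'
          '</dataDscr></codeBook>')
load_study(conn, "studies/0001.xml", sample)
rows = conn.execute(
    "SELECT var_name, label FROM variable ORDER BY var_name").fetchall()
print(rows)
```

If the DDI later adds elements inside var, only the extraction of the two summary columns needs attention; the stored fragment carries everything else along unchanged.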

> - Is it possible to update only some DDI fields inside the XMLType?
        No. Currently, the contents of the XMLType must be updated as a whole, I think, but I'm not sure. However, it doesn't apply to me, for the reason below:
        I have also made the conscious decision that only entire DDI documents are to be uploaded to our database, and therefore users will not be able to edit individual entries in the database without editing the entire document. This is to resolve synchronization problems that can arise regarding which version of the data -- the document or the database content -- is the master version. With this decision, the DDI document itself is always the master.
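
The "document is always the master" rule can be sketched as a whole-study replacement: every upload discards the rows derived from the previous version and rebuilds them from the new document. As before, this uses Python's sqlite3 as a stand-in for Oracle, and the table layout, file path and sample documents are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variable (study_file TEXT, var_name TEXT, var_xml TEXT)")

def replace_study(conn, study_file, ddi_text):
    """Drop every row derived from study_file and rebuild from the document."""
    root = ET.fromstring(ddi_text)  # parse first so a bad upload changes nothing
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM variable WHERE study_file = ?", (study_file,))
        for var in root.iter("var"):
            conn.execute(
                "INSERT INTO variable VALUES (?, ?, ?)",
                (study_file, var.get("name"),
                 ET.tostring(var, encoding="unicode")),
            )

# First upload has two variables; the edited document drops one of them.
replace_study(conn, "studies/0001.xml",
              '<codeBook><var name="AGE"/><var name="SEX"/></codeBook>')
replace_study(conn, "studies/0001.xml",
              '<codeBook><var name="AGE"/></codeBook>')
names = [row[0] for row in conn.execute("SELECT var_name FROM variable")]
print(names)
```

Since nothing in the database is ever edited in place, the database content cannot drift away from the document it was built from.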

> - Is the XPath capability of Oracle enough for your needs?
        If they are as advertised, then they are sufficient. I'll find out more as I get deeper into the coding.

> - How is the performance doing queries to the XMLType?
        I'll find out once I have actual DDI documents to search on, but I don't anticipate any problems. As I sort of mentioned before, I'm pre-extracting the fields to be displayed in a search summary and I'm indexing the other search fields within the XMLType. This way, a search doesn't really involve querying within the XMLType. I only use the XMLType at the end to pull up search result details for display.
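
The two-step search described here might look like the sketch below. It again uses sqlite3 in place of Oracle, and a LIKE match on a plain column in place of an Oracle Text index, so it shows only the shape of the flow; the function names, table layout and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variable ("
    " id INTEGER PRIMARY KEY, var_name TEXT, label TEXT, var_xml TEXT)")
conn.executemany(
    "INSERT INTO variable (var_name, label, var_xml) VALUES (?, ?, ?)",
    [("AGE", "Age of respondent",
      '<var name="AGE"><labl>Age of respondent</labl></var>'),
     ("INCOME", "Household income",
      '<var name="INCOME"><labl>Household income</labl></var>')],
)

def search_summary(conn, term):
    """Step 1: match and list hits using only the pre-extracted columns."""
    return conn.execute(
        "SELECT id, var_name, label FROM variable"
        " WHERE label LIKE ? ORDER BY id", (f"%{term}%",)).fetchall()

def fetch_details(conn, var_id):
    """Step 2: read the stored XML fragment only for the chosen hit."""
    row = conn.execute(
        "SELECT var_xml FROM variable WHERE id = ?", (var_id,)).fetchone()
    return row[0] if row else None

hits = search_summary(conn, "income")
details = fetch_details(conn, hits[0][0])
print(hits, details)
```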

>But a mapping of the whole DDI to the database schema seems to be very complicated, and very inefficient in transferring
>the information between the both data models.
        When I started this project, I worried that I'd have to take the approach of mapping as you stated -- which seems too complicated and unmaintainable as the schema evolves. I was very happy when I learned of Oracle 9i's vastly improved XML capabilities so I would not have to do this. Basically, Oracle re-purposed their Object-Oriented Database technology to store XML documents, and it's a natural fit.

I-Lin Kuo
ICPSR
--=====================_1507526==_.ALT-- From ddi-users@icpsr.umich.edu Wed Nov 6 16:33:44 2002 From: ddi-users@icpsr.umich.edu (Mark R. Diggory) Date: Wed, 06 Nov 2002 11:33:44 -0500 Subject: [DDI-users] Storing XML References: <1a9a741ab450.1ab4501a9a74@Princeton.EDU> Message-ID: <3DC94468.8090706@latte.harvard.edu> Andrew, Both eXist and Xindice are developed off of the original dbxml codebase. Which I think is still available from the www.xmldb.org website somewhere. Originally, eXist was the first to move into the XML-RPC technology and away from CORBA, but then again Xindice has also made that move to XML-RPC within the last year. Both have the following support XML-RPC service runs as a server XML-DB Client that can be used to connect to both local database instances (internal to an application) and remote XML-RPC interfaces (running on either a local or remote server somewhere). 1.) In terms of features: Both support XQuery and a certain subset of XPath Xindice has XUpdate where with eXist it is still in development. Both Xindice and eXist have a number of "api's available to access it from within a webapplication (Cocoon XML Generator, JSP Taglib, XMLDB API). 2.) In terms of performance: eXist can support larger document sizes than Xindice, This has alot to do with the developer of eXist focusing on improving the "paging memory" implementation that both eXist and Xindice use from the old dbxml codebase. Xindice can support multiple databases in it configuration on one server. Basically your looking a two different development branches for the dbxml codebase. eXist development has moved towards an internal support library for XML databases (like dbm). Xindice has moved more towards being an independent service (like http). I think both packages have the same capabilities, it just each does a particular task better than the other. 
In my opinion, I like the performance of eXist, and I like the ease of installation and customization I can do to eXist over that of Xindice. However, I've been waiting on the development of XUpdate in eXist to really move forward with my projects. Currently, rather than install the application, I've been just using the jar libraries within my own web applications running on Apache Jakarta Tomcat. This way I can secure the access to it from within my webapplication instead of in the external operating system. I also make installation of my webapplications (the front end of the VDC) much simpler. In terms of installation, I'd say that eXist is "lighter" than Xindice. I've been able to drop the eXist Jars into a webapplication and fire up my own instance of the eXist db internally for my own use with minimal development on my part. I would recommend getting familiar with both of them through some experimentation. As a Opensource Development Advocate, I'd have to say both are novel and cutting edge solutions to the problem of XML Storage and retrieval. But, as Opensource projects, getting support is inherent in your getting into "the community" and not through some watered down "service contract" you'd have to pay cash for. Don't be afraid of getting involved with their user/developer lists or to open up the code and take a look. Cheers, Mark Diggory Harvard MIT Data Center Andrew Dzhigo wrote: >Thank you for your reply. I will definitely take a look at the Virtual Data Center web site. >Since you have already done some research on eXist and Xindice, could you, please, share with me >your opinion of these two systems? > >With regards, > >Andrew Dzhigo >Applications Programmer >Cultural Policy and the Arts >National Data Archive (CPANDA), >Princeton University >adzhigo@princeton.edu >1-609-258-7561 > >----- Original Message ----- >From: "Mark R. 
Diggory" >Date: Monday, November 4, 2002 11:44 am >Subject: Re: [DDI-users] Storing XML > > > >>Hi Andrew, >> >>This is a shameless plug for a system we have been developing. >> >>We are working on an Opensource Digital Library System that >>utilizes the >>DDI are its primary storage XML format. This system is primarily >>for >>Social Science Reseach data such as that published at the ICPSR. >>It >>provides Indexing/Searching capabilities and server side data >>analysis >>tools for subsetting and manipulating the datasets. We are >>preparing for >>a software release towards the end of this year. >> >>Feel free to find out more about this project at the following sites: >> >>http://www.hmdc.harvard.edu >>http://www.thedata.org >>http://thedata.sourceforge.net >> >>We have a couple production systems running at Harvard. This is an >>older >>version of the "Virtual Data Center". But will be upgraded to our >>release version once it is completed. >>http://vdc-prod.hmdc.harvard.edu >> >>-Mark Diggory >>Project Manager / Software Engineer >>Harvard MIT Data Center >>http://www.hmdc.harvard.edu >> >>p.s. We currently archive the DDI's in a custom repository backed >>by >>Postgresql. I have been researching eXist >>http://exist.sourceforge.net >>or Xindice as a possible future Repository Implementations and >>have some >>rudimentary Tests/Implementations in the works. >> >> >>Andrew Dzhigo wrote: >> >> >> >>>Hi, all >>> >>>I am an applications programmer for the Cultural Policy and the >>> >>> >>Arts National Data Archive (CPANDA). Here at CPANDA we use DDI DTD >>to create XML codebooks for the datasets we archive. >> >> >>>With the number of datasets growing we are now facing the problem >>> >>> >>of storing and managing our XML files. At this stage of >>development we are investigating the possibilities of using some >>sort of XML-aware database management system. 
>>>Currently, I am considering the Xindice system (formerly dbXML), developed by the Apache Software Foundation, as a possible tool for our needs. However, I would very much like to know how other developers address similar problems. What tools, database management systems and languages do they use?
>>>
>>>Any help would be greatly appreciated.
>>>With regards,
>>>Andrew Dzhigo
>>>Applications Programmer
>>>Cultural Policy and the Arts
>>>National Data Archive (CPANDA),
>>>Princeton University
>>>adzhigo@princeton.edu
>>>1-609-258-7561
>>>
>>>_______________________________________________
>>>DDI-users mailing list
>>>DDI-users@icpsr.umich.edu
>>>http://www.icpsr.umich.edu/mailman/listinfo/ddi-users

From ddi-users@icpsr.umich.edu Fri Nov 8 19:08:46 2002
From: ddi-users@icpsr.umich.edu (Joachim Wackerow)
Date: Fri, 08 Nov 2002 20:08:46 +0100
Subject: [DDI-users] Storing XML
References: <5.1.0.14.0.20021104130646.00a13ec0@icpsr.umich.edu> <5.1.0.14.0.20021106111222.00a0f060@icpsr.umich.edu>
Message-ID: <3DCC0BBE.1090308@zuma-mannheim.de>

Hi I-Lin,

thank you for your detailed answer to my Oracle questions. It's very interesting to me, and now I have some more questions.

Do you store the DDI files as external LOBs (BFILE)? Do you store the variable information in an XMLType as a CLOB or in the object-relational way?

Do I understand you correctly that, with your approach, it is not possible to search every DDI field, but only the var field?

For our project, I'm wondering whether it would be nice to give the authors the possibility to update DDI elements in the database.
So I read a bit of the documentation for Oracle9i Release 2 (9.2). One new feature of XML DB mentioned there is piecewise update of XML via XPath, provided the XMLType instances are stored object-relationally (p. 3 [1]). Perhaps it would be possible to build an update feature with WebDAV for XMLSpy or similar tools. But if the XML schema changes, the object-relational XMLType has to be changed as well, which is not the case for an XMLType stored as an internal or external CLOB (details on piecewise update: p. 5-70 [2]). Probably your way is easier to implement: edit the DDI file first and then store it as a whole in the database.

The new documentation also mentions extracting an XML fragment from an XMLType (p. 4-21 [3]). If it really works as documented, then it would be possible to get only part of a DDI file stored in an XMLType as the result of a query via an XPath expression (example on p. 4-23 [3]). It sounds powerful and not too far from the XML standards. I will probably try these features in some tests.

I-Lin Kuo wrote:
> - Do you store the whole DDI file of a study in one XMLType field?
> The entire DDI will be stored in files in a special area outside the DB, and a link to the file will be stored in the DB. While the entire DDI can be stored in one XMLType field, that type of storage implies that a search result would return the entire DDI document -- the study. For our project, which searches for variables across various studies, we want a search to return the variable within the study. Therefore, that type of storage would be just plain wrong, and so instead we store the variable information in an XMLType.
> While Oracle Text can do indexing within an XMLType via an XPath-like syntax, I'm uncertain about its performance, so I'm also pre-extracting certain data from the var and placing it in fields -- but I'm only doing this for a very few fields which I think will be stable as the DDI evolves.
> This design has the advantage that I retain all the information in the XMLType field without having to explicitly remap everything if the schema evolves. The other nice consequence is that I don't have to worry so much about not allocating wide enough fields....
>
> - Is it possible to update only some DDI fields inside the XMLType?
> No. I think the contents of the XMLType must currently be updated as a whole, but I'm not sure. However, it doesn't apply to me, for the reason below:
> I have also made the conscious decision that only entire DDI documents are to be uploaded to our database, and therefore users will not be able to edit individual entries in the database without editing the entire document. This is to resolve synchronization problems that can arise regarding which version of the data -- the document or the database content -- is the master version. With this decision, the DDI document itself is always the master.
>
> - Is the XPath capability of Oracle enough for your needs?
> If it is as advertised, then it is sufficient. I'll find out more as I get deeper into the coding.
>
> - How is the performance of queries against the XMLType?
> I'll find out once I have actual DDI documents to search on, but I don't anticipate any problems. As I sort of mentioned before, I'm pre-extracting the fields to be displayed in a search summary and I'm indexing the other search fields within the XMLType. This way, a search doesn't really involve querying within the XMLType. I only use the XMLType at the end to pull up search result details for display.
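
The storage design described in the quoted answers above (keep the whole variable description as XML, copy only a few stable fields into ordinary columns for searching, and go back to the XML only for the detail view) can be sketched roughly as follows. This is a minimal illustration, not I-Lin's implementation: it uses Python with SQLite as a stand-in for Oracle and a plain text column as a stand-in for XMLType. The table name, column names, function names and sample codebook are all hypothetical, and the sketch assumes a DTD-style DDI codebook without XML namespaces.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed DDI codebook used only for illustration.
SAMPLE_DDI = """<codeBook>
  <dataDscr>
    <var name="AGE">
      <labl>Age of respondent</labl>
      <qstn><qstnLit>How old are you?</qstnLit></qstn>
    </var>
    <var name="SEX">
      <labl>Sex of respondent</labl>
    </var>
  </dataDscr>
</codeBook>"""


def make_db():
    """Create an in-memory table: a few extracted fields plus the full var XML."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variable (study_id TEXT, name TEXT, label TEXT, var_xml TEXT)"
    )
    return conn


def load_variables(conn, study_id, ddi_xml):
    """Upload a whole DDI document; one row per <var> element."""
    root = ET.fromstring(ddi_xml)
    for var in root.iter("var"):
        conn.execute(
            "INSERT INTO variable (study_id, name, label, var_xml) VALUES (?, ?, ?, ?)",
            (
                study_id,
                var.get("name"),
                (var.findtext("labl") or "").strip(),  # pre-extracted search field
                ET.tostring(var, encoding="unicode"),  # full description, kept as XML
            ),
        )


def search(conn, term):
    """Search summary: touches only the pre-extracted columns, never the XML."""
    rows = conn.execute(
        "SELECT study_id, name, label FROM variable WHERE label LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return rows.fetchall()


def details(conn, study_id, name):
    """Detail view: parse the stored XML only for the selected variable."""
    row = conn.execute(
        "SELECT var_xml FROM variable WHERE study_id = ? AND name = ?",
        (study_id, name),
    ).fetchone()
    return ET.fromstring(row[0]) if row else None


if __name__ == "__main__":
    conn = make_db()
    load_variables(conn, "study-1", SAMPLE_DDI)
    print(search(conn, "Age"))
    print(details(conn, "study-1", "AGE").findtext("qstn/qstnLit"))
```

Because whole documents are always re-uploaded, the DDI file stays the master copy; the extracted columns can be rebuilt from it at any time if the set of search fields changes.
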
Regards,
Achim

[1] Oracle XML DB: Key Features in Oracle9i Database Release 2
http://otn.oracle.com/tech/xml/xmldb/pdf/xmldb92tfovw.pdf
[2] Oracle9i XML Database Developer's Guide -- Oracle XML DB
http://otn.oracle.com/docs/products/oracle9i/doc_library/release2/appdev.920/a96620.pdf

PS: I'm not sure whether these Oracle details are of interest to other DDI mailing list participants.

From ddi-users@icpsr.umich.edu Wed Nov 27 15:27:45 2002
From: ddi-users@icpsr.umich.edu (Julien Barnier)
Date: Wed, 27 Nov 2002 16:27:45 +0100
Subject: [DDI-users] Examples of DDI referenced survey
Message-ID: <20021127162745.00001ff2.barnier@iresco.fr>

Hi,

I work at the new French national social sciences data centre (the Quetelet centre), and we are planning to use the DDI standard to reference our surveys. We are currently working on the codebook description, but some parts of the standard, in particular the fourth section ("Variable description"), are not totally clear to us.

So I would like to know whether there are examples of surveys using DDI that would be freely accessible to us, so that we can look at some tags and improve our understanding of their function and structure.

Thank you in advance for any information.

Sincerely,

--
Julien Barnier
Centre Quetelet - CNRS UMS 2419
Iresco, 59 rue Pouchet
75498 Paris Cedex 17
Tél. : 01.40.25.12.24

From ddi-users@icpsr.umich.edu Wed Nov 27 18:45:06 2002
From: ddi-users@icpsr.umich.edu (Sanda Ionescu)
Date: Wed, 27 Nov 2002 13:45:06 -0500
Subject: [DDI-users] Re: Centre Quetelet's request
In-Reply-To: <200211271701.gARH14c11693@icpsr.umich.edu>
Message-ID: <5.1.0.14.0.20021127133955.01fdcd00@icpsr.umich.edu>

Hi, Julien.

On our website we have 12 DDI codebooks posted at http://www.icpsr.umich.edu/DDI/SAMPLES/index.html. You cannot view the XML markup on the site, but you can download the XML files (by clicking on the download option) and then view them on your own computer with an XML editor.
The variable sections are fully marked up, and you'll find examples of almost anything. Let me know if you have any problems, or if you have questions about how we've used the markup.

Good luck,
Sanda.

Sanda Ionescu
Research Associate
Inter-university Consortium for Political and Social Research (ICPSR)
311 Maynard St.
Ann Arbor, MI 48104-2211
Phone: (734) 998-9895
Fax (734) 998-9889
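
For readers who would rather script a first look at a downloaded sample than open it in an XML editor, the following is a minimal Python sketch that lists each variable with its label and category codes. It assumes a DTD-style DDI codebook without XML namespaces, using the var, labl, catgry and catValu elements of the variable description section; the function name and the inline sample are hypothetical, and real codebooks that rely on DTD-defined entities may need a validating parser instead.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed codebook standing in for a downloaded sample file.
SAMPLE_CODEBOOK = """<codeBook>
  <dataDscr>
    <var name="SEX">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
    <var name="AGE">
      <labl>Age in years</labl>
    </var>
  </dataDscr>
</codeBook>"""


def summarize_variables(source):
    """Return one dict per <var>: its name, label and (value, label) categories.

    `source` may be a file path or an open file object.
    """
    root = ET.parse(source).getroot()
    summary = []
    for var in root.iter("var"):
        categories = [
            (
                (cat.findtext("catValu") or "").strip(),
                (cat.findtext("labl") or "").strip(),
            )
            for cat in var.findall("catgry")
        ]
        summary.append(
            {
                "name": var.get("name"),
                # "labl" as a path matches only the variable's own direct child,
                # not the labels nested inside its categories.
                "label": (var.findtext("labl") or "").strip(),
                "categories": categories,
            }
        )
    return summary


if __name__ == "__main__":
    for entry in summarize_variables(io.StringIO(SAMPLE_CODEBOOK)):
        print(entry["name"], "-", entry["label"])
        for value, label in entry["categories"]:
            print("   ", value, "=", label)
```

To run it against a real download, pass the path of the saved XML file to summarize_variables in place of the in-memory sample.
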