[DDI-users] Storing XML

I-Lin Kuo ddi-users@icpsr.umich.edu
Wed, 06 Nov 2002 11:27:34 -0500


--=====================_1507526==_.ALT
Content-Type: text/plain; charset="us-ascii"; format=flowed

Hi Joachim,

Keep in mind that all this is still in the design/development phase and 
that this approach I'm outlining will only work on Oracle 9i. Neither 
Oracle 8i nor SQL Server have these capabilities. I don't know if the 
native XML databases have this ability, however.

 >- Do you store the whole DDI file of a study in one XMLType field?
         The entire DDI will be stored in files in a special area outside 
the DB and a link to the file will be stored in the DB. While the entire 
DDI can be stored in one XMLType field, that type of storage implies that a 
search result would return the entire DDI Document -- the study. For our 
project which searches for variables between various studies, we want a 
search to return the variable within the study. Therefore, that type of 
storage would be just plain wrong, and so instead, we store the variable 
information <var> in an XMLType.
         While Oracle Text can do indexing within an XMLType via an 
XPath-like syntax, I'm uncertain about its performance, so I'm also 
pre-extracting certain data from the var and placing it in fields -- but 
I'm only doing this for a very few fields which I think will be stable as 
the DDI evolves. This design has the advantage that I retain all the 
information in the XMLType field without having to explicitly remap 
everything if the schema evolves. The other nice consequence is that I 
don't have to worry so much about not allocating wide enough fields....

 > - Is it possible to update only some DDI fields inside the XMLType?
         No. Currently, the contents of the XMLType must be updated as a 
whole, I think, but I'm not sure. However, it doesn't apply to be because 
of the reason below:
         I have also made the conscious decision that only entire DDI 
documents are to be uploaded to our database, and therefore users will not 
be able to edit individual entries in the database without editing the 
entire document. This is to resolve synchronization problems that can arise 
regarding which version of the data -- the document or the database content 
-- is the master version. With this decision, the DDI document itself is 
always the master.

 > - Is the XPath capability of Oracle enough for your needs?
         If they are as advertised, then they are sufficient. I'll find out 
more as I get deeper into the coding.

 > - How is the performance doing queries to the XMLType?
         I'll find out once I have actual DDI documents to search on, but I 
don't anticipate any problems. As I sort of mentioned before, I'm 
pre-extracting the fields to be displayed in a search summary and I'm 
indexing the other search fields within the XMLType. This way, a search 
doesn't really involve querying within the XMLType. I only goI use the 
XMLType at the end to pull up search result details for display.

 >But a mapping of the whole DDI to the database schema seems to be very 
complicated, and very inefficient in transferring
 >the information between the both data models.
         When I started this project, I worried that I'd have to take the 
approach of mapping as you stated -- which seems too complicated and 
unmaintainable as the schema evolves. I was very happy when I learned of 
Oracle 9i's vastly improved XML capabilities so I would not have to do 
this. Basically, Oracle re-purposed their Object-Oriented Database 
technology to store XML documents, and it's a natural fit.

I-Lin Kuo
ICPSR

--=====================_1507526==_.ALT
Content-Type: text/html; charset="us-ascii"

<html>
Hi Joachim,<br><br>
Keep in mind that all this is still in the design/development phase and
that this approach I'm outlining will only work on Oracle 9i. Neither
Oracle 8i nor SQL Server have these capabilities. I don't know if the
native XML databases have this ability, however.<br><br>
&gt;- Do you store the whole DDI file of a study in one XMLType
field?<br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>The entire
DDI will be stored in files in a special area outside the DB and a link
to the file will be stored in the DB. While the entire DDI can be stored
in one XMLType field, that type of storage implies that a search result
would <b>return the entire DDI Document</b> -- the study. For our project
which searches for variables between various studies, we want a search to
<b>return the variable</b> within the study. Therefore, that type of
storage would be just plain wrong, and so instead, we store the variable
information &lt;var&gt; in an XMLType.<br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>While
Oracle Text can do indexing within an XMLType via an XPath-like syntax,
I'm uncertain about its performance, so I'm also pre-extracting certain
data from the var and placing it in fields -- but I'm only doing this for
a very few fields which I think will be stable as the DDI evolves. This
design has the advantage that I retain all the information in the XMLType
field without having to explicitly remap <i>everything </i>if the schema
evolves. The other nice consequence is that I don't have to worry so much
about not allocating wide enough fields....<br><br>
&gt; - Is it possible to update only some DDI fields inside the
XMLType?<br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>No.
Currently, the contents of the XMLType must be updated as a whole, I
think, but I'm not sure. However, it doesn't apply to be because of the
reason below:<br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>I have
also made the conscious decision that only entire DDI documents are to be
uploaded to our database, and therefore users will not be able to edit
individual entries in the database without editing the entire document.
This is to resolve synchronization problems that can arise regarding
which version of the data -- the document or the database content -- is
the master version. With this decision, the DDI document itself is always
the master.<br><br>
&gt; - Is the XPath capability of Oracle enough for your needs?<br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>If they
are as advertised, then they are sufficient. I'll find out more as I get
deeper into the coding.<br><br>
&gt; - How is the performance doing queries to the XMLType?<br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>I'll find
out once I have actual DDI documents to search on, but I don't anticipate
any problems. As I sort of mentioned before, I'm pre-extracting the
fields to be displayed in a search summary and I'm indexing the other
search fields within the XMLType. This way, a search doesn't really
involve querying within the XMLType. I only goI use the XMLType at the
end to pull up search result details for display.<br><br>
&gt;But a mapping of the whole DDI to the database schema seems to be
very complicated, and very inefficient in transferring <br>
&gt;the information between the both data models. <br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>When I
started this project, I worried that I'd have to take the approach of
mapping as you stated -- which seems too complicated and unmaintainable
as the schema evolves. I was very happy when I learned of Oracle 9i's
vastly improved XML capabilities so I would not have to do this.
Basically, Oracle re-purposed their Object-Oriented Database technology
to store XML documents, and it's a natural fit.<br><br>
I-Lin Kuo<br>
ICPSR<br>
</html>

--=====================_1507526==_.ALT--