[DDI-users] Storing XML
I-Lin Kuo
ddi-users@icpsr.umich.edu
Wed, 06 Nov 2002 11:27:34 -0500
--=====================_1507526==_.ALT
Content-Type: text/plain; charset="us-ascii"; format=flowed
Hi Joachim,
Keep in mind that all this is still in the design/development phase and
that this approach I'm outlining will only work on Oracle 9i. Neither
Oracle 8i nor SQL Server have these capabilities. I don't know if the
native XML databases have this ability, however.
>- Do you store the whole DDI file of a study in one XMLType field?
The entire DDI will be stored in files in a special area outside
the DB and a link to the file will be stored in the DB. While the entire
DDI can be stored in one XMLType field, that type of storage implies that a
search result would return the entire DDI Document -- the study. For our
project which searches for variables between various studies, we want a
search to return the variable within the study. Therefore, that type of
storage would be just plain wrong, and so instead, we store the variable
information <var> in an XMLType.
While Oracle Text can do indexing within an XMLType via an
XPath-like syntax, I'm uncertain about its performance, so I'm also
pre-extracting certain data from the var and placing it in fields -- but
I'm only doing this for a very few fields which I think will be stable as
the DDI evolves. This design has the advantage that I retain all the
information in the XMLType field without having to explicitly remap
everything if the schema evolves. The other nice consequence is that I
don't have to worry so much about not allocating wide enough fields....
> - Is it possible to update only some DDI fields inside the XMLType?
No. Currently, the contents of the XMLType must be updated as a
whole, I think, but I'm not sure. However, it doesn't apply to be because
of the reason below:
I have also made the conscious decision that only entire DDI
documents are to be uploaded to our database, and therefore users will not
be able to edit individual entries in the database without editing the
entire document. This is to resolve synchronization problems that can arise
regarding which version of the data -- the document or the database content
-- is the master version. With this decision, the DDI document itself is
always the master.
> - Is the XPath capability of Oracle enough for your needs?
If they are as advertised, then they are sufficient. I'll find out
more as I get deeper into the coding.
> - How is the performance doing queries to the XMLType?
I'll find out once I have actual DDI documents to search on, but I
don't anticipate any problems. As I sort of mentioned before, I'm
pre-extracting the fields to be displayed in a search summary and I'm
indexing the other search fields within the XMLType. This way, a search
doesn't really involve querying within the XMLType. I only goI use the
XMLType at the end to pull up search result details for display.
>But a mapping of the whole DDI to the database schema seems to be very
complicated, and very inefficient in transferring
>the information between the both data models.
When I started this project, I worried that I'd have to take the
approach of mapping as you stated -- which seems too complicated and
unmaintainable as the schema evolves. I was very happy when I learned of
Oracle 9i's vastly improved XML capabilities so I would not have to do
this. Basically, Oracle re-purposed their Object-Oriented Database
technology to store XML documents, and it's a natural fit.
I-Lin Kuo
ICPSR
--=====================_1507526==_.ALT
Content-Type: text/html; charset="us-ascii"
<html>
Hi Joachim,<br><br>
Keep in mind that all this is still in the design/development phase and
that this approach I'm outlining will only work on Oracle 9i. Neither
Oracle 8i nor SQL Server have these capabilities. I don't know if the
native XML databases have this ability, however.<br><br>
>- Do you store the whole DDI file of a study in one XMLType
field?<br>
<x-tab> </x-tab>The entire
DDI will be stored in files in a special area outside the DB and a link
to the file will be stored in the DB. While the entire DDI can be stored
in one XMLType field, that type of storage implies that a search result
would <b>return the entire DDI Document</b> -- the study. For our project
which searches for variables between various studies, we want a search to
<b>return the variable</b> within the study. Therefore, that type of
storage would be just plain wrong, and so instead, we store the variable
information <var> in an XMLType.<br>
<x-tab> </x-tab>While
Oracle Text can do indexing within an XMLType via an XPath-like syntax,
I'm uncertain about its performance, so I'm also pre-extracting certain
data from the var and placing it in fields -- but I'm only doing this for
a very few fields which I think will be stable as the DDI evolves. This
design has the advantage that I retain all the information in the XMLType
field without having to explicitly remap <i>everything </i>if the schema
evolves. The other nice consequence is that I don't have to worry so much
about not allocating wide enough fields....<br><br>
> - Is it possible to update only some DDI fields inside the
XMLType?<br>
<x-tab> </x-tab>No.
Currently, the contents of the XMLType must be updated as a whole, I
think, but I'm not sure. However, it doesn't apply to be because of the
reason below:<br>
<x-tab> </x-tab>I have
also made the conscious decision that only entire DDI documents are to be
uploaded to our database, and therefore users will not be able to edit
individual entries in the database without editing the entire document.
This is to resolve synchronization problems that can arise regarding
which version of the data -- the document or the database content -- is
the master version. With this decision, the DDI document itself is always
the master.<br><br>
> - Is the XPath capability of Oracle enough for your needs?<br>
<x-tab> </x-tab>If they
are as advertised, then they are sufficient. I'll find out more as I get
deeper into the coding.<br><br>
> - How is the performance doing queries to the XMLType?<br>
<x-tab> </x-tab>I'll find
out once I have actual DDI documents to search on, but I don't anticipate
any problems. As I sort of mentioned before, I'm pre-extracting the
fields to be displayed in a search summary and I'm indexing the other
search fields within the XMLType. This way, a search doesn't really
involve querying within the XMLType. I only goI use the XMLType at the
end to pull up search result details for display.<br><br>
>But a mapping of the whole DDI to the database schema seems to be
very complicated, and very inefficient in transferring <br>
>the information between the both data models. <br>
<x-tab> </x-tab>When I
started this project, I worried that I'd have to take the approach of
mapping as you stated -- which seems too complicated and unmaintainable
as the schema evolves. I was very happy when I learned of Oracle 9i's
vastly improved XML capabilities so I would not have to do this.
Basically, Oracle re-purposed their Object-Oriented Database technology
to store XML documents, and it's a natural fit.<br><br>
I-Lin Kuo<br>
ICPSR<br>
</html>
--=====================_1507526==_.ALT--