my recent reads..

Synchronising two directory trees

I've released an update to Chang Liu's tree-sync.pl script on CPAN (currently tree-sync-2.2.pl).

I had some problems with the original script, so this is a re-write that uses an algorithm based on File::Find module.

I've only been testing on Windows and Linux, but so far so good - it solves my immiedate issue which was to have a way of maintaining a synchronised "backup" copy of various filesystems.

PS: As of Oct-2008, the tree-sync project is now on github. Use this if you want to contribute to development. Of course, releases will still be distributed for use on CPAN.
read more and comment..

XSL Transforms in the database

Previously, I wrote on how to extract XPath refs from an arbitrary XML document. Well, you can actually do this inside a database too - specifically Oracle 9i/10g with built-in XMLDB support.

Say we have XML data and XSL templates stored in a simple table:


CREATE TABLE x1 (item varchar(25) primary key, xml xmltype);

Where the data is stored with item="data", and the XSL template to extract paths to text is stored as item="xsl-to-text", then our transform may be executed as simply as this:
select 
XMLTransform(
xml,
(select xml from x1 where item='xsl-to-text')
).getstringval() into v_out_text
from x1 where item='data';
dbms_output.put_line(v_out_text);

A full sample script is available here.
read more and comment..

Extracting XPath refs from an XML document

I was inspired by a recent post in the XMLDB Forum to look at the question of how to extract a complete list of XPaths and the associated text node values from an arbitrary XML file. I looked into an XSLT approach which I'll describe here.

Say we have an XML file like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Library>
<Books>
<Book>
<Author>
<Last>Perry</Last>
<First>Anne</First>
</Author>
<Title>Long Spoon Lane</Title>
</Book>
</Books>
<Members>
<Member>
<Name>Paul</Name>
<Joined>2005-11-01</Joined>
</Member>
</Members>
</Library>

And our objective is to produce a listing like this:
/Library/Books/Book/Author/Last():Perry
/Library/Books/Book/Author/First():Anne
/Library/Books/Book/Title():Long Spoon Lane
/Library/Members/Member/Name():Paul
/Library/Members/Member/Joined():2005-11-01

After some investigation and reference to sites like Path Tracing and the XSLT 1.0 spec I arrived at what I think is the simplest xsl possible:
<?xml version="1.0" encoding="windows-1252" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:strip-space elements = "*" />

<xsl:template match="text()">

<xsl:for-each select="ancestor-or-self::*">
<xsl:text>/</xsl:text>
<xsl:value-of select="name()" />
</xsl:for-each>

<xsl:text>():</xsl:text>
<xsl:value-of select="." />
<xsl:text>&#xA;</xsl:text>

<xsl:apply-templates/>

</xsl:template>

</xsl:stylesheet>

What is going on here?

Well, firstly note that we strip-spaces and then match on all text() nodes - this ensures we skip all the pure whitespace nodes.

The magic that generates the XPath is the the "for-each" over all "ancestor-or-self" elements which generates the XPath identifier. Then we simply add the text value on the end.

A variation on the XSL template that produces an XML structure instead of text is as follows. It really varies just in terms of output formatting:
<?xml version="1.0" encoding="windows-1252" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>

<xsl:strip-space elements = "*" />

<xsl:template match="/">
<items>
<xsl:apply-templates/>
</items>
</xsl:template>

<xsl:template match="text()">
<item>
<path>
<xsl:for-each select="ancestor-or-self::*">
<xsl:text>/</xsl:text>
<xsl:value-of select="name()" />
</xsl:for-each>
</path>
<value>
<xsl:value-of select="." />
<xsl:apply-templates/>
</value>
</item>
</xsl:template>

</xsl:stylesheet>

read more and comment..

Handling namespaces with DBMS_XMLDOM

The PL/SQL package dbms_xmlgen has been around for a while now, but it sort of suffers from a lack of doc and examples. Getting it to handle namespaces properly is a good example - it seems like it should be a bit more intelligent than it actually is!

Let's say we wanted to generate this:

<?xml version="1.0"?>
<a:gadgets xmlns:a="uri:a" xmlns:b="uri:b" >
<b:phones/>
</a:gadgets>

Part of the solution would look like this (enough to illustrate some key points):
...
V_CURRENT_EL := DBMS_XMLDOM.CREATEELEMENT(doc => V_DOMDOCUMENT, tagName => 'gadgets', ns => 'uri:a');
DBMS_XMLDOM.SETATTRIBUTE(elem => V_CURRENT_EL, name => 'xmlns:a', newvalue => 'uri:a');
DBMS_XMLDOM.SETATTRIBUTE(elem => V_CURRENT_EL, name => 'xmlns:b', newvalue => 'uri:b');
V_CURRENT_CHILD_NODE := DBMS_XMLDOM.makeNode(V_CURRENT_EL);
DBMS_XMLDOM.SETPREFIX(n => V_CURRENT_CHILD_NODE, prefix => 'a');
V_CURRENT_NODE := DBMS_XMLDOM.APPENDCHILD(n => V_CURRENT_NODE, newChild => V_CURRENT_CHILD_NODE);
...
Here are the few guidelines I've "inferred" about DBMS_XMLDOM behaviour (note that this is all my reverse-engineered understanding, so I may be well off base!):

1. namespaceURI/ns params of createDocument, createElement methods just declare the namespace of the entity and have no practical impact on the generated xml

So the xml output would be the same even if you changed line 1 of the code above to:
V_CURRENT_EL := DBMS_XMLDOM.CREATEELEMENT(doc => V_DOMDOCUMENT, tagName => 'gadgets', ns => 'someotheruri');
NB: you can verify the namespace is internally altered using DBMS_XMLDOM.GETNAMESPACE

2. to set the xmlns attributes of the document, you need to explicitly create the attribute. You need lines 3 and 4 of the code above.

3. to cause a node or attribute to be serialised with a namespace prefix, you need to explicitly request it with the SETPREFIX call e.g.
DBMS_XMLDOM.SETPREFIX(n => V_CURRENT_CHILD_NODE, prefix => 'a');  


4. however, the namespace prefix will only be used in the serialised xml if you also set a namespaceURI/ns (from point 1).


5. there is no validation or verification that the namespace prefix you set (point 3) matches a namespace you have declared (points 1 and 2).


As you can see, DBMS_XMLDOM is really just a very thin wraper to programmatically generate XML at the lowest level. It leaves much of the "intelligence" for you to provide!

Bearing that in mind, you probably can think of many cases where using DBMS_XMLDOM is not the right answer, but is sort of brute force/naive.

If you have data in Oracle and want to produce some complex XML output, there are many smarter alternatives, such as defining an xmltype view over the source data and then doing an xsl transform into the desired document format. Aside from being much less tiresome than DBMS_XMLDOM, it has the benefit of separating presentation (the template) from the data. When you need to change the output format, you just change the template rather than hacking away at the DBMS_XMLDOM code.
read more and comment..