my recent reads..

Extracting XPath refs from an XML document

I was inspired by a recent post in the XMLDB Forum to look at the question of how to extract a complete list of XPaths and the associated text node values from an arbitrary XML file. I looked into an XSLT approach which I'll describe here.

Say we have an XML file like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Library>
<Books>
<Book>
<Author>
<Last>Perry</Last>
<First>Anne</First>
</Author>
<Title>Long Spoon Lane</Title>
</Book>
</Books>
<Members>
<Member>
<Name>Paul</Name>
<Joined>2005-11-01</Joined>
</Member>
</Members>
</Library>

And our objective is to produce a listing like this:
/Library/Books/Book/Author/Last():Perry
/Library/Books/Book/Author/First():Anne
/Library/Books/Book/Title():Long Spoon Lane
/Library/Members/Member/Name():Paul
/Library/Members/Member/Joined():2005-11-01

After some investigation and reference to sites like Path Tracing and the XSLT 1.0 spec I arrived at what I think is the simplest xsl possible:
<?xml version="1.0" encoding="windows-1252" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:strip-space elements = "*" />

<xsl:template match="text()">

<xsl:for-each select="ancestor-or-self::*">
<xsl:text>/</xsl:text>
<xsl:value-of select="name()" />
</xsl:for-each>

<xsl:text>():</xsl:text>
<xsl:value-of select="." />
<xsl:text>&#xA;</xsl:text>

<xsl:apply-templates/>

</xsl:template>

</xsl:stylesheet>

What is going on here?

Well, firstly note that we strip-spaces and then match on all text() nodes - this ensures we skip all the pure whitespace nodes.

The magic that generates the XPath is the the "for-each" over all "ancestor-or-self" elements which generates the XPath identifier. Then we simply add the text value on the end.

A variation on the XSL template that produces an XML structure instead of text is as follows. It really varies just in terms of output formatting:
<?xml version="1.0" encoding="windows-1252" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>

<xsl:strip-space elements = "*" />

<xsl:template match="/">
<items>
<xsl:apply-templates/>
</items>
</xsl:template>

<xsl:template match="text()">
<item>
<path>
<xsl:for-each select="ancestor-or-self::*">
<xsl:text>/</xsl:text>
<xsl:value-of select="name()" />
</xsl:for-each>
</path>
<value>
<xsl:value-of select="." />
<xsl:apply-templates/>
</value>
</item>
</xsl:template>

</xsl:stylesheet>

read more and comment..

Handling namespaces with DBMS_XMLDOM

The PL/SQL package dbms_xmlgen has been around for a while now, but it sort of suffers from a lack of doc and examples. Getting it to handle namespaces properly is a good example - it seems like it should be a bit more intelligent than it actually is!

Let's say we wanted to generate this:

<?xml version="1.0"?>
<a:gadgets xmlns:a="uri:a" xmlns:b="uri:b" >
<b:phones/>
</a:gadgets>

Part of the solution would look like this (enough to illustrate some key points):
...
V_CURRENT_EL := DBMS_XMLDOM.CREATEELEMENT(doc => V_DOMDOCUMENT, tagName => 'gadgets', ns => 'uri:a');
DBMS_XMLDOM.SETATTRIBUTE(elem => V_CURRENT_EL, name => 'xmlns:a', newvalue => 'uri:a');
DBMS_XMLDOM.SETATTRIBUTE(elem => V_CURRENT_EL, name => 'xmlns:b', newvalue => 'uri:b');
V_CURRENT_CHILD_NODE := DBMS_XMLDOM.makeNode(V_CURRENT_EL);
DBMS_XMLDOM.SETPREFIX(n => V_CURRENT_CHILD_NODE, prefix => 'a');
V_CURRENT_NODE := DBMS_XMLDOM.APPENDCHILD(n => V_CURRENT_NODE, newChild => V_CURRENT_CHILD_NODE);
...
Here are the few guidelines I've "inferred" about DBMS_XMLDOM behaviour (note that this is all my reverse-engineered understanding, so I may be well off base!):

1. namespaceURI/ns params of createDocument, createElement methods just declare the namespace of the entity and have no practical impact on the generated xml

So the xml output would be the same even if you changed line 1 of the code above to:
V_CURRENT_EL := DBMS_XMLDOM.CREATEELEMENT(doc => V_DOMDOCUMENT, tagName => 'gadgets', ns => 'someotheruri');
NB: you can verify the namespace is internally altered using DBMS_XMLDOM.GETNAMESPACE

2. to set the xmlns attributes of the document, you need to explicitly create the attribute. You need lines 3 and 4 of the code above.

3. to cause a node or attribute to be serialised with a namespace prefix, you need to explicitly request it with the SETPREFIX call e.g.
DBMS_XMLDOM.SETPREFIX(n => V_CURRENT_CHILD_NODE, prefix => 'a');  


4. however, the namespace prefix will only be used in the serialised xml if you also set a namespaceURI/ns (from point 1).


5. there is no validation or verification that the namespace prefix you set (point 3) matches a namespace you have declared (points 1 and 2).


As you can see, DBMS_XMLDOM is really just a very thin wraper to programmatically generate XML at the lowest level. It leaves much of the "intelligence" for you to provide!

Bearing that in mind, you probably can think of many cases where using DBMS_XMLDOM is not the right answer, but is sort of brute force/naive.

If you have data in Oracle and want to produce some complex XML output, there are many smarter alternatives, such as defining an xmltype view over the source data and then doing an xsl transform into the desired document format. Aside from being much less tiresome than DBMS_XMLDOM, it has the benefit of separating presentation (the template) from the data. When you need to change the output format, you just change the template rather than hacking away at the DBMS_XMLDOM code.
read more and comment..

Complex SOAP::Lite requests - my rules for SOAP::Sanity!

Previously, I mentioned I'd come back to more complex request and response structures with SOAP::Lite.

Frankly, I haven't posted because I can't avoid the feeling that there's still a little more to unravel. Did I mention that good documentation is sparse? ;) Byrne and others have posted some good stuff (see for example the majordojo soaplite archive and this hands-on tour at builder.com), but mostly you'll find the experiences shared go along lines like "...after hacking around for a bit, I found that this worked...".

But is it possible to try and pin down at least a couple of guidelines for using SOAP::Data? Well, so far I can't claim to solving it all, but I am able to share a few anchors I've tried to plant for my own sanity!

My Rules for SOAP::Sanity


In the following "rules", $soap is a pre-initialised SOAP::Lite object, as in:
my $soap = SOAP::Lite->uri ( $serviceNs ) -> proxy ( $serviceUrl );

1. The value of a SOAP::Data element becomes the content of the XML entity.


It may seem bleeding obvious. Nevertheless, get this idea fixed in you head and it will help for more complex structures.

So if we are calling a "getHoroscope" service with the following request structure:
<getHoroscope>
<sign>Aries</sign>
</getHoroscope>

"Aries" is the value, i.e. the content, of the XML entity called "sign". Thus our request will look like this:
$data = SOAP::Data->name("sign" => 'Aries');
$som = $soap->getHoroscope( $data );

2. To create a hiearchy of entities, use references to a SOAP::Data structure.


In (1), the content of the entity was a simple string ("Aries"). Here we consider the case where we need the content to encapsulate more XML elements rather than just a string. For example a request with this structure:
<getHoroscope>
<astrology>
<sign>Aries</sign>
</astrology>
</getHoroscope>

Here "astrology" has an XML child element rather than a string value.
To achieve this, we set the value of the "astrology" element as a reference to the "sign" SOAP::Data object:
$data = SOAP::Data->name("astrology" =>
\SOAP::Data->name("sign" => 'Aries')
);
$som = $soap->getHoroscope( $data );


3. To handle multiple child entities, encapsuate as reference to a SOAP::Data collection.


In this case, we need our "astrology" element to have multiple children, for example:
<getHoroscope>
<astrology>
<sign>Aries</sign>
<sign>Pisces</sign>
</astrology>
</getHoroscope>

So a simple variation on (2). To achieve this, we collect the "Aries" and "Pisces" elements as a collection within an anonymous SOAP::Data object. We pass a reference to this object as the value of the "astrology" item.
$data = SOAP::Data->name("astrology" => 
\SOAP::Data->value(
SOAP::Data->name("sign" => 'Aries'),
SOAP::Data->name("sign" => 'Pisces')
)
);
$som = $soap->getHoroscope( $data );

4. Clearly distinguish method name structures from data.


This is perhaps just a style and clarity consideration. In the examples above, the method has been implicitly dispatched ("getHoroscope").
If you prefer (or need) to pass the method information to a SOAP::Lite call, I like to keep the method information distinct from the method data.

So for example, item (3) can be re-written (including some additional namespace handling) as:
$data = SOAP::Data->name("astrology" => 
\SOAP::Data->value(
SOAP::Data->name("sign" => 'Aries'),
SOAP::Data->name("sign" => 'Pisces')
)
);
$som = $soap->call(
SOAP::Data->name('x:getHoroscope')->attr({'xmlns:x' => $serviceNs})
=> $data
);

I prefer to read this than have it all mangled together.

That brings me to the end of my list of rules! I am by no means confident that there aren't more useful guidelines to be added, or that in fact the ones I have proposed above will even stand the test of time.

Nevertheless, with these four ideas clearly in mind, I find I have a fair chance of sitting down to write a complex SOAP::Lite call correctly the first time, rather than the trial and error approach I used to be very familiar with!
read more and comment..

Safe OCCI createStatelessConnectionPool usage

I've been working a bit with Oracle's OCCI (the C++ API for Oracle Database), and stateless connection pools in particular.

I've noticed a particular behaviour that's important to be aware of when creating the connection pool (using the oracle::occi::Environment method "createStatelessConnectionPool"). The problem is that this call will fail if you have some kind of conenction or TNS error, leaving you with an unusable and invalid pool.

To give a concrete example, if you create a connection pool like this:

scPool = env->createStatelessConnectionPool(...

what I find is that if the database is down (for example), this call with throw a TNS error like ORA-12541: TNS:no listener, and the scPool object is invalid (but not a null reference).

if you attempt to use the pool thereafter, e.g.:
if (scPool) cout << "MaxConn=" << scPool->getMaxConnections() << endl;

then firstly "if (scPool) .." wont protect you (because its not null), and the getMaxConnections method call will throw an uncatchable segmentation fault (this is Linux x86 I'm using)

The workaround of course is to null your scPool if you catch the TNS error, and then if you want a robust application that must keep running even if the connection pool is not created, everytime you try and get a connection from the pool you should also first check to see if you have a pool object to use (and if not, try and create it again).

Tortuous to say the least!

I would have expected that the desired behaviour should be for createStatelessConnectionPool to return a valid connection pool even if connections are not possible at that point in time, and that for the TNS errors to be only thrown if and when you try and get a connection from the pool.

Anyone have a view? ... bug, ER or expected?

12-Feb, an update: I've since discovered that this behaviour is true only if you set a "minumum connections" >0 for the pool. If you set "minumum connections"==0, then the behaviour is as I would expect - the pool is created without error, but you may hit an error when attempting to get a connection from the pool.
read more and comment..