Handling Nulls in XML Sources

Reading XML is very straightforward when everything is present, all the elements are populated, and the entire XML “tree” exists in the current document that you are reading. However, when elements are missing from the document entirely, are “empty” (by XML definition), or present but formally indicated as null, things can get tricky. Nulls can be confusing enough as it is! I’ll try here to clarify the null issues you should be considering in your ETL processes, and outline specifically how such things are managed in DataStage when you use XMLInput…

For starters, let’s leave the “Repeating Element” (the one you mark as “key”) out of this blog entry. It has its own important considerations. We’ll concentrate for now on standard elements that may or may not appear in your document, may be empty, or may be formally marked as null (element and its tags exist, but it is empty with attribute xsi:nil=”true”). Let’s look at some examples.

Imagine the details of a document about products. It might have an element called “color.” Color could have a value:

<product>
<number>123</number>
<color>blue</blue>
<location>A</location>
.
.
.
</product>

 

Color could be “empty”:

 

<product>
<number>123</number>
<color></color> [or <color/> ]
<location>A</location>
.
.
.
</product>

 

Color could be formally null:

 

<product>
<number>123</number>
<color xsi:nil=”true”></color>
<location>A</location>
.
.
.

</product>

 

…or be entirely missing (for a given product):

 

<product>
<number>123</number>
<location>A</location>
.
.
.
</product>

 

What constitutes a null value? When is it blank? When your tool reads and parses this XML document, what do you want it to do?

DataStage gives us several options. There are two check boxes in the Transformation tab of the Output Link. By default, these boxes are unchecked:

Replace NULLs with empty values [blanks]
Replace empty values with NULLs

Without doing anything, empty elements will be treated as blanks, and null indicated or missing elements will be treated as null.

If you only check “Replace NULLs with empty values,” nothing will be null anymore….

…and if you only check “Replace empty values with NULLs,” more of your rows will have nulls for this column, because anything that was blank in the default scenario will now be null.

Finally, if you check both boxes, only the empty elements will be reteived as null, because the missing elements and the formally null ones will be treated as blank.

Clear as mud, I know. 😉

Last week I worked with a customer who wanted to separate the xsi:nil=”true” elements from the ones that were “missing”. They wanted the xsi:nil=”true” to become null columns, and the ones that were truly missing to be assigned an actual string value of “MISSING.”. I’ll follow this up with an entry on using XSLT to perform subtle column assignments like this one.

-e

…test line added to validate a new RSS reader…

Advertisements
Posted in datastage, etl, xml. Tags: . 3 Comments »

3 Responses to “Handling Nulls in XML Sources”

  1. Stewart Hanna Says:

    Ernie,

    I enjoy reading your posts. I always find I learn something new, but more importantly, something relevant.

    Regards,

    Stewart

  2. Mahesh Says:

    Hello Ernie:
    Your blogs have proved to be very informative. I have used DataStage for a number of years but primarily worked with Server jobs. In my current enviroment we mostly use parallel jobs and with one of the source systems the only type of output that is available is XML. I have no experience working with XML but have been doing some snooping around to get some ideas on how to tackle XML. Would you be able to point to any primers on the topic as well as some advanced case studies you ahve come across.
    I understand that this is a lot to ask but figured it is worth trying.

    Thanks again for your blogs!

    Mahesh

    • dsrealtime Says:

      Hmm…that’s a question with a huge number of answers. XML in general has a ton of resources on the web, or in books. I think you can learn all you need to know to at least be dangerous, simply by poking around on the web, reading articles, and finding blogs and sites dedicated solely to XML, especially from a DataStage perspective. If you are intent on designing XML, and putting together XML Schema (xsd’s), that’s a whole other story. Peruse some of the leading XML tool’s web sites also, such as Altova’s for XMLSpy. They aren’t the only vendor in that space though. You probably will find 100’s if you search on “XML Design.” I’d be remiss if I didn’t also point you to http://www.w3c.org, and http://www.oasis-open.org. The specs and things there can at times be overwhelming, but you should at least know that they are there, have them as a reference when necessary. There are some very bright people in our industry who live and breathe with this stuff. On occasion, espeically when two vendors implementations disagree, it helps to know where to turn to make an interpretation. I got involved with XML around release 4 of DataStage when we decided to add some primitive capabilties to the product for reading and writing XML documents. Only a few customers ever used it back then (8 to 9 years ago). It’s been very interesting to watch the growth of xml over the years, as a vehicle for data transport and also for storage. It’s certainly lived up to its reputation for being a simpler medium for delivering data to remote heterogeneous platforms that is both machine and human readable, complete with meaningful meta data. It still has it’s detractors, of course, and well documented pros and cons, but that’s ok. I only care that we data integration professionals have to deal with it, read it, write it, decipher it and manipulate it as needed in a myriad of ways….

      -e


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: