Always use an Input Link AND an Output Link with the XML Stage

Another way to say this is to “avoid using the XML Stage to perform i/o.” The XML Stage is capable of reading an xml document directly (a feature in the Parser Step) and is also able to write to a new document on disk in the Composer Step. However, while it may seem simpler to do that initially, it makes your Jobs and Stage designs less flexible and less re-usable. You should have an Input link that feeds XML to the XML Stage when you are “reading” or parsing xml (and of course you will have output links that send the results downstream), and you should have an Ouput link that sends your completed XML document(s) downstream when you are writing XML (and of course you will have input links that feed in the source data).

Let’s see why.

When you are first learning the XML Stage, it seems convenient to just “put in the name of the xml document” and keep going. The Parser Step allows you to specify the filename directly (or it can be parameterized), and then you continue with the assignment of the Document Root. Similarly, when creating a new XML document, the Composer Step allows you specify the actual document to be written to disk.

Then someone comes along and says “Our application is changing. The xml documents we currently read from disk will now be coming from MQ Series…” …or maybe “…from a relational table” …or “from hadoop”…. Well, you can’t just “change the Stage type at the end of the link” in that case. You have to “add” the link, and then make what could potentially be extensive changes to your Assembly. While not especially difficult once you are familiar with the Stage, if you have moved on to other projects, or have been promoted and are no longer supporting the Job, a less experienced DataStage developer will be challenged.

So…when using the Parser Step, use one of the options that describes your incoming content as either coming in directly as content (from a column in an upstream Stage), or as a set of filenames (best use case when reading xml documents from disk, especially when you have a whole lot of them in a single sub-directory [see also Reading XML Content as a Source ] )


The same thing is true for writing XML. Send your xml content downstream — whether you write it to a sequential file, or to DB2, or to MQ Series or some other target, the logic and coding of your XML Stage remains the same! In the Composer Step, choose the “Pass as String” option and then in the Output Step, map the “composer result” to a single large column (I like to call mine “xmlContent”) that has a longvarchar datatype and some abitrary long length like 99999. While there may be times when this can’t be easily done, or when you need to use the option for long binary strings (Pass as Large Object), for many/most use cases, this will work great.


Get in the habit of always using Input and Output Links with the XML Stage. Your future maintenance and changes/adaptions will be cleaner, and you can take better advantage of features such as Shared Containers for your xml transformation logic.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: