Incorporating Java classes into your DataStage Jobs

Java comes up a lot when we talk about “real time.” Not that Java in particular has any special dibs on the term, but when a site is interested in things like Service Oriented Architecture (SOA), Web Services, messaging, and XML, it is often also interested in Java, J2EE, application servers, and other things related to Sun’s language standard.

Integrating Java with your ETL processing becomes the next logical discussion, whether or not “real time” even applies. There may be some functionality, some existing algorithms worth re-using, or some remote Java-oriented or Java-managed system or message queue that contains valuable source data (or would be a valuable target) that you’d like to integrate into a data integration flow. DataStage can easily be extended to include your Java functionality or take advantage of your Java experience.

Two Stages included with DataStage, formerly referred to as the JavaPack, are JavaClient and JavaTransformer. Both allow you to integrate the functionality of a Java class into the flow of a DataStage Job. JavaClient is used for sources or targets (only an output link or only an input link), and JavaTransformer is used for row-by-row processing, where you have something you’d like to invoke for each row that passes through.

DataStage provides a simple API for including Java classes in your Jobs. This API allows your class to interact directly with the DataStage engine at run-time: to obtain metadata about the columns and links that exist in the currently executing Job, and to read and write rows from and to those links when called upon to do so. You define several special methods in your class, such as Process(), that the engine calls whenever it needs a row from your class or is ready to give your class a row. Within that method you have various calls to make, such as readRow [from an input link] and writeRow [to an output link]. You can control what comes in and goes out, and also process rejections based on logic in your class. Other than that, your class can do whatever it wants: read messages from JMS queues, invoke remote EJBs, whatever.
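To make the call pattern concrete, here is a rough sketch of the read/write/reject loop described above. It does not use the real Java Pack classes: MockStage, feed, run, rejectRow, and UppercaseNames are all made-up stand-ins so the example compiles on its own, and rows are simplified to string arrays. Only the overall shape (the engine calls a process method repeatedly, and the class reads a row and then writes or rejects it) is taken from the description. Check the Java Pack documentation for the actual base class, method names, and return values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the engine side of the API. The real base
// class ships with DataStage; these names only mirror the calls
// mentioned in the text and are NOT the actual Java Pack signatures.
abstract class MockStage {
    private final Deque<String[]> input = new ArrayDeque<>();
    final List<String[]> output = new ArrayList<>();
    final List<String[]> rejects = new ArrayList<>();

    // Queues one row on the simulated input link.
    void feed(String... columns) { input.add(columns); }

    // Returns the next input row, or null when the input link is exhausted.
    protected String[] readRow() { return input.poll(); }
    protected void writeRow(String[] row) { output.add(row); }
    protected void rejectRow(String[] row) { rejects.add(row); }

    // Called repeatedly by the simulated engine; returns false at end of data.
    abstract boolean process();

    void run() { while (process()) { } }
}

// Row-by-row logic of the kind a JavaTransformer class would hold.
class UppercaseNames extends MockStage {
    @Override
    boolean process() {
        String[] row = readRow();
        if (row == null) {
            return false;               // no more rows on the input link
        }
        if (row.length < 2 || row[1].isEmpty()) {
            rejectRow(row);             // rows without a name go to the reject path
        } else {
            writeRow(new String[] { row[0], row[1].toUpperCase() });
        }
        return true;
    }
}

class MockStageDemo {
    public static void main(String[] args) {
        UppercaseNames stage = new UppercaseNames();
        stage.feed("1", "ernie");
        stage.feed("2", "");            // empty name, will be rejected
        stage.run();
        for (String[] row : stage.output) {
            System.out.println(String.join(",", row));
        }
        System.out.println("rejected: " + stage.rejects.size());
    }
}
```

In a real Job the engine, not a run() loop, drives the calls, and column metadata comes from the link definitions rather than from array positions.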

The JavaPack is very well documented, with examples and descriptions of all the API calls. However, I’ve included an additional example here for anyone who is interested, including the Java class, source, .dsx export, and usage notes. Have fun!

-ernie

btw…I haven’t exactly figured out yet how best to get the names of the files below represented here on this blog, but if you save them from here, each file except the Readme begins with “ExamineRows”: ExamineRows.dsx (the export), ExamineRows.java (the source), and ExamineRows.class (the compiled class). I haven’t had a chance to re-try it after downloading from here, so worst case, you’ll need to recompile the class yourself in your environment. Otherwise, it should run in v8 “as is”. See the file at the Readme link for details on the expected classpath in the Job, etc., and read the annotations in the Job itself after you import it. -e

Examine Rows Class, Examine Rows Java Source, Examine Rows Readme, Examine Rows DataStage Export

XML with embedded XPath in the content…

Wow. I haven’t written in a while. I’ve been off in the land of metadata for the past few months, or heads down on some other projects. No excuse, though: some juicy real-time/XML/Java issues have been coming up lately and are piled on my desk waiting to be written up here. I also have to finally figure out how to do attachments so that I can share some DataStage Jobs and other stuff.

Ran into an interesting one today that I haven’t seen before, and I see a lot of XML issues. Guess there’s always something new around the corner. In reality I’ve only scratched the surface of what XML has to offer, and unless my teammates at IBM and/or our customers run into something, I’m not likely to see it.

Anyway, this one is pretty cool. I was given an XML document whose content contains references that point to “other” XML content in the same document. In this case, it is a node with an attribute containing XPath that points to another node somewhere (presumably above) in the document. This attribute is present if-and-only-if all the other sub-elements of the node are missing. If so, the processing needs to locate the other node and retrieve the values found there in order to populate all the values of the node containing this “reference” attribute. A sort of “recursive” lookup into the same XML document.

I haven’t entirely worked out the solution using DataStage yet, although the strategy is fairly clear.   First retrieve the node, check for the reference attribute, and if present, pull out the XPath.   This XPath then needs to be used further on in the flow to pull some “other” content from the same XML document.  Compare for nulls and then take the populated content before moving forward…  A bit tedious, but do-able, since DataStage lets me throw XPath around and dynamically create and use Stylesheets.
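For anyone who wants to see the lookup logic outside of DataStage, here is a minimal sketch in plain Java using the standard javax.xml.xpath API. It is not the DataStage solution, just the same strategy: fetch the node, check for the reference attribute, and if it is there, evaluate its XPath against the same document and take the values from the node it points to. The attribute name "ref", the class name ReferenceResolver, and the sample document are all made up for illustration; the real document's names aren't given here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ReferenceResolver {
    // Hypothetical attribute name; the real one is not named in the post.
    static final String REF_ATTR = "ref";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the element that actually holds the values: the node itself,
    // or the node that its reference attribute points to.
    static Element resolve(Document doc, Element node) throws Exception {
        if (!node.hasAttribute(REF_ATTR)) {
            return node;                // populated node, nothing to look up
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Evaluate the embedded XPath against the whole document.
        // Assumes the expression selects an element.
        Element target = (Element) xpath.evaluate(
                node.getAttribute(REF_ATTR), doc, XPathConstants.NODE);
        return target != null ? target : node;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: shipTo has no sub-elements, only a reference.
        String xml = "<orders>"
                + "<address id=\"a1\"><city>Boston</city><zip>02110</zip></address>"
                + "<order><shipTo ref=\"/orders/address[@id='a1']\"/></order>"
                + "</orders>";
        Document doc = parse(xml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        Element shipTo = (Element) xpath.evaluate(
                "/orders/order/shipTo", doc, XPathConstants.NODE);
        Element source = resolve(doc, shipTo);
        // Read a value from whichever node ended up holding the content.
        System.out.println(xpath.evaluate("city", source));
    }
}
```

This only follows one hop. If the referenced node could itself carry a reference, resolve would need to loop (with a guard against cycles).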

Anyone else encountered XML like this with internal references?

Ernie
