Best Practices and Techniques for the “New” XML Stage

Hi Everyone…

It’s been a while since I’ve posted anything. A certain amount of “blog” fatigue is to blame, but so is my preference for posting things that are proven, time-honored, and (as much as possible) not release-dependent. Many, if not most, of the techniques I write about here are ones that I’ve spent many hours helping customers and colleagues implement in real situations.

It’s time I wrote about the XML Stage. It is not so new anymore, but it still feels new because some very important XSD-handling features have been added to it in the last few releases. This week I will start posting suggestions and tips for using the XML Stage to read and write XML documents with DataStage.

I’ll start with a pointer to a valuable RedBook that came out last year regarding the XML Stage.  I had the pleasure of reviewing the material as the authors put it together.  It is a great place to start when learning to work with this important Information Server capability.

XML Stage Redbook

Ernie

…and a link to the first “New” XML Stage post…

Establish Meaningful Link Names when using the XML Stage!

Lineage for RDBMS “Views”

Hi Everyone…

Someone asked me yesterday whether lineage can be performed in Metadata Workbench on a database “View”. It dawned on me that I may never have written a post on this very important subject. Formal database views, created in the RDBMS catalog via a CREATE VIEW SQL statement, are fully supported for data lineage purposes by Information Server and Metadata Workbench.

The key is in the method of import.

When you import your RDBMS catalog information via a Connector (with Metadata Asset Manager in 8.7, or with DataStage, Information Analyzer, or FastTrack in any 8.x release), the views are imported and get their own icon within the “Hosts” tree (or “Implemented Data Resources” in 8.7). The details of each view are available for display and show the SQL of the CREATE VIEW statement that originally created it.

More importantly, when you perform “Automated Services” (a.k.a. “stitching”, or “Detect Associations” in 8.7), Metadata Workbench will parse through the SQL of the CREATE VIEW statement and establish data lineage connections to the “source tables” of the view! Once this is done, you will have lineage from the view back to its source tables and, of course, to anything upstream of those tables or downstream of the view itself!
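To make the idea of “parsing the CREATE VIEW” concrete, here is a toy Python sketch of how source tables can be pulled out of a view definition. It is not how Metadata Workbench actually does it; the regular expressions, the function name view_lineage, and the SALES.* tables are all hypothetical, and the pattern only recognizes plain FROM/JOIN table references (no subqueries, CTEs, quoted identifiers, or comma-separated FROM lists).

```python
import re

# Toy patterns: a table reference is the identifier right after FROM or JOIN,
# optionally qualified as schema.table. Subqueries, CTEs, quoted identifiers
# and comma-separated FROM lists are deliberately NOT handled.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)", re.IGNORECASE
)
_VIEW_NAME = re.compile(
    r"\bCREATE\s+VIEW\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)", re.IGNORECASE
)


def view_lineage(create_view_sql: str) -> tuple[str, list[str]]:
    """Return (view_name, source_tables) for a simple CREATE VIEW statement."""
    view_match = _VIEW_NAME.search(create_view_sql)
    if view_match is None:
        raise ValueError("not a CREATE VIEW statement")
    sources: list[str] = []
    for name in _TABLE_REF.findall(create_view_sql):
        if name.upper() not in sources:  # keep first occurrence, drop repeats
            sources.append(name.upper())
    return view_match.group(1).upper(), sources


sql = """
CREATE VIEW SALES.V_ORDER_SUMMARY AS
SELECT c.CUST_ID, c.NAME, SUM(o.AMOUNT) AS TOTAL
FROM SALES.CUSTOMER c
JOIN SALES.ORDERS o ON o.CUST_ID = c.CUST_ID
GROUP BY c.CUST_ID, c.NAME
"""

view, sources = view_lineage(sql)
for src in sources:
    print(f"{src} -> {view}")  # one lineage edge per source table
```

Each printed pair is one lineage edge from a source table to the view; once those edges exist, anything already connected to the source tables is connected to the view as well.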

Ernie

New RedBook for XML Stage is available!

The new Redbook is available for the enhanced XML capabilities introduced by the “XML Stage” in Release 8.5 in October 2010. It represents a lot of hard work by my colleagues who developed, tested, and work with this enhanced way of processing XML content in an ETL tool. Congrats to the entire authoring team, the reviewers, and the people who made publication of the Redbook possible, and congrats to the rest of us, who now have another excellent resource for reading and writing complex XML using DataStage, QualityStage, and Information Server!

You will find this new redbook here:

http://www.redbooks.ibm.com/abstracts/sg247987.html?Open

Ernie

New YouTube Channel created for Information Server content…

Hi Everyone…

Check out this new YouTube channel! Its first entry is a recording I put together to illustrate how to publish a DataStage or QualityStage Job as a Service…

http://www.youtube.com/channel/UCVFAoFT_zaVF_JWHGz-8d5w?feature=guide

My colleagues in product management and marketing are managing the channel and are encouraging me and others to put together all kinds of videos: demos of new and exciting features, or recordings that illustrate how to do something. I hope it becomes a resource that we all find useful going forward.

Ernie

Check out this “can’t miss” recording of metadata mgmt and governance in action!

One of our IBM partners, Compact (www.compactbi.eu and www.compactbi.com), has published a video recording on their website that illustrates their MetaDex solution. MetaDex, in combination with Information Server Business Glossary and Metadata Workbench, enables metadata management for additional technologies outside of Information Server. Parsing for independent ETL tools and complex SQL scripts is just part of what MetaDex offers.

This is an excellent video that is worth eight minutes of your time. It highlights the functionality of Business Glossary and Metadata Workbench and how they work together along with MetaDex to provide a strong solution in support of your governance objectives. Enjoy!

Ernie

The recording is specifically at: http://compactbi.eu/solutions/metadex

Actional Diagnostics…great to use an “old friend” again…

Hi All…

Just a quick note. Today I finally got around to installing Actional Diagnostics, the latest release of what used to be called “SOAPScope”. It’s been a while since I’ve needed it, but it was perfect, still providing all the great things it did in its earlier implementations.

This is something I’ve been meaning to do for a long time but hadn’t had the chance. Mindreef, the company behind SOAPScope, was acquired by Progress Software several years ago, and what was originally “SOAPScope” has been rebranded. I don’t have any specifics about the other Actional offerings it connects to, but I can say that the experience was excellent. The download and install went very smoothly, and I was invoking a service within minutes. The screens appear to be the same, although they have probably added new functionality that I have yet to explore.

If you need an easy-to-use testing tool for your services, put this one on your list. There are many of them out there, and all are good tools. I have always liked this one because it offers a good compromise: it appeals to users like me who are comfortable with XML and HTTP protocols, yet it is also easily adopted by users who don’t want to be exposed to XML and simply want an easy-to-use GUI. Especially nice in Actional Diagnostics is the ability to perform load testing, where you can easily create multiple threads (thus simulating multiple users) that invoke your service concurrently.
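For anyone curious about what such a load test does under the covers, here is a minimal Python sketch of the same idea: a pool of threads sized to the number of simulated users, all calling a service concurrently while response times are collected. This is not Actional Diagnostics; invoke_service is a hypothetical stub that just sleeps, so you would replace its body with a real SOAP or HTTP request to the service under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def invoke_service(request_id: int) -> float:
    """Hypothetical stand-in for one service call; returns elapsed seconds.

    Replace the body with a real SOAP/HTTP request to the service under test.
    """
    start = time.perf_counter()
    time.sleep(0.05)  # simulated service latency
    return time.perf_counter() - start


def load_test(concurrent_users: int, calls_per_user: int) -> list[float]:
    """Spread all calls over a thread pool sized to the number of simulated users,
    so up to `concurrent_users` calls are in flight at the same time."""
    total_calls = concurrent_users * calls_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(invoke_service, range(total_calls)))


latencies = load_test(concurrent_users=10, calls_per_user=5)
print(f"calls: {len(latencies)}")
print(f"max: {max(latencies):.3f}s, avg: {sum(latencies) / len(latencies):.3f}s")
```

Raising concurrent_users is the equivalent of adding simulated users; watching how the maximum and average latencies move as you do so is the whole point of the exercise.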

Bravo to the team who is still supporting this.

You can find the download details at http://web.progress.com/en/actional/

Ernie

New developerWorks article on DataStage and new XML Stage!

Hi all…

My esteemed colleagues on the XML development team have published a great article on the new XML Stage in 8.5. Enjoy!

devWorks article on the New XML Stage!

Ernie

Reviewing the Advanced Tab in the Metadata Workbench

Hi all…

Just thought I’d throw in a quick review of the important (IMHO) links on the Advanced tab. Some of these factoids are buried in my other posts, but I needed a cheat sheet for myself and others. Here it is:

Automated Services. This option brings up the dialog that runs the parsing or “stitching” process for the detailed metadata in your DataStage Jobs and Connector-imported RDBMS views. It does a lot of work, takes a long time on the first run if you have a lot of metadata, and should be scheduled during off-hours. After the first run against a particular project, it uses a change-recognition mechanism to pick up only Jobs that have been updated. Pay close attention to which DS Projects are “checked”. Select only those that are really critical, and once a project is checked, don’t uncheck it: as the warnings explain, unchecking will “remove” all parsing history. Ultimately, this is the step that reviews the Jobs and connects them via common information found in their Stages. See my other posts for how the connection of Jobs to each other is determined.
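The change-recognition behavior can be pictured with a small Python model. The real mechanism inside Automated Services is not documented here, so this rests on assumptions: each Job is represented only by a last-modified timestamp, the class StitchingState and its methods are hypothetical, and “unchecking” a project simply forgets its history so that the next run re-parses everything.

```python
from dataclasses import dataclass, field


@dataclass
class StitchingState:
    """Hypothetical record of what was parsed: (project, job) -> last-modified time."""

    parsed: dict[tuple[str, str], float] = field(default_factory=dict)

    def jobs_to_parse(self, project: str, jobs: dict[str, float]) -> list[str]:
        """Return Jobs that are new or modified since the last run, and remember them."""
        changed = [
            name
            for name, modified in sorted(jobs.items())
            if self.parsed.get((project, name)) != modified
        ]
        for name in changed:
            self.parsed[(project, name)] = jobs[name]
        return changed

    def uncheck_project(self, project: str) -> None:
        """Forget a project's history, so its next run re-parses every Job."""
        for key in [k for k in self.parsed if k[0] == project]:
            del self.parsed[key]


state = StitchingState()
jobs = {"LoadCustomers": 100.0, "LoadOrders": 100.0}
print(state.jobs_to_parse("DW_PROJECT", jobs))  # first run: every Job
jobs["LoadOrders"] = 250.0                      # one Job is edited and saved
print(state.jobs_to_parse("DW_PROJECT", jobs))  # only LoadOrders
state.uncheck_project("DW_PROJECT")
print(state.jobs_to_parse("DW_PROJECT", jobs))  # history gone: every Job again
```

The last call shows why unchecking hurts on a large project: the expensive first-run behavior comes back.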

Stage Binding. When all else fails, you can connect two Stages to each other manually. Use this when, for some reason, two Jobs won’t connect, or when the rules for connecting them can’t be met. I’ve needed this with some custom Stage or Operator implementations, and when I am using a technique that prevents automatic connection. Imagine a Sequential File Stage at the end of one Job that writes out some XML content, and an XML Stage in the next Job that reads that content. There isn’t much in common between those Jobs, but I still want lineage to run directly through them…

Data Item Binding. This provides a “manual” binding of particular Stages to Database Tables and Data Files (see other posts for what those are, how they are created, and how they are different from “DataStage Table Definitions”). Use this when you are unable to get Database Alias to work as you expect and you simply want to “bolt” a particular Database Table or Data File to a Stage in one of your Jobs to complete the lineage picture.
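Both kinds of manual binding amount to adding an edge that automated stitching could not find on its own. The toy lineage graph below is a Python sketch of that idea; the LineageGraph class, the Job and Stage names, and the STAGING.CUSTOMER table are all hypothetical. It uses the scenario from Stage Binding, where one Job writes XML to a sequential file and the next Job reads it with the XML Stage: without the manual edge, lineage stops at the end of the first Job; after a Stage Binding and a Data Item Binding are added, a walk from the table reaches the final target.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph: each edge points from an upstream node to a downstream one."""

    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = defaultdict(set)

    def link(self, upstream: str, downstream: str) -> None:
        self.edges[upstream].add(downstream)

    def downstream_of(self, start: str) -> list[str]:
        """Breadth-first walk returning every node reachable from start."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            for nxt in sorted(self.edges[queue.popleft()]):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


g = LineageGraph()
# Links that automated stitching finds inside each Job (hypothetical names).
g.link("JobA.DB2_Source", "JobA.Seq_WriteXML")
g.link("JobB.XML_Read", "JobB.Target")
print(g.downstream_of("JobA.DB2_Source"))   # lineage stops at the end of JobA

# Stage Binding: manually connect the sequential file writer to the XML reader.
g.link("JobA.Seq_WriteXML", "JobB.XML_Read")
# Data Item Binding: bolt an imported Database Table onto the source Stage.
g.link("STAGING.CUSTOMER", "JobA.DB2_Source")
print(g.downstream_of("STAGING.CUSTOMER"))  # now reaches JobB.Target
```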

Data Source Identity. Use this when, for whatever reason, you want to link two identical tables for lineage purposes. Reasons? Two people might have imported the same metadata accidentally and you don’t want to delete it, or you might have the “design” information from an ERwin model and also the “actual” table information from the RDBMS catalog. There are many valid reasons. This link lets you relate tables to each other. They must have the same name; the option here lets you relate the “Schemas” of two different databases. Identical tables within those schemas become linked for lineage reporting and, therefore, also linked to whatever those individual tables connect to for lineage.
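Here is a minimal Python sketch of that name-matching rule, assuming exact (case-sensitive) table-name matches. The function identity_links and the DESIGN_DB.DW / PROD_DB.DW schema labels are hypothetical; the sketch pairs identically named tables across two schemas and also lists the tables that would stay unlinked.

```python
def identity_links(
    schema_a: str, tables_a: set[str], schema_b: str, tables_b: set[str]
) -> tuple[list[tuple[str, str]], list[str]]:
    """Link identically named tables in two schemas; also report unlinked tables."""
    links = [(f"{schema_a}.{t}", f"{schema_b}.{t}") for t in sorted(tables_a & tables_b)]
    unlinked = [f"{schema_a}.{t}" for t in sorted(tables_a - tables_b)]
    unlinked += [f"{schema_b}.{t}" for t in sorted(tables_b - tables_a)]
    return links, unlinked


# Hypothetical example: the DW schema from an ERwin design model vs. the same
# schema as imported from the RDBMS catalog of a production database.
design = {"CUSTOMER", "ORDERS", "PRODUCT"}
actual = {"CUSTOMER", "ORDERS", "AUDIT_LOG"}
links, unlinked = identity_links("DESIGN_DB.DW", design, "PROD_DB.DW", actual)
for a, b in links:
    print(f"{a} <-> {b}")
print("unlinked:", unlinked)
```

The unlinked list is a handy sanity check: a table you expected to see paired there usually means the names differ slightly between the model and the catalog.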

Database Alias. This option establishes the connection between an abstract string in a DataStage Stage (a server name, DSN name, etc., as defined by the relational Stage) and the “Host/Database” combination that was actually imported. Database Tables in Metadata Workbench are typically “actual” tables, but in DataStage, as in any well-designed application, the “name” is a placeholder. This option assigns the placeholder to a host and database. The schema.tablename used in the Stage is then matched against that Host/Database’s set of Tables to create a lineage connection. The list presented for this option will be entirely empty until you run Automated Services; it is then populated with each StageType and “server string” combination found in your Jobs.
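A small Python model may help picture how Database Alias fits in. Everything here is hypothetical (the dictionary layout, the function names, the host dbhost01, the database DWPROD, and the server strings DWDB and DW_DSN); it only mirrors the behavior described above: the alias list is built from the StageType/server-string pairs found in Jobs, every entry starts unassigned, and a Stage’s schema.tablename resolves to an imported table only after its alias has been assigned to a Host/Database.

```python
from typing import Optional

HostDb = tuple[str, str]  # (host, database)

# Hypothetical imported catalog: (host, database) -> "SCHEMA.TABLE" names.
IMPORTED_TABLES: dict[HostDb, set[str]] = {
    ("dbhost01", "DWPROD"): {"DW.CUSTOMER", "DW.ORDERS"},
}


def discover_aliases(
    stage_usages: list[tuple[str, str, str]],
) -> dict[tuple[str, str], Optional[HostDb]]:
    """Build the alias list from (stage_type, server_string, table) usages.

    Every (stage_type, server_string) pair starts out unassigned (None).
    """
    return {(stage_type, server): None for stage_type, server, _ in stage_usages}


def resolve(
    aliases: dict[tuple[str, str], Optional[HostDb]],
    stage_type: str,
    server: str,
    table: str,
) -> Optional[tuple[str, str, str]]:
    """Match a Stage's schema.tablename against the aliased Host/Database."""
    target = aliases.get((stage_type, server))
    if target is None or table not in IMPORTED_TABLES.get(target, set()):
        return None  # alias not assigned yet, or table not imported there
    return (target[0], target[1], table)


usages = [
    ("DB2 Connector", "DWDB", "DW.CUSTOMER"),
    ("ODBC Connector", "DW_DSN", "DW.ORDERS"),
]
aliases = discover_aliases(usages)
print(resolve(aliases, "DB2 Connector", "DWDB", "DW.CUSTOMER"))  # None: unassigned
aliases[("DB2 Connector", "DWDB")] = ("dbhost01", "DWPROD")
print(resolve(aliases, "DB2 Connector", "DWDB", "DW.CUSTOMER"))
```

Note that the key is the StageType plus the server string, so the ODBC entry stays unresolved until it gets its own assignment, even though it points at the same physical database.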

Hope this helps you understand these options.

Ernie
