Last night I was reminded of a series of blog entries I’ve wanted to write about the InfoSphere Metadata Workbench and how to get the most out of its Data Lineage capabilities. The Workbench is very powerful: it illustrates relationships between processes, business concepts, people, databases, columns, data files, and much, much more. Combined with Business Glossary, it gives you reporting capabilities for the casual business user as well as for the (often) more technical, dedicated metadata researcher.
I’ve written a variety of entries about the Workbench over the past two years (see the table of contents link in the top right and find the metadata section), but nothing on “getting started”. As Metadata Workbench supports more and more object types, knowing certain skills and techniques becomes that much more important. This is especially true when you use the Workbench to illustrate Business Terms, Stewards, FastTrack Mappings, DataStage Jobs, Tables and Files, External ETL Tools, scripts and processes, operational metadata, and a long list of other data integration artifacts.
Many of you who start with Metadata Workbench begin with DataStage/QualityStage Jobs.
So I will start there.
Once you have mastered lineage with DataStage, and its combination with other objects, you can then easily move on to other concepts for non-DataStage metadata, which I will also cover in this series of blog entries. If you are using Metadata Workbench and are not a DataStage user, stay tuned. As we progress I will take a tour through Extensions, Extension Mappings, Extended Data Sources and all other such concepts.
Start with your favorite, reasonably complex DataStage Job. Maybe one with a lookup or a Join, a reasonable sequence of Stages (8 to 10 or so) and preferably a single “major” target. Since you are learning about the Workbench, you should be familiar, even intimate, with this Job. That will help as you learn the various ways to navigate through the user interface, because you will know what to expect at each particular dialog, report or screen.
[This first “getting started” entry assumes that you have NEVER performed Automated Services against your DataStage Project. If you have, that’s OK, but you might not get the same results I outline below; you may see more metadata than I describe in this initial learning step. And if you don’t know what I’m talking about (yet), that’s OK too.]
Log into the Metadata Workbench and notice the “Engine” pull-down at the left. This is the list of your DataStage Servers and their Projects. Open the project and its folders, find your Job, and click directly on it. Scroll up and down the detail page that appears. There you will see the main page with the picture of the Job (click on it to get an expanded view of the Job in a new window). The metadata you are viewing is current as of the last time you or another developer saved the Job in DataStage Designer. There is also a very important listing of the Stage types in the Job, along with their icons. Further down you will find many “expandable” sections for things like Job operational metadata; investigate the options.
Now click on the “main” target Stage of this Job. This brings you to a similar-looking detail page, this one for the “Stage.” Look around, but don’t click anything yet. When you are ready, select “Data Lineage” at the upper right. As you do so, consider “where you are standing” (you are on a “Stage”) and what sort of lineage you would like to see. As you will discover, knowing “where you are” when you start your lineage is very important.
The default option in the next dialog is “Where did this come from”. Ignore the three checked boxes for now and click “Create Report”. This combs through ALL the possible resources that data for “the stage you started on” could have come from. Look through the list. Note also the highlighted line, and move it up and down. This highlight bar lets you select EXACTLY which resource you want to see in your actual report. The “total” collection of lineage resources is in front of you right now, and you will select the one you want for a detailed source-to-target report. This is often a point of confusion because the highlight bar is not always obvious. Data lineage doesn’t show you “ALL” the sources, just the path to or from the ones you select. [We’ll contrast this in a later entry with Business Lineage, which DOES provide a summary of ALL sources or ALL targets from a particular resource.]
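To make the “where did this come from” idea concrete, here is a toy model. It is not the Workbench’s implementation or any IBM API, and the stage names are invented. A Job is treated as a small directed graph of stages, upstream lineage is a breadth-first walk against the direction of data flow, and picking a highlighted row corresponds to asking for one path between that asset and the stage you started on.

```python
from collections import deque

# Hypothetical toy Job: each pair means "data flows from A to B".
# Stage names are invented for illustration; this is not a Workbench API.
EDGES = [
    ("CUSTOMERS_SRC", "xfm_clean"),
    ("xfm_clean", "lkp_region"),
    ("REGION_LKP", "lkp_region"),  # reference input to the lookup
    ("lkp_region", "agg_totals"),
    ("agg_totals", "TARGET_DB"),
]


def upstream_map(edges):
    """Map each asset to the assets that feed it directly."""
    feeds = {}
    for src, dst in edges:
        feeds.setdefault(dst, []).append(src)
    return feeds


def where_did_this_come_from(start, edges):
    """List every asset reachable by walking upstream from `start`, breadth-first."""
    feeds = upstream_map(edges)
    seen, queue, found = {start}, deque([start]), []
    while queue:
        for parent in feeds.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


def path_from(selected, start, edges):
    """Return one path from the `selected` asset to `start`, in data-flow order."""
    feeds = upstream_map(edges)
    child_of = {start: None}  # upstream node -> downstream node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == selected:
            path = []
            while node is not None:
                path.append(node)
                node = child_of[node]
            return path
        for parent in feeds.get(node, []):
            if parent not in child_of:
                child_of[parent] = node
                queue.append(parent)
    return []  # `selected` is not upstream of `start`


print(where_did_this_come_from("TARGET_DB", EDGES))
# ['agg_totals', 'lkp_region', 'xfm_clean', 'REGION_LKP', 'CUSTOMERS_SRC']
print(path_from("REGION_LKP", "TARGET_DB", EDGES))
# ['REGION_LKP', 'lkp_region', 'agg_totals', 'TARGET_DB']
```

In this sketch, moving the highlight bar to a different row is simply a different `selected` argument; the full upstream list does not change.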
Look at the bottom of the page, find the button labeled “Display Final Assets”, and click it. The list of objects above should get much smaller. Most likely it will show just the source stage for this Job, or perhaps its ultimate source along with a lookup source stage or a source for a Join. Pick the primary source stage for the Job and then click “Show Textual Report”.
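In the same kind of toy model (hypothetical stage names, not a Workbench API), “Display Final Assets” behaves roughly like filtering the full upstream list down to the assets that nothing else feeds, i.e. the ultimate sources such as the primary input and a lookup reference. This is an illustration of the idea under those assumptions, not a description of how the product computes it.

```python
# Hypothetical toy Job (invented stage names): each pair means data flows from A to B.
EDGES = [
    ("CUSTOMERS_SRC", "xfm_clean"),
    ("xfm_clean", "lkp_region"),
    ("REGION_LKP", "lkp_region"),
    ("lkp_region", "agg_totals"),
    ("agg_totals", "TARGET_DB"),
]


def all_upstream(start, edges):
    """Set of every asset reachable by walking upstream from `start`."""
    feeds = {}
    for src, dst in edges:
        feeds.setdefault(dst, []).append(src)
    seen, stack = set(), [start]
    while stack:
        for parent in feeds.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def final_assets(start, edges):
    """Keep only upstream assets that nothing else feeds (the ultimate sources)."""
    fed = {dst for _, dst in edges}
    return sorted(a for a in all_upstream(start, edges) if a not in fed)


print(len(all_upstream("TARGET_DB", EDGES)))  # 5 assets in the full list
print(final_assets("TARGET_DB", EDGES))       # ['CUSTOMERS_SRC', 'REGION_LKP']
```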
Review the result. The textual report isn’t as pretty, but it tends to scale better. Scroll up and down, and note what you see on the left and the Job details you see on the right. Everything is hyperlinked. Now find the little triangle toward the top left of the center pane where your report is (it’s called Report Selection or similar) and click on it. That should expose the “assets” page again. Now try “Show Graphical”. When you get there, play with it. Grab some white space around the diagram and drag the whole thing around, and try the zoom bar in the upper left. Click on the various icons in the lineage, then right-click on one of the stages and find “open details in new window”. That brings you back to a detail page, and the process starts again.
What happens if you choose the target stage of your original Job (the first stage you selected earlier), click “Data Lineage”, and select “Where does this go to”? If you haven’t run Automated Services, as noted above, you will most likely receive “No assets found” or “No data for the report”. This is because it is the “final” target; there isn’t anything after it. “Where did this come from” yields a similar result if you happen to be “sitting” on a source when you start your lineage exercise.
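The empty result is easy to see in the same toy graph. Again, the stage names are invented and this is not the Workbench’s logic. It models a single Job with no cross-Job links, which is roughly the situation before Automated Services has been run. Walking downstream from the final target, or upstream from a source, finds nothing, and the sketch simply echoes the message text quoted above.

```python
# Hypothetical single-Job toy graph (invented stage names); no cross-Job links,
# roughly the situation before any Automated Services have been run.
EDGES = [
    ("CUSTOMERS_SRC", "xfm_clean"),
    ("xfm_clean", "lkp_region"),
    ("REGION_LKP", "lkp_region"),
    ("lkp_region", "agg_totals"),
    ("agg_totals", "TARGET_DB"),
]


def reachable(start, edges, downstream=True):
    """Assets reachable from `start`, walking with the data flow or against it."""
    step = {}
    for src, dst in edges:
        a, b = (src, dst) if downstream else (dst, src)
        step.setdefault(a, []).append(b)
    seen, stack = set(), [start]
    while stack:
        for nxt in step.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def lineage_report(start, edges, downstream):
    """Return the reachable assets, or the empty-report message if there are none."""
    found = reachable(start, edges, downstream)
    return sorted(found) if found else "No assets found"


print(lineage_report("TARGET_DB", EDGES, downstream=True))       # No assets found
print(lineage_report("CUSTOMERS_SRC", EDGES, downstream=False))  # No assets found
```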
If you practice this, you should become very familiar with the lineage report user interface, and will have a strong base for moving forward with more complex, and deeper, scenarios.
Next entry: Linking Jobs together...
(link to next post in this series: Linking Jobs)
Ernie