Last night I was reminded about a series of blog entries I’ve wanted to make concerning the InfoSphere Metadata Workbench and how to get the most out of its Data Lineage capabilities. The Workbench is very powerful — it illustrates relationships between processes, business concepts, people, databases, columns, data files, and much, much more. Combined with Business Glossary, it gives you reporting capabilities for the casual business user as well as the (often) more technical dedicated metadata researcher.
I’ve had a variety of entries about Workbench in the past two years (see the table of contents link in the top right, and find the metadata section), but nothing on “getting started”. As Metadata Workbench starts to support more and more objects, knowing certain skills and techniques becomes that much more important. This is especially true when trying to gain the most from Metadata Workbench when it is being used to illustrate Business Terms, Stewards, FastTrack Mappings, DataStage Jobs, Tables and Files, External ETL Tools, scripts and processes, operational metadata and a vast list of other data integration artifacts.
Many of you who start with Metadata Workbench begin with DataStage/QualityStage Jobs.
[Note: this particular post applies mostly to pre-8.7 Metadata Workbench, but it is still worth reading first, even if you are on a newer release of the Workbench. The remaining posts in the series discuss the critical things needed for linking Jobs together and linking Jobs to their source and target table or file objects.]
So I will start there.
Once you have mastered lineage with DataStage, and its combination with other objects, you can then easily move on to other concepts for non-DataStage metadata, which I will also cover in this series of blog entries. If you are using Metadata Workbench and are not a DataStage user, stay tuned. As we progress I will take a tour through Extensions, Extension Mappings, Extended Data Sources and all other such concepts.
MAKE SURE YOU START learning about lineage and Metadata Workbench using a small number of Jobs. NO MORE THAN 10 – 15. Any more and the results will be overwhelming or confusing, and will prevent you from understanding some very important and critical rules. Find 10 or so Jobs, probably in one application, and in one folder or related folders, that have things in common. Jobs that share datasets or are part of an overall flow. Get to know these Jobs before ever opening the Workbench to review them. Eventually you will be comfortable using the Workbench to review metadata on Jobs you have never seen — but that’s a poor way to learn the power of the tooling.
From those Jobs, pick a reasonably complex one. Maybe one with a lookup or a Join, a reasonable sequence of Stages (8 to 10 or so) and preferably a single “major” target. Since you are learning about the Workbench, you should be familiar, even intimate, with this Job. That will help as you learn the various ways to navigate through the user interface, because you will know what to expect at each particular dialog, report or screen.
[This first "getting started" entry assumes that you have NEVER performed Automated Services against your DataStage Project. If you have, it's OK, but you might not get the same results as I am outlining below; you may get more metadata than I am describing in this initial learning step. And if you don't know what I'm talking about (yet), that's OK too.]
Log into the Metadata Workbench and notice the “Engine” pull-down at the left. This is the list of your DataStage Servers and their Projects. Open up the project and its folders, and find your Job. Click directly on it. Scroll up and down in the detailed page that appears. There is the main page with the picture of the Job (click on it and you will get an expanded view in a new window of what the Job looks like). The metadata you are viewing is up-to-date as of the last moment you or a developer saved the Job in the DS Designer. There is also a very important listing of the Stage types in the Job, along with their icons. Note that below you have many “expandable” sections for things like Job Operational metadata. Investigate the options.
Now click on the “main” target Stage of this Job. This brings you to a similar looking detail page, this one for the “Stage.” Look around, but don’t click anything — when you are ready, select “Data Lineage” at the upper right. As you do so, consider “where you are standing” (you are on a “Stage”) and what sort of lineage you would like to see. As you will discover, knowing “where you are” when you start your lineage is very important.
[If you are using 8.7 or higher, at this point you should soon see a single graphic, probably just one big icon in the middle of the page. This is the lineage for THIS Job, all by itself (unless you've already done some other work in lineage, in which case you will get other things linked to it). Look around and then click on the "Expand" link that is on the Job itself. This brings up a detailed page "for that Job". Look around: "grab" an empty part of the screen with your left mouse button and move the picture around, and zoom in and out (there is a little bar at the top left for this). Click on the other buttons that show you both a "mini" edition of your lineage as well as a key for the kinds of lineage that are displayed. Then move to the next post in this series (Linking Jobs).]
The default option at the next dialog is “Where did this come from”. Ignore the three checked boxes for now and click “Create Report”. This will comb through ALL the possible resources that data for the Stage you started on could have come from. Look through the list. Note also the highlighted line. Move it up and down. This highlight bar lets you select EXACTLY which resource you’d like to see for your actual report. The “total” collection of lineage resources is in front of you right now, and you will select which one you want for a detailed source-to-target report. This is often a point of confusion because the highlight bar is not always obvious. Data lineage doesn’t show you “ALL” the sources, just the path to/from the ones that you select [we'll contrast this in a later entry with Business Lineage, which DOES provide a summary of ALL sources or ALL targets from a particular resource].
Look at the bottom of the page. Find the button labeled “Display Final Assets”. Click it. The list of objects above should get much smaller. Most likely, it should just show the source stage for this Job, or maybe its ultimate source as well as a lookup source stage or a source for a Join. Pick the primary source stage for the Job and then click “Show Textual” Report.
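If it helps to picture what “Where did this come from” and “Display Final Assets” are doing, here is a rough sketch in Python. It is only a mental model, not how the Workbench is implemented: the stage names and the `feeds` dictionary are invented for illustration, and the Job is assumed to stand alone (no Automated Services, no links to other Jobs).

```python
from collections import deque

# Hypothetical Job: each key feeds data into the stages listed as its value.
# These stage names are invented for illustration only.
feeds = {
    "Customers_Source": ["Join_Orders"],
    "Orders_Source": ["Join_Orders"],
    "Join_Orders": ["Lookup_Region"],
    "Region_Lookup_Source": ["Lookup_Region"],
    "Lookup_Region": ["Transformer"],
    "Transformer": ["Warehouse_Target"],
}

def sources_of(stage):
    """Return the stages that feed directly into `stage`."""
    return [src for src, targets in feeds.items() if stage in targets]

def where_did_this_come_from(start):
    """Collect every upstream stage reachable from `start` (breadth-first)."""
    seen, queue = [], deque([start])
    while queue:
        for src in sources_of(queue.popleft()):
            if src not in seen:
                seen.append(src)
                queue.append(src)
    return seen

def final_assets(stages):
    """Keep only the stages that have nothing feeding them."""
    return [s for s in stages if not sources_of(s)]

all_upstream = where_did_this_come_from("Warehouse_Target")
print(len(all_upstream))                   # the full list of upstream resources
print(sorted(final_assets(all_upstream)))  # the much shorter "final assets" list
```

The full upstream list contains every intermediate stage, which is why the first report screen can look crowded. Filtering down to stages with nothing feeding them leaves just the primary source plus the Join and lookup sources, which matches what you should expect to see after clicking the button.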
Review the result. The textual report isn’t as pretty, but it tends to be more scalable. Scroll up and down, and note what you see on the left, and the Job details you see on the right. Everything is hyperlinked. Now find the little triangle towards the top left of the center pane where your report is (it’s called Report Selection or similar) and click on it. That should expose the “assets” page again. Now you can try “Show Graphical”. When you get there, play with it. Grab some white space around the diagram and move the whole thing around, and try the zoom bar in the upper left. Click on the various icons in the lineage, then right-click on one of the stages and find “open details in new window”. That will bring you back to a detailed viewing page and the process starts again.
What happens if you choose the target stage of your original Job (the first stage you selected earlier) and ask for “Data Lineage” and select “Where does this go to”? If you haven’t done Automated Services as I’ve noted above, you should likely receive “No assets found” or “No data for the report”. This is because it’s the “final” target — there isn’t anything else. “Where did this come from” will yield a similar result if you happen to be “sitting” on a source when you start your lineage exercise.
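The empty results described above follow from where you are standing. A minimal sketch, again with invented stage names and a single unlinked Job (this is an illustration of the idea, not the Workbench's actual logic):

```python
# Hypothetical three-stage Job, written as (source, target) links.
# Names are invented; the Job is assumed not to be linked to any other Job.
links = [
    ("Orders_Source", "Transformer"),
    ("Transformer", "Warehouse_Target"),
]

def lineage(start, forward):
    """Follow links from `start`, downstream if `forward` is True, else upstream."""
    reached, frontier = [], [start]
    while frontier:
        current = frontier.pop()
        for src, tgt in links:
            here, there = (src, tgt) if forward else (tgt, src)
            if here == current and there not in reached:
                reached.append(there)
                frontier.append(there)
    return reached

# "Where does this go to" while standing on the final target: nothing to show.
print(lineage("Warehouse_Target", forward=True))
# "Where did this come from" while standing on a source: also nothing.
print(lineage("Orders_Source", forward=False))
# Standing on the source and looking downstream does find the rest of the Job.
print(lineage("Orders_Source", forward=True))
```

Once Jobs are linked to each other and to their tables or files, the target of one Job is no longer the end of the line, and the same question starts returning results.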
If you practice this, you should become very familiar with the lineage report user interface, and will have a strong base for moving forward with more complex, and deeper, scenarios.
Next entry: Linking Jobs together……
(link to next post in this series: Linking Jobs )