At this point you should be familiar with Shared Tables, and understand how data lineage works within, and among, multiple DataStage Jobs. Stage to Stage lineage is very useful, and may be all that you require. It’s also powerful, though to go beyond this and connect your Jobs to the tables that you have imported from various places.
[click here for the prior post in this series, Linking your DataStage Jobs together ]
One of the key reasons for including Shared Tables in lineage is for “Business Lineage”. This is a high level summary of lineage that doesn’t illustrate the lower level transformation details…just the “key” sources and targets (files, tables, reports, etc.) along the way. Another is to connect your Jobs to “external, non Information Server assets.” DataStage doesn’t write directly to a (for example) Cognos report….it writes to a table somewhere, and that table is then read by the business intelligence tool. The connection thru that table is critical for accurate data lineage reporting. Here is how Metadata Workbench makes the connection between a DataStage Job and a table…
Shared Tables have a four (4) part name: Host/Database/Schema/Tablename (see other entries in the table of contents on Shared Tables for more details on the names of Shared Tables or alternatives for importing them).
Relational Stages in DataStage typically use two (2) parts to identify a particular table. “Server” or “Database” or “DSN” name, and tablename. The Server/DSN/Database name is usually in a dedicated property within the Stage. The Tablename might be in a dedicated property, or it could also be embedded in User Defined SQL. Any of them might be hard coded or established via Job Parameters.
The first thing Metadata Workbench needs to do is to “map” the abstract “Server/Database/DSN” name to a particular “Host” and “Database” combination in the list of Shared Tables. Like any application, DataStage is just pointing to some abstract “string” when trying to find a database. An ODBC DSN, for example, “might” be the name of the database, but it could also just be “myODBCdatabase”, which really points to (in the ODBC definition) a DB2 table called HRMAIN. Even if we use the string HRMAIN in the “Server” property of the Stage, we still need a way to identify the particular host that we are considering for data lineage. This is done via the “Database Alias” link in the “Advanced” tab of the Metadata Workbench.
Go to the Advanced Tab after signing into the Workbench. Perform Automated Services for one of your projects (this could take a long time if it’s the first time you are doing it). When it finishes, click on “Database Alias”. Look carefully at the values there. These are the “strings” that are used in your various Jobs to identify databases. Pick the string that is appropriate for the Stage Type that you are working with and slide your cursor over to the right. The “Add” button will allow you to select the desired host/database combination that this abstract string should connect to. In the example noted above, I might assign the string (alias) myODBCdatabase to the host/database combination of QR2H004/HRMAIN. QR2H0004 with Database HRMAIN must be something I have already imported and is viewable in the left navigation pane of the Workbench, or in the Repository Management tab of the Information Server Web Console.
Save it (button on the lower right).
The next time you perform Automated Services, whenever that Stage type with that particular string (myODBCdatabase) is found, Metadata Workbench will use QR2H004/HRMAIN, combined with the fully qualified tablename* in the Stage, to match to a particular Shared Table that has been imported previously.
[* Note… if your tablename is NOT fully qualified, Metadata Workbench will, for most RDBMS Stage Types, pick up and use the value of the Username in order to identify your table]
Are you using Job Parameters? For Design based lineage, Metadata Workbench is smart, and will use the “default values” of those Job Parameters when finding alias “strings” and also when obtaining the fully qualified (schema.tablename) tablenames to use in the linking.
Are you using $PROJDEF? Run the ProcessEnvVars shell or bat file inside of /IBM/InformationServer/ASBNode/bin to obtain the project definitions for use in this algorithm.
Operational Metadata (a whole separate blog entry is needed to discuss OMD) is used to populate the Job Parameter values from “run time”…otherwise the same rules apply. If you are just starting, get to know lineage w/o worrying about OMD. That’s an advanced topic. Understand how database alias works by performing Automated Services, reviewing the Database Alias page in the Workbench, doing the assignment and running Automated Services again. Once it is complete, go to the actual table using the navigator frame at the left (open the Host tree and find the table) and right mouse and select “data lineage”. If it is a source, select “where does this go to”…..if it is a target, select “where did this come from”…… and validate your results.