Connecting Shared Tables to your DataStage/QualityStage Jobs

At this point you should be familiar with Shared Tables, and understand how data lineage works within, and among, multiple DataStage Jobs. Stage to Stage lineage is very useful, and may be all that you require. It’s also powerful, though to go beyond this and connect your Jobs to the tables that you have imported from various places.

[click here for the prior post in this series, Linking your DataStage Jobs together ]

Why?

One of the key reasons for including Shared Tables in lineage is for “Business Lineage”. This is a high level summary of lineage that doesn’t illustrate the lower level transformation details…just the “key” sources and targets (files, tables, reports, etc.) along the way. Another is to connect your Jobs to “external, non Information Server assets.” DataStage doesn’t write directly to a (for example) Cognos report….it writes to a table somewhere, and that table is then read by the business intelligence tool. The connection thru that table is critical for accurate data lineage reporting. Here is how Metadata Workbench makes the connection between a DataStage Job and a table…

Shared Tables have a four (4) part name: Host/Database/Schema/Tablename (see other entries in the table of contents on Shared Tables for more details on the names of Shared Tables or alternatives for importing them).

Relational Stages in DataStage typically use two (2) parts to identify a particular table. “Server” or “Database” or “DSN” name, and tablename. The Server/DSN/Database name is usually in a dedicated property within the Stage. The Tablename might be in a dedicated property, or it could also be embedded in User Defined SQL. Any of them might be hard coded or established via Job Parameters.

The first thing Metadata Workbench needs to do is to “map” the abstract “Server/Database/DSN” name to a particular “Host” and “Database” combination in the list of Shared Tables. Like any application, DataStage is just pointing to some abstract “string” when trying to find a database. An ODBC DSN, for example, “might” be the name of the database, but it could also just be “myODBCdatabase”, which really points to (in the ODBC definition) a DB2 table called HRMAIN. Even if we use the string HRMAIN in the “Server” property of the Stage, we still need a way to identify the particular host that we are considering for data lineage. This is done via the “Database Alias” link in the “Advanced” tab of the Metadata Workbench.

Go to the Advanced Tab after signing into the Workbench. Perform Automated Services for one of your projects (this could take a long time if it’s the first time you are doing it). When it finishes, click on “Database Alias”. Look carefully at the values there. These are the “strings” that are used in your various Jobs to identify databases. Pick the string that is appropriate for the Stage Type that you are working with and slide your cursor over to the right. The “Add” button will allow you to select the desired host/database combination that this abstract string should connect to. In the example noted above, I might assign the string (alias) myODBCdatabase to the host/database combination of QR2H004/HRMAIN. QR2H0004 with Database HRMAIN must be something I have already imported and is viewable in the left navigation pane of the Workbench, or in the Repository Management tab of the Information Server Web Console.

Save it (button on the lower right).

The next time you perform Automated Services, whenever that Stage type with that particular string (myODBCdatabase) is found, Metadata Workbench will use QR2H004/HRMAIN, combined with the fully qualified tablename* in the Stage, to match to a particular Shared Table that has been imported previously.

[* Note… if your tablename is NOT fully qualified, Metadata Workbench will, for most RDBMS Stage Types, pick up and use the value of the Username in order to identify your table]

Are you using Job Parameters? For Design based lineage, Metadata Workbench is smart, and will use the “default values” of those Job Parameters when finding alias “strings” and also when obtaining the fully qualified (schema.tablename) tablenames to use in the linking.

Are you using $PROJDEF? Run the ProcessEnvVars shell or bat file inside of /IBM/InformationServer/ASBNode/bin to obtain the project definitions for use in this algorithm.

Operational Metadata (a whole separate blog entry is needed to discuss OMD) is used to populate the Job Parameter values from “run time”…otherwise the same rules apply. If you are just starting, get to know lineage w/o worrying about OMD. That’s an advanced topic. Understand how database alias works by performing Automated Services, reviewing the Database Alias page in the Workbench, doing the assignment and running Automated Services again. Once it is complete, go to the actual table using the navigator frame at the left (open the Host tree and find the table) and right mouse and select “data lineage”. If it is a source, select “where does this go to”…..if it is a target, select “where did this come from”…… and validate your results.

Ernie

Advertisements

4 Responses to “Connecting Shared Tables to your DataStage/QualityStage Jobs”

  1. sega Says:

    Hi Ernie,
    Is Operational metadata automatically included in the metadata workbench once you tick the “Generate Operational Metadata”? I asked this because when I tried running a data lineage and tick Operational Data and no result was found ,same as user-defined data but when Design data was chosen, it has result.

    • dsrealtime Says:

      Not automatically, no. It creates xml files that are at the server, and you have to then load those into the repository. Details are in the documentation; let me know if you run into issues configuring it.

      Ernie

  2. uma Says:

    Erin,

    How do i pull SAP Tables metadata into Metadataworkbench ?

    • dsrealtime Says:

      The easiest way is via DataStage, using the various SAP Packs for things like ABAB, IDOCS or BAPIs…and then using the feature in DataStage to convert the resulting DataStage Table Definitions to Tables in Metadata Workbench (right mouse on the tabledef in DataStage and select “Shared Table Creation Wizard”). For IDOCS, the separate interface that comes with the SAP Pack allows you to directly load the IDOCS into the repository — then they show up in Metadata Workbench as their own model type, with their own icons, etc. Lastly, if you don’t have DataStage, and you are using Information Server 9.1, you can use the istool workbench -generate command and create your own tables of any kind by creating and loading .csv files.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: