Hi all…
Just thought I’d throw in a quick review of the important (imho) links at the Advanced tab. Some of these factoids are buried in my other posts, but I needed to have a cheat-sheet for myself and others. Here it is:
Automated Services. This option brings up the dialog that runs the parsing or “stitching” process for the detailed metadata you have in your DataStage Jobs or Connector-imported RDBMS views. It does a lot of work, takes time the first time you run it (if you have a ton of metadata), and should be scheduled during off-hours. After the first run against a particular project, it uses a change recognition mechanism to only pick up Jobs that have been updated. Note the “checked” DS Projects carefully. Only select those that are really critical, and once checked, don’t “uncheck” them: as you will see from the warnings, this will “remove” all parsing history. Ultimately, this is the step that reviews the Jobs and connects them via common information found in Stages, etc. See my other posts for how the connection of Jobs to each other is determined.
Stage Binding. When all else fails, you can connect two stages to each other. Use this when, for some reason, two Jobs won’t connect, or when the rules for connecting them can’t be met. I’ve needed this with some custom Stage or Operator implementations, and when I am using a technique that prevents automatic connection. Imagine a Sequential Stage at the end of one Job that writes out some XML content, and an XML Stage in the next Job that reads that content. There isn’t much in common between those Jobs, but I still want lineage to run directly through them.
Data Item Binding. This provides a “manual” binding of particular Stages to Database Tables and Data Files (see other posts for what those are, how they are created, and how they are different from “DataStage Table Definitions”). Use this when you are unable to get Database Alias to work as you expect and you simply want to “bolt” a particular Database Table or Data File to a Stage in one of your Jobs to complete the lineage picture.
Data Source Identity. Use this when, for whatever reason, you want to link two identical tables for lineage purposes. Reasons? Two people might have imported the same metadata accidentally and you don’t want to delete it, or you might have the “design” information from an ERwin model and also have the “actual” table information from the RDBMS catalog. There are many valid reasons. This link lets you relate tables together. They must have the same name: the option here lets you relate the “Schemas” of two different databases. Identical tables within those schemas will become linked for lineage reporting, and therefore also linked to whatever those individual tables connect to for lineage.
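If it helps to picture the matching rule, here is a tiny Python sketch of the idea. It is only a mental model, not how Metadata Workbench is implemented: the function name, the schema contents, and the exact-string comparison are all my own assumptions for illustration.

```python
def identical_tables(schema_a_tables, schema_b_tables):
    """Return the table names present in both schemas.

    Illustrative assumption: once two schemas are related, only tables
    whose names match exactly in both are treated as linked.
    """
    return sorted(set(schema_a_tables) & set(schema_b_tables))


# Hypothetical table lists: one schema from a design model,
# one from the actual database catalog.
design_schema = ["CUSTOMER", "ORDERS", "ORDER_ITEM"]
catalog_schema = ["CUSTOMER", "ORDERS", "AUDIT_LOG"]

linked = identical_tables(design_schema, catalog_schema)
print(linked)  # ['CUSTOMER', 'ORDERS']
```

ORDER_ITEM and AUDIT_LOG stay unlinked because each exists in only one of the two schemas.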
Database Alias. This option establishes the connection between an abstract string in a DataStage Stage (Server name, DSN name, etc., as defined by the relational stage) and the “Host/Database” combination that was actually imported. Database Tables in Metadata Workbench are typically “actual” tables, but in DataStage, like any well-designed application, the “name” is a placeholder. This assigns the “placeholder” to the host and database. The schema.tablename used in the Stage will then be matched against the Host/Database set of Tables to create a lineage connection. The list presented at this option will be entirely empty until you run Automated Services. Then it will be populated with each StageType and “server string” combination that it finds in your Jobs.
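The two-step lookup described above (placeholder to Host/Database, then schema.tablename against that set of Tables) can be sketched in a few lines of Python. Again, this is just a way to think about it under stated assumptions: the dictionaries, the `resolve_table` function, and every name in them (stage type, DSN, host, database, tables) are hypothetical and are not part of any product API.

```python
# Hypothetical alias assignments: (stage type, server string) -> (host, database)
ALIASES = {
    ("ODBC Connector", "DSN_SALES"): ("dbhost01", "SALESDB"),
}

# Hypothetical imported Database Tables, grouped by (host, database)
IMPORTED_TABLES = {
    ("dbhost01", "SALESDB"): {"APP.CUSTOMER", "APP.ORDERS"},
}


def resolve_table(stage_type, server_string, schema_table):
    """Return (host, database, schema.table) if a lineage match exists, else None."""
    target = ALIASES.get((stage_type, server_string))
    if target is None:
        return None  # placeholder has not been assigned to a host/database
    if schema_table not in IMPORTED_TABLES.get(target, set()):
        return None  # alias is set, but no imported table has that name
    host, database = target
    return (host, database, schema_table)


print(resolve_table("ODBC Connector", "DSN_SALES", "APP.CUSTOMER"))
print(resolve_table("ODBC Connector", "DSN_OTHER", "APP.CUSTOMER"))
```

The first call finds a match; the second returns None because that server string has no alias assigned, which is the situation where you would fall back to Data Item Binding.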
Hope this helps you understand these options.
Ernie

