Creating Data File objects from inside of DataStage

A seldom-used object in Metadata Workbench is the “Data File”. It is less common than a Database Table because nothing creates it for you automatically. Database Tables are created whenever you use a Connector or other bridge to import relational tables from a database. Data Files have to be created deliberately: by hand, with the istool workbench generate feature, or from inside of the DataStage/QualityStage Designer.

Why create Data Files?

A Data File is the object in Metadata Workbench that represents flat files, .csv files or DataSets. It can connect to the Sequential Stage or Dataset Stage for data lineage purposes. A Data File object might also be used for pure governance reasons: a special transaction file might be defined by a particular Business Term, or you might want to assign a Steward, the subject matter expert on one particular file, to the Data File object. Of course, if you are a DataStage user, you probably use regular Sequential Table Definitions all the time. Data Files are similar but are more “fixed”: they are designed to represent a specific flat file, on a given machine, and in a particular sub-directory, as opposed to being a general metadata mapping with proper column offsets for any file that matches the selected schema.

The simplest way to create a formal Data File is to start with a DataStage Table Definition. You may already have one that was created when you imported a sequential file, or you can easily create one using the “Save” button on any column list within most Stages. Once you have the Table Definition, double-click on it and review all of the tabs across the top. Pay special attention to the “Locator” tab. Click on it and look at its detail properties. The values on the Locator tab control the creation of Data Files or Database Tables.

Set the pull-down option at the top to “Sequential”. If that value is not already in your pull-down list, type it in. Towards the bottom you will see an entry for the Data Collection; put in the name you want for your file. Close the Table Definition.

Now put your cursor on that Table Definition in the “tree”. Right-click and select “Shared Table Creation Wizard”. When that dialog opens, click Next. Then open the pull-down dialog, select “create new”, and click Next. Notice the properties on this new page: you have the Filename, the Host (pick a machine or enter a new one) and the Path. Make the Filename the SAME as what you have hard coded in your Sequential or Dataset Stage, or the filename that results once any Job Parameters you pass into it are fully expanded to their default values. Then set the “Path” value to the fully qualified path, taken from the expanded Job Parameters or from what you have hard coded in the same filename property. For example, if your filename in the Stage looks like this:

/tmp/myfile/#myfilename# …and #myfilename# has a default value of mySequentialFile.txt

Then use mySequentialFile.txt as the Filename and /tmp/myfile (without the final slash) for the path. Now you will have a Data File inside of Metadata Workbench that you can govern with Steward and Term assignments, and it also will stitch to the Stages that use its name in hard coded fashion or expanded Job Parameters for Design time or Operational lineage.
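The split described above is plain string handling, and it can be sketched in a few lines of Python. This is only an illustration of the rule, not anything DataStage or the wizard exposes: the function names and the dictionary of default values are hypothetical, and the sketch assumes Job Parameters appear as #name# inside the Stage's file property.

```python
import posixpath
import re


def expand_job_parameters(file_property, defaults):
    """Replace every #name# placeholder with its default value.

    `defaults` is a hypothetical dict of Job Parameter names to default
    values; a missing name raises KeyError so a typo is noticed early.
    """
    return re.sub(r"#([^#]+)#", lambda m: defaults[m.group(1)], file_property)


def split_for_data_file(file_property, defaults):
    """Return (filename, path) as they would be typed into the wizard."""
    expanded = expand_job_parameters(file_property, defaults)
    # posixpath.split drops the final slash from the directory part.
    path, filename = posixpath.split(expanded)
    return filename, path


filename, path = split_for_data_file(
    "/tmp/myfile/#myfilename#",
    {"myfilename": "mySequentialFile.txt"},
)
print(filename)  # mySequentialFile.txt
print(path)      # /tmp/myfile
```

The same helper works when the directory is also a Job Parameter, as long as every parameter in the property has a default value to expand.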



4 Responses to “Creating Data File objects from inside of DataStage”

  1. Krishna Says:

    Every time we run a DataStage job, the input file name is different as YYYYMMDD and location code is part of the file name. How can we handle this in lineage based on Design & OMD?
    Sometimes the directory path of the file also changes. We handle all these thru parameters. We are on 11.3.


    • dsrealtime Says:

      This is one of the great things about 11.3 and higher. Operational Metadata will always and automatically show the target files, with their runtime names, as slightly greyed out virtual assets. Some sites have hundreds of these every day, with suffixes on their filenames that go to the minute, so there is no need to import those via Metadata Asset Manager. They are just “there”, or can be filtered out if you un-select Operational Metadata. For governance purposes, create “one” edition of your sequential file in the repository that doesn’t have a unique date suffix, perhaps something like MyOutputFileDateStamp.txt, and set the DEFAULT value of your Job Parameters to THAT filename and appropriate path. The simplest way to create it is to right click on a Sequential Table Definition in DataStage, choose “Shared Table Creation Wizard” and walk through that dialog, after making sure that the top level pull down on the Locator tab for your table says “Sequential” (you can type in Sequential if it doesn’t appear in the pulldown). Now your design lineage will illustrate the governable “model” file, and OMD will display all the actual files that were written.

      • Krishna Says:

        Thanks Ernie for your quick response!

        I have a couple of follow up questions/comments –
        Is there no way to select Operational Metadata and still make IGC use “MyOutputFileDateStamp.txt” as one file in lineage diagrams (instead of hundreds and thousands of files)? I thought about renaming each file before loading, but we can’t use the pattern matching on sequential stage in that case.
        Why I am asking this – when someone is looking for the lineage on a target table that’s populated by a series of DataStage jobs (using tables and flat files as sources), they might see hundreds of files as source.

        The other reason is we would like to see the record counts in IGC by using OMD. So far, we are able to see only the start and end times of job runs and not record counts (we can’t see the buttons/options we were seeing in Metadata Workbench). Still working with IBM on this.


      • dsrealtime Says:

        Operational metadata in IGC is there for complete lineage purposes. It doesn’t contain detailed row counts; one idea is to steer your users towards the Operations Console for detailed Job runtime info. As for the lineage question, it’s not exactly clear what you are looking for. It sounds like you are wishing that for OMD, you would only see one “template” file? If that’s the case, I would probably change the default values so that they illustrate the “concept” for this time-stamp based target, and then educate the users to “uncheck” the operational metadata selection box. -ernie
