This entry is one of many in a series that describes the InfoSphere Open IGC API, which allows you to define your own objects for information governance using InfoSphere Information Server and the Information Governance Catalog.
Previous post in this series:
Defining Lineage Flows (Part 2)
Original post in this series:
Open IGC is here!
In the previous post, we reviewed how you define a formal lineage “flow”: first by defining the “inventory” of assets that will serve as sources and targets in your flow specification, and then by writing the flowUnit XML node that explicitly states the “source” and the “target” for each point-to-point connection.
Assets, as we reviewed, might be Data Files, Database Columns, or objects from other Bundles: anything that can participate in a lineage report within the Information Governance Catalog. We looked at how you define the hierarchy, identifying (for example) the Host, Database, Schema, Table, and specific column name for a Database Column that has been formally imported into the repository (at an earlier time, via Metadata Asset Manager or another mechanism).
(as a reminder, here is the hierarchy that identifies a Database Column to be used in a flowUnit)
(and here is a flow unit that includes that column)
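As a rough, text-only sketch of a hierarchy and flow unit of this kind, the Python below assembles a small XML fragment with the standard library. Everything in it other than the column name mycol1 is an assumption made for illustration: the element and attribute names (doc, assets, asset, attribute, reference, flowUnits, flowUnit, subFlows, flow), the class names, the IDs, the host, database, schema, and table names, and the target column. Check them against the earlier posts in this series and the product documentation before relying on them.

```python
import xml.etree.ElementTree as ET

# NOTE: every element name, attribute name, class name, and ID below is an
# assumption for illustration. Only the column name "mycol1" comes from the post.

# (ID, assumed class, name, assumed parent-reference name, parent ID)
HIERARCHY = [
    ("a1", "host", "MYHOST", None, None),
    ("a2", "database", "MYDB", "host", "a1"),
    ("a3", "database_schema", "MYSCHEMA", "database", "a2"),
    ("a4", "database_table", "MYTABLE", "database_schema", "a3"),
    ("a5", "database_column", "mycol1", "database_table_or_view", "a4"),
    ("a6", "database_column", "target_col", "database_table_or_view", "a4"),
]


def build_flow_doc():
    """Build an illustrative flow document: an asset inventory plus one flow."""
    doc = ET.Element("doc")

    # Inventory: each asset names itself and points at its parent by ID,
    # so the chain host > database > schema > table > column is explicit.
    assets = ET.SubElement(doc, "assets")
    for asset_id, cls, name, ref_name, parent_id in HIERARCHY:
        asset = ET.SubElement(
            assets, "asset", {"class": cls, "repr": name, "ID": asset_id}
        )
        ET.SubElement(asset, "attribute", {"name": "name", "value": name})
        if parent_id is not None:
            ET.SubElement(
                asset, "reference", {"name": ref_name, "assetIDs": parent_id}
            )

    # Flow: one point-to-point connection from mycol1 (a5) to target_col (a6).
    flow_units = ET.SubElement(doc, "flowUnits")
    flow_unit = ET.SubElement(flow_units, "flowUnit")
    sub_flows = ET.SubElement(flow_unit, "subFlows")
    ET.SubElement(sub_flows, "flow", {"sourceIDs": "a5", "targetIDs": "a6"})
    return doc


if __name__ == "__main__":
    print(ET.tostring(build_flow_doc(), encoding="unicode"))
```

The point of the sketch is the shape rather than the exact vocabulary: the inventory spells out the full containment path for each column, and the flow refers to inventory entries only by their local IDs.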
But what happens if you haven’t yet imported that Database Table and its columns? What if this is a temporary table with a dynamic, timestamp-generated name, and you don’t care about ever formally importing it into the repository for governance purposes? What if you simply made a typo in your code, picked up the wrong name from somewhere in your program, or were given misinformation by the tool whose lineage you are recording?
Open IGC supports the idea of a “Virtual Asset”. This allows you to define the objects that will be seen in a lineage report as a source or a target, without any concern about whether they actually exist in the repository. These assets appear in the lineage diagram, but are slightly greyed out to indicate their status as a “Virtual Asset”.
In the first screenshot above, Database Column mycol1 doesn’t really exist; I have never imported it. It is used here for illustration, but it could just as easily be a column in a temporary table that exists only for a single run of the application. Note that it still appears in the lineage report, with a slightly greyed-out appearance. All the details from the definition above appear in the report. The “red” box in the screenshot below (the top source icon on the left) identifies the “Virtual Asset”:
You can view this Virtual Asset in lineage, and you can even click on it directly to go to its detail page. However, it is considered “non-governable”: you can’t assign Terms or Stewards to it, add it to a Collection, or do anything else related to governance. It is a tool to assist you in enabling lineage, providing additional insight where needed regarding the flow of your data. If it is truly an important asset, then it makes sense to formally import it and give it a full definition in the flow XML.
If an asset is found in the repository (using its name-based object identity), it appears in a clear font with a fully colored icon, without being greyed out. The green box (the bottom icon on the left in the lineage picture) identifies a “real” asset: one that truly exists in the repository, was imported earlier by formal means, and is fully governable (searchable, can be assigned Terms and Stewards, etc.).
Virtual Assets can be created for native IGC objects or for assets that you have created with your bundles. They are a powerful mechanism for illustrating lineage quickly and simply, without worrying about whether metadata has been formally imported or defined elsewhere. Later on, if metadata is imported and matches your flow XML, the Virtual Asset will become “real” in each lineage report.
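The real-versus-virtual distinction described above can be modeled in a few lines. This is only a conceptual sketch, not how the Information Governance Catalog is implemented: it assumes an asset's identity is the tuple of names along its containment path, that matching is an exact comparison of those tuples, and that the repository can be treated as a simple set. The function name classify_assets and all asset names other than mycol1 are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed identity model: the names along the containment path,
# e.g. (host, database, schema, table, column).
Identity = Tuple[str, ...]


def classify_assets(
    flow_assets: Iterable[Identity], repository: Set[Identity]
) -> Dict[Identity, str]:
    """Label each asset named in a flow as 'real' if an asset with the same
    name-based identity exists in the repository, otherwise 'virtual'."""
    return {
        identity: "real" if identity in repository else "virtual"
        for identity in flow_assets
    }


if __name__ == "__main__":
    source = ("MYHOST", "MYDB", "MYSCHEMA", "MYTABLE", "mycol1")
    target = ("MYHOST", "MYDB", "MYSCHEMA", "MYTABLE", "target_col")

    # Only the target column has been formally imported so far.
    repository = {target}
    print(classify_assets([source, target], repository))

    # Later, the source column is imported with a matching identity,
    # so the same flow now resolves it as a real asset.
    repository.add(source)
    print(classify_assets([source, target], repository))
```

Note that the flow definition itself never changes between the two calls; only the repository contents do, which mirrors how a Virtual Asset becomes “real” once matching metadata is imported.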
Virtual Assets allow you to illustrate objects in lineage that don’t require governance, but need to be shown so that users fully understand the big picture for your overall data flows. They enable you to more quickly get your lineage solutions up and running for all IGC users.