Lately I have been working with a lot of sites that are interested in “Extensions”. Extensions are a simple way of defining new objects within Information Server, and/or tying them together for data lineage purposes.
Extensions come in two different flavors. There are Extended Data Sources, which are the equivalent of defining your own tables, columns, files, or other “things” that you want to appear as individual “icons” in your lineage diagrams. The other flavor is Extension Mapping Documents, which are specifications that define sources and targets (along with other useful metadata properties) and describe the “lineage” that will be drawn by the Metadata Workbench when performing any type of lineage reporting.
Why create them? Doesn’t Information Server already allow imports of tables, columns, files, and other artifacts in our environments? Doesn’t DataStage already provide us with data lineage, describing complex flows of data?
The answer depends largely on what you are trying to accomplish with your Information Governance objectives. If you are only narrowly concerned with the DataStage Jobs in your application, and the datamarts that they flow to, there may not be a need for Extensions. However, many of you are expanding your horizons beyond just DataStage, and looking at all of the other elements of your enterprise that need tracking, management, oversight, and governance. Such sites are looking to include ALL of their objects in their lineage: not just the tables and columns defined in their relational databases, but also the legacy objects, the message queues, the green screens, the CICS transactions, or even a representation of “people”, so that Tweets and other social media feeds can be shown as the “source” in a lineage diagram that ends up in Hadoop! Those same sites also need to outline the processes that move and transform data, whether they are DataStage, another ETL tool, shell scripts, FTP scripts, Java, or other 3GL programs.
Every one of those objects may be important to lineage, especially when there is a need to provide detailed source information to upper management. Equally, those objects also demand governance, such as being assigned Stewards, being associated with business concepts and Terms, or being shown as “Implementing” a particular data quality “Policy” or “Rule”. Further, such objects benefit from being categorized, labeled, or otherwise organized into Collections that make them more useful to everyone who needs further definition and deeper understanding. Anyone who “touches” a piece of data, whether for development, evaluating a report, or making a crucial decision, will benefit from the addition of Extensions.
Several years ago I talked about Extensions as a way of defining an external Web Service. This is just one example of a flow, outside of normal ETL, that has value in being tracked and managed. I have worked with many customers who have used Extensions to represent other ETL tools in lineage, with or without DataStage. The goal is always to provide more insight to decision makers who need to know where things come from, how they were calculated, who the experts are, and more.
Building Extensions first requires thinking far outside the box and looking at “all” the metadata that is important to your data integration efforts. What metadata will be meaningful to your business users? There is certainly also a need for impact analysis, and for providing value to your developers who want to answer questions such as “Which processes use this table?” or “Which processes will be affected if we make changes to this MQ Series queue definition?”
These are some of the key reasons “why” people are creating Extensions. There is a lot of “built-in” metadata that exists within Information Server. However, you can extract even MORE value from your Information Server investment by adding new objects and new capabilities to the collection of metadata that you are already successfully managing.
The next post will suggest ways to decide which Extensions you need, and then we’ll dive into how to create them and what you should consider…