12 Responses to “What exactly is Data Lineage?”

  1. Bill Conniff Says:

    One caveat I can see with respect to data lineage is there may be intellectual property involved in a calculation somewhere upstream from the report. In that case, the lineage might intentionally be less complete and useful than it otherwise would be.

    • dsrealtime Says:

      Hi Bill…

      Thanks for the thought. Yes, this can happen. The key thing for the metadata administrator [or “the one wearing the hat who is responsible for the lineage research” 🙂 ] is to be able to at least document that the intellectual property exists, whether it is a black box, a company-sensitive calculation, an embedded “purchased” solution, or an external web service (etc.). There needs to be a way in the lineage tooling to represent this artifact (and represent it graphically as noted) and, if nothing more, to attach a URL to it, a phone number, a “steward” responsible for it, or something similar, so that the details can be obtained if/when absolutely necessary.

      Ernie

  2. Darren Peirce Says:

    I have always wondered how lineage deals with some of the more complex, yet very real, data management approaches. E.g. if data is stored in an object, subject, predicate form or other similar forms where the “meta-data” i.e. what class of items are being dealt with, is held as data. In this case dozens of individual items may be managed in the same set of columns. I presume this makes it hard to specify rules, such as: when dealing with “agent contracts” (predicate) xyz rules are applied to the data, while when dealing with “direct customers”, abc rules are applied to the data.

    In the simple case, metadata lineage makes perfect sense and should certainly be encouraged. I am just wondering how, in practice, some of the more complex nuances of data are supported by lineage?

    • dsrealtime Says:

      Thanks Darren. Good point. Some people’s “data” is really “metadata,” and the degree to which you have to “drill down” to find the actual details can be extensive. I’ve been finding that good metadata management and data lineage can often take some creative “artwork” and brainstorming. A key part of that brainstorming is determining “who” the lineage is for, and where you draw the line between “lineage” and management of metadata, and simply “going to the tool or product or application and opening it up”.

      Two different banks offered use cases that are helpful here. In one case, they already had a legacy home-grown metadata management application that keeps low-level “rule” detail in a relational table, but they didn’t have an up-to-date distribution system (old 3270 green screen stuff), or a way to inter-relate the legacy system with the new objects blossoming all around them, from ERwin models to newly acquired database objects and transformation tools. So a hybrid is being put together: some of their tangible objects are being represented directly, and where needed, a URL dumps out “metadata” from their generic table with appropriate filtering and stylesheet display.

      The other use case more clearly outlines what “degree” of metadata is required to be useful. Their concern is 1000’s of mainframe data sets that are managed by many 100’s of COBOL programs. They no longer have the intellectual knowledge of what happens to individual “fields” in the copy books of those programs, nor anyone who would be able to, in a brief glance, even comprehend what happens at the field level. But simply knowing “which” COBOL program moves “which” files to and fro (and which files are sourced by which other files and systems) would save them days of combing thru JCL. Consequently, the “black box” that they represent in lineage doesn’t need to be too detailed. If someone needs to know exactly what the MOVE statement looks like in the COBOL code, they can go directly to their source management system and look at the code itself (assuming it still exists; I’ve met sites that don’t have the source anymore either!)

      Ernie

  3. rupesh Says:

    Very nice explanation

  4. Barbara Nichols Says:

    Love it. What is meta data? “Wherever you are” – ‘look up’ – that’s meta data to you!

  5. Martin Says:

    If you are interested in SQL data lineage, try http://sqldep.com.

  6. Amit Dhiman Says:

    Awesome explanation. Thanks for sharing. I kept searching and read more articles, but it was clarified here.

  7. P Says:

    Hi Ernie… Apologies, I couldn’t find the right topic to post this query.

    Could you please share any insights, case studies, or demos available for creating Information Governance Dashboards? IBM Knowledge Centre has vast information but doesn’t show any demos or lab exercises as such.

    Please share any reference links you have for any of the above.

