Apache Atlas: GET-ting familiar with the REST API

Hi everyone. Just posted the second in a series of recordings related to Apache Atlas, the open source initiative for metadata management and governance for Hadoop. Many of you have been asking how to get metadata “out” of Apache Atlas so that you can load it into IGC or other repositories, or just use it for special governance reporting purposes. In this recording we take a quick look at some of the key “GET” functions of the Apache Atlas REST API, and how you can easily test and prototype these calls using only your browser.   –ernie
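For those who want to go one step beyond the browser, here is a minimal sketch of the same idea in Python. It only builds GET URLs (which you can also paste straight into a browser) and defines a helper for fetching the JSON. The host, port, and admin/admin credentials are assumptions based on a typical sandbox install, and the two paths shown are the v2 endpoints documented for later Atlas releases; earlier incubator builds expose different paths, so check the REST documentation for the version you are running.

```python
import base64
import json
import urllib.request
from urllib.parse import urlencode, urljoin

# Assumption: a local sandbox listening on the usual Atlas port.
ATLAS_BASE = "http://localhost:21000/"


def atlas_url(path, base=ATLAS_BASE, **params):
    """Build a GET URL; the resulting string can be pasted into a browser."""
    url = urljoin(base, path.lstrip("/"))
    if params:
        url += "?" + urlencode(params)
    return url


def atlas_get(url, user="admin", password="admin"):
    """Issue the GET with basic authentication and return the parsed JSON.

    The credentials are assumed sandbox defaults, not a recommendation.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={
        "Authorization": "Basic " + token,
        "Accept": "application/json",
    })
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumed v2 paths: list the type definitions, then search by type name.
    print(atlas_url("/api/atlas/v2/types/typedefs"))
    print(atlas_url("/api/atlas/v2/search/basic", typeName="hive_table", limit=10))
```

Calling atlas_get() on either URL against a running instance returns the same JSON the browser would show, ready to be reshaped for IGC or a report.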




Check out this “Recipe” for integrating Oracle ODI metadata into IGC!

Hi Everyone…

An IBM colleague has published an excellent use case on constructing an OpenIGC bundle  and publishing metadata and lineage for ETL processes represented by Oracle ODI.  She very nicely shows how to illustrate important structures and properties of a 3rd party ETL tool.   Ultimately, this leads to publishing of actual metadata instances so that IGC users can perform lineage reports and also “govern” (assign Terms, Stewards, etc.) their critical metadata.




Apache Atlas: “your first look!”

Hi Everyone.

Just finished uploading the initial video in a series of recordings concerning Apache Atlas, the evolving open source initiative for metadata management and governance in Hadoop.

This recording is primarily designed for viewers who aren’t comfortable doing their own builds of open source solutions and who need some guidance on getting started with the VMware images that are available for download. It introduces the concept and walks through what needs to be done so that you can be successful with the Apache Atlas resources available on the web. It starts with the download of existing images from the Hortonworks web site and helps validate your environment so that you can continue with the tutorials on the Hortonworks site, and/or start playing and exploring on your own. This is the first in a series of recordings on Apache Atlas that share early experiences and discoveries regarding this important open source initiative for governance and metadata management in Hadoop.

Recording can be found at:  https://youtu.be/C4lf_EFduqU

IBM Partners with Creative Solutions Using Open IGC !

Many of you come to these pages to understand how to extend the Information Server repository and use the various Information Governance Catalog APIs to enhance your users’ experiences and increase your governance capabilities.   But for some of you, there are too many interfaces, not enough time, not enough resources (or the right skilled resources) to complete the effort.   Please let me introduce you to various trusted IBM partners who have been trained on, and are using,  Open IGC and related techniques to help customers around the world reach their information governance goals.  Many of these partners have built formal “bridges” from various 3rd party tools, to automate the metadata import process, and most of them also offer expert consulting on IGC and governance strategies in general.

To our partners…thank you for your efforts to spread the word about Open IGC and for helping our customers make even greater progress towards their governance objectives.

To our customers…I invite you to visit these partners’ web pages, ask them about how they can assist you with Open IGC and IGC issues in general, and challenge them to further expand their offerings to extend the repository for all your governance needs.

To our future partners…if you have built or are building a creative solution for achieving governance with the Information Governance Catalog, reach out to me or my IBM teammates around the world so that we can introduce your efforts to the overall IGC community and ensure your listing is on this page.

Thank you!      –ernie


Compact Solutions  http://www.compactbi.com/solutions/data-lineage/


INFORMATION-ASSET, LLC  http://information-asset.com/


Lucid  http://www.lucidtechsol.com/get-stronger-data-governance-with-lucids-ibm-info-server-enhancements/






Manta  https://mantatools.com


Orion http://www.orionic.com/solutions_for_ibm_igc/


Prolifics  http://www.prolifics.com/solutions/information-management-analytics







Evolving Atlas…

Apache Atlas is continuing to evolve, and quite quickly (see an earlier post about Atlas, including links to this open source initiative and other valuable commentary… Apache Atlas…a Common Metadata Initiative with “legs” ?).    Going beyond merely storage and process-based metadata, the Apache project is poised to introduce the ability to define a business taxonomy that increases common understanding and further defines assets across the enterprise.  The important inclusion of business vocabularies ensures that information governance incorporates the needs of ALL members of an organization, and not just IT.

As Apache Atlas takes on greater roles and open source accelerates its uptake, we can foresee a future where Atlas is called upon whenever and wherever data is accessed. In her latest blog, Mandy Chessell floats the idea of a Connector Framework for Apache Atlas [http://www.ibmbigdatahub.com/blog/insightout-role-apache-atlas-open-metadata-ecosystem]. Connectors of all kinds can access Atlas at the exact moment that they harvest or act upon data, with the ability to make decisions using everything that Atlas has to offer: ownership, location, data quality statistics, lineage, usage requirements and rules, and more. This allows Apache Atlas to be more “intimate” with the data integration life-cycle and able to deliver governance rules that have real “teeth”.   –ernie.

Apache Atlas…a Common Metadata Initiative with “legs” ?

Lately there has been increasing discussion about Apache Atlas (http://atlas.incubator.apache.org), an open source initiative for metadata and governance services.

Standards in the technology industry come and go. Some make it and enjoy wide adoption; others do not, failing early or never really blossoming to their full potential. Our industry is littered with examples that had promise but withered away because vendors were unable to agree on common semantics or unwilling to let go of (or expose) proprietary intellectual material. Meta models represent a significant investment, and often competitive leadership. No one wants to yield hard earned territory, or potentially give away the “golden key” to their solution. Standards like XMI, CWM, and others in the data integration and business intelligence space never fully delivered the nirvana that people hoped for. They lacked commitment, weren’t pushed hard enough by the customers writing the checks, and, capability-wise, were typically considered nothing more than a “checkbox” requirement. Certainly, competing vendors in niche data integration areas couldn’t stomach having their meta models shared interchangeably.

The climate for this is changing now, thanks to big data and open source, and to trends such as the adoption of Hadoop in everyone’s sandbox (even if not in production). Not participating, or flat-out ignoring open source, is no longer acceptable. Being “open” is no longer a vendor liability, but a competitive advantage. Not being open is a path to extinction. For these reasons and more, Apache Atlas is poised to be a major force in the drive for common information governance and metadata management.

Please take the time to read the blogs from two of our highly respected colleagues here at IBM (IBM Fellow Tim Vincent and Distinguished Engineer Mandy Chessell) regarding Apache Atlas and what it will mean for our industry:


That’s all wonderful news on the potential for Apache Atlas.  What does it mean for the InfoSphere  Information Governance Catalog (IGC)?

Along with other contributors from the vendor and user community, IBM is committed to the success of Apache Atlas. Although still early in its incubator status at Apache, Atlas is already being implemented at customer sites for their Hadoop based assets. And while Atlas is not specifically limited to Hadoop, today this is the primary domain where it plays and will mature.

In the meantime, Information Server customers using IGC want to use Apache Atlas to help federate the metadata in their Hadoop distributions so that it participates in their enterprise governance ecosystem. Atlas shows evidence of eventually supporting distributed and clustered configurations, but sites are looking to do this right now, by bringing Atlas metadata directly into the Catalog. OpenIGC, the methodology and API for extending the IGC repository, makes this possible today. Several customers, as well as IBM, are looking into how the two can be integrated. Each technology supports a robust REST API, and each describes similar constructs that can be represented in the other, either directly or by extending the underlying default models. Pulling Atlas metadata into IGC allows it to immediately participate in data lineage reporting, be assigned to subject matter experts and Stewards, be related to data quality statistics, and be connected to approved policies for data management and governance. Sites can immediately reap the benefits of IGC in combination with their Hadoop based Atlas investments, while still looking to the future and the benefits that Atlas holds for even deeper governance capabilities and participation by a vast number of vendors and technology owners.

Lots to do, and lots to keep track of! But there are many things that can be done “right now” to take advantage of, and garner insight into, the future. Stay tuned. Atlas is moving towards becoming a standard with legs we can all stand on….


Sample Bundles

This entry is one of many in a series that describes the InfoSphere Open IGC API, which allows you to define your own objects for information governance using InfoSphere Information Server and the Information Governance Catalog.

Previous post in this series:
Updating Your Bundle

Original post in this series:
Open IGC is here!

Here are some sample bundles for you! These bundles correspond to the use cases that I have been describing within this blog series (see “Original post” above). Each .zip file contains a directory structure that is formatted as I described in one of my early posts on bundle design (Open IGC: Defining a new bundle!). These bundles are for demonstration and learning purposes only. There are no warranties or certified methodologies implied.

Each bundle is complete with the asset_type_descriptor, along with several instance publishing upload files and one or more flow model uploads (if applicable to the use case). I have tried to include examples of various techniques, some of which I have already reviewed in these posts, or intend to in the near future. The values for various string properties are fictitious, and in some cases, just repeated and copied in the interest of more quickly building the example. This is especially true of the asset_ids (attribute ID= in the publishing and flow upload XML files), whose values are fairly random. These XML documents were crafted by hand (a good way to start testing), but ultimately, most of you will probably generate these unique identifiers programmatically. The prior posts in this series are enough to help you take these examples, register their bundles and upload their assets and lineage specifications. Then you can play with the instances within IGC, add new ones, update property values via the user interface or with new XML files, and get further inspired to build your own!
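As a rough illustration of generating those identifiers programmatically, here is a small Python sketch. It assumes, as the hand-crafted samples suggest, that the ID= value only needs to be unique and consistent within your upload documents; the "a" prefix, the 12-character length, and the class and asset names in the example are all hypothetical choices, not anything the API prescribes.

```python
import hashlib


def asset_id(asset_class, *identity_path):
    """Derive a repeatable ID from an asset's class and its containment path.

    Hashing the same inputs always yields the same ID, so an asset
    referenced in both a publishing upload and a flow upload gets a
    matching value without any shared lookup table.
    """
    # Join with a separator that is unlikely to occur inside a name.
    key = "\x1f".join((asset_class,) + identity_path)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # Hypothetical format: a leading letter plus 12 hex characters.
    return "a" + digest[:12]


if __name__ == "__main__":
    # Hypothetical bundle class and asset names, for illustration only.
    print(asset_id("$Demo-Queue", "broker01", "ORDERS.IN"))
    print(asset_id("$Demo-Queue", "broker01", "ORDERS.OUT"))
```

Deriving the ID from the asset's identity, rather than from a counter or a random value, means you can regenerate the XML at any time and still get the same identifiers.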

Let me know if you have any problems accessing these zip files, or if you have any further questions about their use. And let me know if you would also like to share your own creative bundles!


Note: This site doesn’t allow me to upload .zip files, so the files at these URLs have been renamed with “.ppt” as an additional suffix. Just rename them after download. They are normal .zip files.
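If you download several of them, the renaming can be scripted. This is a minimal Python sketch that assumes the downloads sit together in one folder and end in “.zip.ppt”; the function name is my own, and it checks that each file really is a zip archive before touching it.

```python
import zipfile
from pathlib import Path


def restore_zip_names(folder):
    """Strip the extra ".ppt" suffix from downloaded bundle archives."""
    restored = []
    for path in sorted(Path(folder).glob("*.zip.ppt")):
        target = path.with_suffix("")  # drops only the trailing ".ppt"
        # Skip anything that is not a real zip, and never overwrite a file.
        if zipfile.is_zipfile(path) and not target.exists():
            path.rename(target)
            restored.append(target)
    return restored


if __name__ == "__main__":
    for name in restore_zip_names("."):
        print("restored", name)
```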

Messaging Use Case


Abstract “Access Control” Use Case


Transformation Tool Use Case