Lately there has been increasing discussion about Apache Atlas (http://atlas.incubator.apache.org), an open source initiative for metadata and governance services.
Standards in the technology industry come and go. Some make it and enjoy wide adoption; others do not, failing early or never really blossoming to their full potential. Our industry is littered with examples that had promise but withered away because vendors were unable to agree on common semantics or unwilling to let go of (or expose) proprietary intellectual material. Meta models represent a significant investment and, often, competitive leadership. No one wants to yield hard-earned territory, or potentially give away the “golden key” to their solution. Standards like XMI, CWM, and others in the data integration and business intelligence space never fully delivered the nirvana that people hoped for. They lacked commitment, weren’t pushed hard enough by the customers writing the checks, and, capability-wise, were typically regarded as nothing more than a “checkbox” requirement. Certainly, competing vendors in niche data integration areas couldn’t stomach having their meta models shared interchangeably.
The climate is changing now, thanks to big data, open source, and trends such as the adoption of Hadoop in everyone’s sandbox (even if not in production). Not participating in open source, or flat-out ignoring it, is no longer acceptable. Being “open” is no longer a vendor liability, but a competitive advantage. Not being open is a path to extinction. For these reasons and more, Apache Atlas is poised to be a major force in the drive for common information governance and metadata management.
Please take the time to read the blogs from two of our highly respected colleagues here at IBM (IBM Fellow Tim Vincent and Distinguished Engineer Mandy Chessell) regarding Apache Atlas and what it will mean for our industry:
That’s all wonderful news on the potential for Apache Atlas. What does it mean for the InfoSphere Information Governance Catalog (IGC)?
Along with other contributors from the vendor and user community, IBM is committed to the success of Apache Atlas. Although it is still early in its incubator status at Apache, Atlas is already being implemented at customer sites for their Hadoop-based assets. And while Atlas is not specifically limited to Hadoop, today this is the primary domain where it plays and will mature.
In the meantime, Information Server customers using IGC want to use Apache Atlas to help federate the metadata in their Hadoop distributions so that it participates in their enterprise governance ecosystem. Atlas shows evidence of eventually supporting distributed and clustered configurations, but sites are looking to do this right now by bringing Atlas metadata directly into the Catalog. OpenIGC, the methodology and API for extending the IGC repository, makes this possible today. Several customers, as well as IBM, are looking into how the two can be integrated. Both technologies offer a robust REST API and describe similar constructs, which can be represented in the other either directly or by extending the underlying default models. Pulling Atlas metadata into IGC allows it to immediately participate in data lineage reporting, be assigned to subject matter experts and Stewards, be related to data quality statistics, and be connected to approved policies for data management and governance. Sites can immediately reap the benefits of IGC in combination with their Hadoop-based Atlas investments, while still looking to the future and the benefits that Atlas holds for even deeper governance capabilities and participation by a vast number of vendors and technology owners.
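To make the idea of "similar constructs" a little more concrete, here is a minimal Python sketch of the kind of translation step such an integration might contain. It is an illustration under stated assumptions, not the actual Atlas or OpenIGC API: the entity layout (`typeName`, `guid`, `attributes`), the asset field names, and the `atlas_*` class names in `TYPE_MAP` are all hypothetical, and no REST calls are made.

```python
from typing import Any, Dict

# Hypothetical mapping from Atlas type names to custom asset classes that an
# OpenIGC extension might define. These names are assumptions for illustration.
TYPE_MAP: Dict[str, str] = {
    "hive_db": "atlas_hive_database",
    "hive_table": "atlas_hive_table",
    "hive_column": "atlas_hive_column",
}


def to_igc_asset(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one Atlas-style entity dict into an IGC-style asset dict.

    The input shape (typeName / guid / attributes) is an assumed layout,
    and the output keys are likewise hypothetical.
    """
    type_name = entity.get("typeName", "")
    if type_name not in TYPE_MAP:
        raise ValueError(f"no asset class mapped for type {type_name!r}")

    attrs = entity.get("attributes", {})
    # Prefer the short name; fall back to the qualified name if it is missing.
    name = attrs.get("name") or attrs.get("qualifiedName")
    if not name:
        raise ValueError("entity has neither a name nor a qualifiedName")

    return {
        "class": TYPE_MAP[type_name],
        "name": name,
        "short_description": attrs.get("description", ""),
        # Keep the source identifier so the asset can be traced back later.
        "source_id": entity.get("guid"),
    }


if __name__ == "__main__":
    sample = {
        "typeName": "hive_table",
        "guid": "abc-123",
        "attributes": {"name": "orders", "description": "Order facts"},
    }
    print(to_igc_asset(sample))
```

In a real integration, the input would come from the Atlas REST API and the output would be loaded through OpenIGC; keeping the mapping as a pure function makes it easy to test separately from either service.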
Lots to do, and lots to keep track of! But there are many things that can be done “right now” to take advantage of Atlas and to gain insight into the future. Stay tuned. Atlas is moving towards becoming a standard with legs we can all stand on…