THINK 2018 Roundup

Hi everyone!

Just returned from a great week at our annual user conference in Las Vegas!   This was a much larger event than in the past, as it encompassed far more of the IBM portfolio than just Information Server or Analytics.    The halls and sessions across the venue were more crowded, but there was a lot more to learn!   Besides entirely new topics on the technology and issues facing our business, there were sessions on the use and integration of all parts of Information Server and other offerings across the IBM and partner portfolio.  I always learn more things about this platform at this event and enjoy hearing about our customers’ successes while providing insight and assistance where I can.

Specific to the Unified Governance and Integration area, THINK 2018 was a major opportunity to showcase Information Server release 11.7. There were demos and sessions on everything from the integration of structured and unstructured data to the new DataStage Flow Designer and machine learning. In the next few weeks I will post details on my experiences with these new capabilities, especially as they relate to governance and metadata management.

Another exciting moment at this year’s conference was the opportunity to see and hear more about the continued evolution of Apache Atlas. Members of the Apache Atlas team (IBMers, partners and customers) conducted a hands-on lab that highlighted the progress being made on Open Metadata. This included a powerful use case incorporating Apache Atlas and Ranger for dynamic data masking, and also illustrated the integration of Apache Atlas with the InfoSphere Information Governance Catalog. This is a working “proof point” for how independent repositories can share metadata using the Open Metadata APIs. The team also participated in a panel discussion on the value of ODPi, where all of us interested in governance can contribute to the success of Open Metadata.

[Photo: Redguide signing]

Several members of the team participated in a book signing for a new “Redguide” that they recently authored, which reviews important use cases for Open Metadata and brings us up to date on the current progress of the Apache Atlas initiative!   This is a MUST READ for all of us who are passionate about metadata and governance!

–ernie

 

Check out the links below for further information and details.

Apache Atlas…open source metadata and governance!     http://atlas.apache.org/

The new Redguide, The Journey Continues  http://www.redbooks.ibm.com/Abstracts/redp5486.html?Open

Open Metadata (…at the Apache Atlas Wiki)

https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance

ODPi Project for Data Governance  https://www.odpi.org/projects/data-governance-pmc

 

Lost in translation?

Editor’s note: It gives me great pleasure to introduce Beate Porst, a good friend and colleague, who is the Offering Manager for DataStage and other parts of the Information Server platform. Beate will be sharing her insights into Unified Governance and Integration, based on many years of experience with this platform and the issues surrounding data transformation and management. Today she introduces some of the key new capabilities of Information Server v11.7. Please welcome Beate to dsrealtime!   –ernie

How IBM Information Server v11.7 could have saved NASA’s 125-million-dollar Mars orbiter from becoming lost.

We all know the slogan: Measure twice, cut once. What if we do but don’t know the context of our data?

That is what happened to NASA in 1999. The numbers were right, but their context was not: the 125-million-dollar Mars orbiter was designed to use the metric system, while mission control performed course corrections using the imperial system. The result was an altitude that was too low, and contact with the orbiter was lost. An embarrassing moment for NASA.

But it wasn’t the only incident. In 2003, German and Swiss engineers started to build a bridge over the river Rhine in the border town of Laufenburg. Each country started building from its own side, with the goal of meeting in the middle. That was the plan. Engineers used “sea level” as the reference point. The problem is that sea level in Germany is based on the North Sea, whereas in Switzerland it is based on the Mediterranean, resulting in a 27 cm difference. The builders in Germany knew about the difference, but apparently not whether to add it to or subtract it from their base. So they made the wrong choice.

Historical documents show that using out-of-context, incomplete or inaccurate data has caused problems ever since mankind started to develop different units of measurement.
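Both incidents come down to numbers that traveled without their context. The short Python sketch below is only an illustration of that idea, not anything taken from NASA, the bridge project, or Information Server. The `Impulse` class, the unit labels, `DATUM_GAP_M`, and `mismatch_after_correction` are hypothetical names, and treating the orbiter value as a thruster impulse is an assumption made for the example. The only figures used are the 27 cm from the story above and the exact definition of the pound-force.

```python
from dataclasses import dataclass

# Exact by definition: one pound-force is 4.4482216152605 newtons.
LBF_TO_N = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A thruster impulse that carries its unit along with its number."""
    value: float
    unit: str  # "N*s" (metric) or "lbf*s" (imperial); labels are hypothetical

    def to_newton_seconds(self) -> float:
        # Convert explicitly instead of assuming what the bare number means.
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_TO_N
        raise ValueError(f"unknown unit: {self.unit!r}")


# Size of the gap between the two sea-level references, in metres.
# The 0.27 comes from the story above; which side is "higher" is left open.
DATUM_GAP_M = 0.27


def mismatch_after_correction(sign: int) -> float:
    """Remaining height mismatch in metres when the correction is applied
    with the given sign: +1 is the right direction, -1 the wrong one."""
    return abs(DATUM_GAP_M - sign * DATUM_GAP_M)


if __name__ == "__main__":
    reported = Impulse(100.0, "lbf*s")
    print(reported.to_newton_seconds())   # value the metric side should use
    print(mismatch_after_correction(+1))  # correction applied correctly
    print(mismatch_after_correction(-1))  # correction applied backwards
```

Under these assumptions, reading the bare `100.0` as newton-seconds would understate the impulse by a factor of roughly 4.45, and applying the sea-level correction backwards does not merely leave the gap in place, it doubles it.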

Now the question is: how can you avoid costly incidents like these and successfully conquer your data problems, and how can IBM Information Server help you on that journey?

Whether you want to build a bridge, send an orbiter to Mars or simply try to identify new markets, you will only be as good as the data you use. This means it must be complete, in context, trusted and easily accessible in order to drive insights. As if this isn’t challenging enough, your competitiveness also depends on your organization’s ability to quickly adapt to changing conditions.

For more than a decade, IBM InfoSphere Information Server has been one of the market-leading platforms for data integration and governance. Users have relied on its powerful and scalable integration, quality and governance capabilities to deliver trusted information to their mission-critical business initiatives.

John Muir once wrote: “The power of imagination makes us infinite”.  We have applied our power of imagination to once again reinvent the Information Server platform.

As business agility depends on the flexibility, autonomy, competency, and productiveness of the tools that power your business, we have infused Information Server’s newest release with a number of game-changing inventions. These include deeper insights into the context of and relationships among your data, increased automation so that your users can complete their work faster and more safely, and more flexible workloads for better resource optimization. All of those are aimed at making your business more successful when tackling your most challenging data problems.

Let’s look at 4 of those game-changing inventions and how they are going to help your business:

  1. Contextual Search: Out-of-context data was the leading cause of error for NASA’s failed mission. The new contextual search feature called Enterprise Search provides your users with the context to avoid such costly mistakes. It greatly simplifies and accelerates the understanding, integration, and governance of enterprise data. Users can visually search, explore and easily gain insights through an enriched search experience powered by a knowledge graph. The graph provides context, insight and visibility across enterprise information, giving you a much better understanding and awareness of how data is related, linked, and used.
  2. Cognitive Design: Getting trusted data to your end users quickly is an imperative. This process starts with your integration design environment. To help address your data integration, transformation or curation needs quickly, Information Server V11.7 now includes a brand new versatile designer, called DataStage™ Flow Designer. It features an intuitive, modern, and secure interface accessible to all users through a no-install, browser-based experience. It accelerates your users’ productivity through automatic schema propagation, highlighted design errors, and powerful type-ahead search, as well as full backwards compatibility with the desktop version of the DataStage™ Designer.
  3. Hybrid Execution: Data Warehouse optimization is one of the leading use cases to address growing data volumes while simplifying and accelerating data analytics. Once again, Information Server V11.7 has strengthened its ability to run on Hadoop with a set of novel features to more efficiently operationalize your Data Lake environment. Among those is an industry-unique hybrid execution feature which lets you balance integration workloads across Hadoop and non-Hadoop environments, with the aim of minimizing data movement and optimizing your integration resources.
  4. Automation powered by machine learning: Poor data quality is known to cost businesses millions of dollars each year. The inadvertent use of different units of measurement for the Mars orbiter was ultimately a data quality problem. However, the heavy manual effort involved, combined with exponential data growth, continues to keep businesses from maintaining high data quality. To counter this, Information Server V11.7 further automates the data quality process by underpinning data discovery and classification with machine learning, so that you can spend your time focusing on your business goals. The two innovative aspects are:

     • Automation rules, which let business users define graphical rules that then automatically apply data rule definitions and quality dimensions to data sets based on business term assignments, and

     • One-click automated discovery, which enables discovery and analysis of all data from a connection in one click, providing easy and fast analysis of hundreds or thousands of data sets.

Don’t want to get lost in translation? Choose IBM Information Server V11.7 for your next data project.

Apache Atlas Update: Have you been watching?

It has been a while since I’ve written anything.  Time to “catch up!”

A lot has been happening in the world of metadata management and governance. We are now seeing many real-life use cases, as machine learning, intelligent data classifications, graph database technology and more are being applied to the information governance domain. Efforts for standardization in the metadata and governance space are also moving forward. For this post, let’s take a look at Apache Atlas.

Apache Atlas continues to mature, celebrating several major milestones in 2017. Shortly after its second birthday (Apache Atlas was launched as an incubator project in May of 2015), Apache Atlas graduated to Top-Level Project status, signifying that the project’s community and products have been well-governed under the Apache Software Foundation’s (ASF) meritocratic process and principles. This reflects the hard work of the collective Apache Atlas team and is evidence that Apache Atlas is increasingly ready for real world implementations. Of course, that milestone, while worthy of recognition, is just one of the many steps Atlas has taken, and continues to take, going forward. Here are other significant developments for Apache Atlas this year:

  • Introduction of OMRS (Open Metadata Repository Services) and its other complementary APIs.  OMRS is a key part of the Open Metadata framework that introduces the notion of repository metadata sharing and access.  In the true spirit of Apache communities, Apache Atlas is not alone in the world of enabling information governance; sharing of metadata between diverse metadata repositories can now be realized, in addition to simpler federation of metadata across multiple Atlas repositories.
  • New common models for critical types of metadata.  To facilitate metadata sharing via OMRS, and to establish a more widely adaptable set of asset definitions, the Atlas team agreed on a common definition for data structures, processes, and other data asset attributes.  This increases the likelihood that integrators building interfaces to Atlas will choose a common type definition for their content instead of designing their own custom types, while still providing extension points if needed.
  • New Glossary Model.  A detailed new glossary model was designed (and API implemented) for a stronger semantic layer.  Business concepts and their relationships are the cornerstone of disciplined information governance.
  • Streamlining of the Apache Atlas infrastructure.   The underlying graph database implementation was upgraded to take maximum advantage of JanusGraph, itself becoming the leading standard for open source graph engines.
  • Continued/ongoing clean-up of the install and build procedures.  Considering the wider adoption of Apache Atlas throughout the governance community, the Atlas team has enhanced its test suites to ensure that new functionality is well tested and that the build and install processes are more streamlined.  One example is packaging and building Apache Atlas within Docker containers.
  • The number of new Committers!  Apache, as everyone knows (or should know), is a meritocracy.  This means that recognition and influence are determined by an acknowledged investment of time, effort, and contributions.  Formal recognition as a committer requires many months of hard work moving a project forward.  Congratulations to all the new Committers this year!    Even more important, the increase in Committers and contributors overall is yet another illustration of how Apache Atlas is growing in importance and general industry awareness.
  • The Virtual Data Connector use case.  Self-service data exploration environments need to provide an integrated view of data from many different systems and organizations.  Access is needed in order to discover new uses and interesting patterns in the data.   The VDC project aims to provide a single endpoint for accessing data that presents a virtualized view of the data assets with the appropriate data security.  This is accomplished by extending the integration of Apache Atlas with Apache Ranger via the tag-based security access introduced in Apache Atlas in 2016, in order to provide security access based on the classification tags (e.g., PII and SPI tags, subject area of the data, etc.).  An additional plug-in is added to Apache Atlas to control access to metadata based on whether an end-user is allowed to discover a data source’s metadata.
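The tag-based idea behind the Virtual Data Connector can be sketched in a few lines of Python. This is an illustration only and does not use the Apache Atlas or Apache Ranger APIs: the `Column` class, the `POLICY` table, the role names, and `mask_row` are all hypothetical, and real policies would be defined in Ranger against classifications held in Atlas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Column:
    """A data column together with its classification tags (hypothetical model)."""
    name: str
    tags: Set[str] = field(default_factory=set)


# Hypothetical policy: for each classification tag, the roles that may
# see unmasked values. Tags without an entry are treated as unrestricted.
POLICY: Dict[str, Set[str]] = {
    "PII": {"privacy_officer"},
    "SPI": {"privacy_officer", "hr_analyst"},
}


def can_read(column: Column, roles: Set[str]) -> bool:
    """True if, for every restricted tag on the column, the user holds
    at least one role that the policy allows for that tag."""
    return all(POLICY[tag] & roles for tag in column.tags if tag in POLICY)


def mask_row(row: Dict[str, str], columns: List[Column], roles: Set[str]) -> Dict[str, str]:
    """Return a copy of the row with values masked where access is denied."""
    return {
        col.name: (row[col.name] if can_read(col, roles) else "****")
        for col in columns
    }


if __name__ == "__main__":
    columns = [Column("name"), Column("ssn", {"PII"})]
    row = {"name": "Ada", "ssn": "123-45-6789"}
    print(mask_row(row, columns, {"analyst"}))
    print(mask_row(row, columns, {"privacy_officer"}))
```

The point of the sketch is that the access decision is attached to the classification rather than to the individual column, so tagging a new column as PII would bring it under the same rule without any policy change.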

So... it’s been a very busy year for Apache Atlas. While most of these capabilities have already been developed and are being tested, they will become generally available in the upcoming Apache Atlas v1.0, which will be a huge milestone release for the community. The project is maturing and gaining increased attention across the industry, in the information governance space and beyond. The code continues to mature, with adoption and the variety of applications increasing every week. The critical mass of industry expertise contributing to Apache Atlas continues to grow.    Start watching!   Start playing!  Join in and help Apache Atlas reach its next set of milestones!

–Ernie

 

Main Apache Atlas web site
Atlas Wiki

Links to specific Apache Atlas Topics

Open Metadata and Governance
Link to more details on OMRS
Building Out the Open Metadata Typesystem
Virtual Data Connector

New Governance Blog covering IGC

Hi everyone... here’s a pointer to another IGC and Governance resource written by some of my IBM colleagues. This post includes details on the advantages of using OpenIGC to extend governance to any kind of asset:   https://ibm.co/2AGpdaq   Happy reading!

Ernie

Explore the Benefits of Information Governance with the IGC Trial

Earlier today we released the first implementation of the InfoSphere Information Governance Catalog Trial!   This is a downloadable module that lets you quickly and easily get a closer look at the Information Governance Catalog (IGC), complete with real pre-loaded metadata, business terms, and lineage.    It is a modified Docker-based implementation, and is not intended for production use or full-blown Information Server capability, but it allows you and your team to explore IGC, work with its features, and realize how you can achieve your governance objectives for common understanding, data quality monitoring, and data lineage.   Tutorials and videos will also guide you along the way.   The links below go into far more detail and lead also to the formal download page…     Good luck and enjoy!

Ernie

Main IGC Trial download page and introduction…  https://www.ibm.com/us-en/marketplace/information-governance-catalog

Overview of the IGC Trial and its benefits…  https://www.linkedin.com/pulse/fast-track-your-data-information-governance-catalog-rakesh-ranjan

Insightful post from Marc Haber, IGC Offering Manager… http://www.ibmbigdatahub.com/blog/how-take-next-step-information-governance-now

IBM and Hortonworks!

Hi everyone…

Some exciting recent news, if you haven’t seen it yet…announced a few days ago at the DataWorks Summit/Hadoop Summit in San Jose, a new relationship between IBM and Hortonworks!   Read about it here to learn how IBM and Hortonworks are partnering to further the efforts of our customers to expand their big data solutions.

http://www-03.ibm.com/press/us/en/pressrelease/52572.wss?platform=hootsuite

More important for this blogger is the increased attention this brings to Apache Atlas.  Apache Atlas, if you aren’t already familiar, is an evolving open source approach to enterprise information governance, metadata management, and lineage […go here for a general overview:  https://hortonworks.com/apache/atlas/ ].   One highlight from the news above draws particular attention to the contributions IBM and Hortonworks are making to this effort:

“Partnering On Apache

As part of their wide-ranging partnership, the companies will also team to advance the development of Unified Governance (IBM BigIntegrate, IBM BigQuality and IBM Information Governance Catalog) on the Apache Atlas open platform. …”

It’s all a work-in-progress, but this is significant news that will hopefully accelerate the initiative.   Have any of you started working heavily with Atlas?   Which release?  Are you using it exclusively with Hadoop, or externally?   Have you interchanged metadata between Atlas and IGC?  Considering it?    Share your experiences!

Ernie

Related posts:

Evolving Atlas…

 

 

 

Re-defining Data Lineage

Well..not so much “re-defining” as re-fining, and adding clarity to the definition and the discussion.  Please find the time to review this excellent blog entry by my IBM colleague, Distinguished Engineer and thought leader, Mandy Chessell…  https://poimnotes.blog/2017/03/19/understanding-the-origin-of-data/

–ernie