Data Quality and Transformation as a Service

In your ETL jobs, do you ever perform “Lookups” to validate account numbers, confirm product codes, or return discount rates? (I’m sure the answer is “of course you do.”) Are you certifying address lines as part of your batch ETL processes? (For many of you, another “of course.”) How complex is the logic you are using? How much work have you done to establish such lookups, certify addresses, and test the functionality ...and maintain it? What data sources, security measures, and other techniques have you invested in?

Have you thought about how valuable those data integration activities might be to the rest of your organization?

Turn them into a “Service.” One that is easily re-usable, not only for the batch work you are doing today via ETL, but for your real-time applications: your Java development teams building a portal, or .NET groups setting up front ends for remote devices. Or perhaps for other applications that communicate internally over your company’s choice of enterprise service bus.
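To make the idea a little more concrete, here is a minimal Python sketch of one piece of lookup logic being reused by both a batch pass and a request/response style call. Everything in it is hypothetical and made up for illustration: the `DISCOUNT_RATES` table, the tier names, and the functions `lookup_discount`, `run_batch`, and `handle_request`. It is not how any particular ETL product does it, just the shape of the idea.

```python
import json

# Hypothetical reference data; in practice this would come from the
# same source your ETL lookup already uses.
DISCOUNT_RATES = {"GOLD": 0.15, "SILVER": 0.10, "BRONZE": 0.05}


def lookup_discount(tier):
    """Return the discount rate for a customer tier, or None if unknown."""
    if tier is None:
        return None
    # Normalize the key once, so every caller gets the same rule.
    return DISCOUNT_RATES.get(tier.strip().upper())


def run_batch(rows):
    """Batch-style use: enrich every row in one pass, as an ETL job would."""
    return [dict(row, discount=lookup_discount(row.get("tier"))) for row in rows]


def handle_request(body):
    """Service-style use: one JSON request in, one JSON response out."""
    request = json.loads(body)
    rate = lookup_discount(request.get("tier"))
    return json.dumps(
        {"tier": request.get("tier"), "discount": rate, "found": rate is not None}
    )
```

The point of the sketch is that the business rule lives in one place (`lookup_discount`), so the portal team and the nightly batch job cannot drift apart on what a valid tier is.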

This is where the Information Services Director comes into play for Information Server, providing the ability to publish DataStage, QualityStage, SQL queries and stored procedures as Services. Take the “guts” of your data integration activity (Lookups, Transforms, etc.) and “publish” them as a Service. A “Service” that is supported by industry standards such as Web Services (SOAP over HTTP) and other protocols, along with the necessary artifacts, including a built-in directory and automatically generated WSDL that describes the metadata for your creation.

Services that focus on data integration are often most successful when they are built, documented, and maintained by the teams who truly understand the data. In many organizations, this is the same team that has been living with the data migration, transformation, and data warehousing applications. They have invested time and energy researching the models, the legacy systems, and the oddities of the data under the covers. They’ve built extensive transformations using ETL tooling. Why not exploit those skills, that expertise, and those business rule investments for benefits beyond the decision support systems?

I’m not talking about new stuff, bleeding edge creations, or upcoming technology. DataStage, via “RTI,” has been doing this for 4+ years. Web Services are increasingly mainstream. Data integration is still at the core, and ETL tools have proven their mettle at simplifying the management of data access technology. It’s a shame to see them only used for batch.

Ernie

ps… if you are interested in reading further about this subject, check out this article on Information as a Service: Data Cleansing Pattern. I was honored to be asked to play a small role in developing this article, written by some of my esteemed IBM teammates. -e
