Data Quality and Transformation as a Service

In your ETL jobs, do you ever perform “Lookups” to validate account numbers, confirm product codes, or return discount rates? (I’m sure the answer is “of course you do.”) Are you certifying address lines as part of your batch ETL processes? (For many of you, another “of course.”) How complex is the logic you are using? How much work have you done to establish such lookups, certify addresses, and test the functionality ...and maintain it? What data sources, security issues, and other techniques have you invested in?

Have you thought about how valuable those data integration activities might be to the rest of your organization?

Turn them into a “Service.” One that is easily re-usable, not only for the batch work you are doing today via ETL, but also for your real-time applications: your Java development teams building a portal, or .NET groups setting up front ends for remote devices. Or perhaps for other applications that are performing internal communications using your company’s choice of enterprise service bus.

This is where the Information Services Director comes into play for Information Server: it provides the ability to publish DataStage, QualityStage, SQL queries, and stored procedures as Services. Take the “guts” of your data integration activity (Lookups, Transforms, etc.) and “publish” them as a Service. That “Service” is supported by industry standards such as Web Services (SOAP over HTTP) and other protocols, and it comes with the necessary artifacts, including a built-in directory and an automatically generated WSDL that describes the metadata for your creation.
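To make the idea concrete, here is a minimal sketch of the kind of SOAP 1.1 request a client might send to a published lookup. The service namespace, the operation name (`lookupDiscountRate`), and the field name (`accountNumber`) are all hypothetical; the real names would come from the WSDL generated for your own service.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for the published service; take the real one from the WSDL.
SERVICE_NS = "http://example.com/lookup"


def build_request(operation, fields):
    """Build a SOAP 1.1 envelope with one operation element and its input fields."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in fields.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and input column, for illustration only.
request_xml = build_request("lookupDiscountRate", {"accountNumber": "A-1001"})
print(request_xml)
```

The same envelope could be posted by a portal, a .NET front end, or a test tool; the batch job and the real-time caller end up sharing one set of lookup logic.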

Services that focus on data integration are often most successful when they are built, documented, and maintained by the teams who truly understand the data. In many organizations, this is the same team that has been living with the data migration, transformation, and data warehousing applications.  They have invested time and energy researching the models, the legacy systems, and the oddities of the data under-the-covers.  They’ve built extensive transformations using ETL tooling. Why not exploit those skills, expertise, and business rule investments for benefits beyond the decision support systems?

I’m not talking about new stuff, bleeding edge creations, or upcoming technology.   DataStage, via “RTI,” has been doing this for 4+ years.  Web Services are increasingly mainstream.   Data integration is still at the core, and ETL tools have proven their mettle at simplifying the management of data access technology.  It’s a shame to see it only used for batch.   

Ernie

ps… if you are interested in reading further about this subject, check out this article on Information as a Service: Data Cleansing Pattern. I was honored to be asked to play a small role in developing this article, written by some of my esteemed IBM teammates. -e

Information as a Service: Data Cleansing Pattern


17 Responses to “Data Quality and Transformation as a Service”

  1. Sam Says:

    Hi Ernie,

    I have a list of questions on WISD services published through DataStage (WebSphere Information Server). Please help me if you can:

    – Can we authenticate the calls made to this WISD service in any way (any kind of security)?
    – Null handling: if I don’t pass an input parameter/message to the WISD service, which does the lookup and gets the data, it gives an error. Is there any way to get rid of that error?
    – Is there any way of logging the calls made to this service (for maintenance purposes)?

    I hope you can help me with these doubts. Thanks in advance.

    • dsrealtime Says:

      Indeed! ISD supports HTTPS and SSL and basic WS-Security features easily, with a simple check-box during design, and because ISD services are deployed on WebSphere Application Server, you can establish very deep levels of WS-Security if necessary. Nulls can and should be handled inside the Job, but it may depend on the error. There are some types of data issues that get stopped at the SOAP level, long before ISD receives them, and some even at the client level, depending on the tool that you are using (assuming you are using SOAP). Please share details of the error you are getting.

      ISD uses the normal WebSphere stack for deployment of its services. Tivoli provides integrated tools to do monitoring of such services.

      Ernie

  2. Sam Says:

    Thanks a lot for your time and answers. If you can explain the following things to me in detail, it would be a great help, as I am a novice in this area.

    – For security, as you explained: is there any documentation? I have all the documentation for 8.0.1, but I was not able to find how to implement this when calling Web Services from outside DataStage (for example, if I am calling a web service published on ISD through Eclipse for testing). Is there any way, while creating the WISD service, to define authentication of who is calling it?

    – For nulls: when I call the service through Eclipse to test it (the service calls a job that does a lookup where the input is not null), if I pass no value and invoke the service, it gives me the following error:
    exception: javax.ejb.EJBException: nested exception is: javax.transaction.TransactionRolledbackException: CORBA TRANSACTION_ROLLEDBACK 0x0 No; nested exception is: org.omg.CORBA.TRANSACTION_ROLLEDBACK: “javax.transaction.TransactionRolledbackException: ; nested exception is: javax.ejb.EJBException: nested exception is: com.ascential.asb.agent.HandlerException: Job samtestproj.jp_Wisd_get_Info.1236904303669.PipeReceiver Aborted. vmcid: 0x0 minor code: 0 completed: No”

    – for logging purpose, i will do my homework of finding more on it and then get back to you if i have any doubts.

    Thanks a lot, hoping to get some more thoughts from you on my doubts.

    – Sam

    • dsrealtime Says:

      Hi…

      The features for built in security are in the 8.1 release, which became available about six months ago. In 8.0 you can do any kind of security also, but have to do it at the WAS layer. I can dig up a document for you on this that I worked on with our engineering team.

      As for nulls, are you sending the entire input message as null, or just one of the column values? If so, what binding are you using? Watch the DS Job in the DS Director. Does the instance stay running, or does the job abort? The error you posted is fairly generic, so it’s hard to say if the job is not liking the nulls, or if the nulls aren’t even getting to the DS Job. And just a thought: why are you passing nulls? If the properties aren’t filled at all, why make the invocation?
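One way to act on that last question is to check the inputs on the client side and skip the call when required values are missing. The sketch below assumes a hypothetical required field (`accountNumber`) and a caller-supplied `invoke` function; it does not use any ISD API, and nulls should still be handled inside the Job as well.

```python
# Hypothetical list of inputs the lookup cannot work without.
REQUIRED_FIELDS = ("accountNumber",)


def missing_fields(payload, required=REQUIRED_FIELDS):
    """Return the names of required fields that are absent, None, or blank."""
    missing = []
    for name in required:
        value = payload.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


def invoke_if_complete(payload, invoke):
    """Call invoke(payload) only when every required field has a value."""
    missing = missing_fields(payload)
    if missing:
        # Nothing to look up, so do not make the invocation at all.
        return {"invoked": False, "missing": missing}
    return {"invoked": True, "result": invoke(payload)}


# 'fake_service' stands in for the real service call in this sketch.
def fake_service(payload):
    return "rate-for-" + payload["accountNumber"]


print(invoke_if_complete({"accountNumber": ""}, fake_service))
print(invoke_if_complete({"accountNumber": "A-1001"}, fake_service))
```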

      Ernie

  3. Sam Says:

    Thanks again for timely response…!

    Can you help me with some documentation on Security in DS 8.0 for WebServices, at what levels we can implement and types of authentication. I really need this.
    My Email: sensual_sturdy@yahoo.com

    I tried handling the nulls in my DS job and it worked fine for me. I will get back to you if I have any more doubts on it.

    Thanks a lot !
    – Sam

  4. Sam Says:

    Thanks a lot, that document helped a lot, and I have also implemented some small Java code in the Java Transformer. It all works… wonderful…!

    will come back to you in case of any other questions or help…

    Thanks again !
    – Sam

  5. Alberto Says:

    Hello Sam,

    I see you refer to some document that helped you resolve the issue you mentioned. I’m having the same problem, at least the same error message. It would be great if you could send me that document you used to solve the issue.

    Thanks in advance

    • Sam Says:

      Hi Alberto,

      Apologies for the late reply. Please let me know the email address where I can forward the document that helped me. It was given to me by Ernie (The Man) himself, and it is his own document.

      Thanks…!
      – Sam

      • Pradeep Says:

        Hi Sam and Alberto,

        I am having the same problem that you mentioned above. I would appreciate it if you could send me the document that helped you resolve the issue. You can send the document to pradeep.amboji@live.com

        Thanks!
        Pradeep

  6. CC Says:

    Hi Ernie,

    I’d like to know if there are any considerations (apart from the obvious performance one) when it comes to publishing large DataStage jobs as Web Services through ISD – I’m primarily interested in the Topology III jobs which have both the WISD Input and WISD Output stages?

    What kind of constraints or issues (apart from the issue of potentially degrading performance) may be encountered as the number of stages within the job increases – is there a limit at which there are too many stages for the job to be successfully published as a Web Service through ISD?

    Thanks,
    CC

    • dsrealtime Says:

      Great question, CC. Ultimately, there isn’t any particular limit, and once the job is loaded into memory, performance is unlikely to be an issue. There are numerous things to be conscious of with really large Jobs and ISD, but the biggest ones are (1) start-up time and (2) resource consumption.

      Start-up time is somewhat alleviated if you stick with fully “always on” Jobs (those with WisdInput and WisdOutput Stage types). However, it can still be lengthy, especially with extremely large EE Jobs that have tons of Stages, lots of lookups that might need to be loaded, and so on. Two minutes of start-up time may be fine for a batch job that runs in 45 minutes and processes 1/2 Terabyte in parallel, but it will be unacceptable if you are spawning the first job when the initial transaction comes in from an online portal. During debugging, you will probably need to adjust the timings in the WISD stages until they are high enough that you do not receive an error on initialization. But if you have at least a minimum of “1” instance of the Job for the WISD Operation, and you keep it running or start it long before the first request comes in, you should be fine.

      Resource consumption is the bigger one. Remember that an always-on EE job that is really large is going to have a lot of osh processes running. They take up resources. Sure, they might be resting for much of the time, but they still have to be maintained. If you have only one really large Job, fine, but if you have 3, 4, 10, 20, it’s like running a whole set of concurrent batch jobs. You are going to start putting pressure on the machine. Keep that in mind when deploying large jobs: how many of them, and how many instances? And also, what is your config file? Single node is the best and easiest to manage with ISD Jobs. If you need multi-node, keep in mind that it will also multiply the number of overall processes. Be sure you have enough CPUs to sustain the Jobs you need.

      Server Jobs are good here for special techniques, as they take up far fewer processes, regardless of their size. There are other caveats there, of course, but they should be considered when evaluating your real-time requirements.
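A back-of-envelope way to think about that pressure is sketched below. It assumes, purely for illustration, roughly one process per stage per node per running instance; the real count depends on the engine and the job design, and this ignores any overhead processes and any combining of stages the engine may do. The job names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class AlwaysOnJob:
    name: str
    stages: int         # number of stages in the job design
    nodes: int = 1      # nodes in the config file used by the job
    instances: int = 1  # always-on instances kept running for the operation


def estimated_processes(job):
    """Rough estimate: one process per stage, per node, per instance (an assumption)."""
    return job.stages * job.nodes * job.instances


def total_processes(jobs):
    """Sum the rough estimate across every always-on job on the machine."""
    return sum(estimated_processes(job) for job in jobs)


# Made-up deployment: one large single-node job, one smaller multi-node job.
jobs = [
    AlwaysOnJob("address_certify", stages=40, nodes=1, instances=1),
    AlwaysOnJob("account_lookup", stages=12, nodes=4, instances=2),
]

for job in jobs:
    print(job.name, estimated_processes(job))
print("total", total_processes(jobs))
```

Even with these crude numbers, the smaller job ends up with more processes than the large one once nodes and instances are multiplied in, which is the reason to start with a single-node config file.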

      Ernie

      • CC Says:

        Hi Ernie,

        I really appreciate your comments – and particularly the speed with which they were provided – it’s Google-esque!

        Initialisation is not an issue as deployment will occur well before the first request comes in, and at least one job instance will be up and running and ready to go.

        Not sure yet as to resources – it may or may not be an issue, but I will be thinking about it.

        As you mentioned, I did notice the increasing spawning of osh processes as stages were added to a job, and immediately wondered if there was a way to change the job design to reduce the number of osh processes that get spawned when the job is deployed by ISD.

        My initial thought was to convert most of the stages from parallel execution mode to sequential (jobs are all in EE using parallel canvas), but that didn’t seem to have any real impact on the spawning of osh processes.

        As you suggested, I’ll have to look into whether it’s configured for multi or single node.

        Not sure what the Server job approach you mentioned entails – can Server jobs also be deployed as Web Services through ISD?

        Thanks,
        CC

      • dsrealtime Says:

        …send me an email at eostic@us.ibm.com and it will be easier to comment further…but in brief, yes, Server Jobs are fully supported also for use with ISD….

        Ernie

        Ernie Ostic Product Specialist Cell: (617) 331 8238

  7. Digvijay Says:

    Hi,
    I am trying to deploy a QS job in Information Services Director, and during deployment I am seeing the issue below. It is an EJB service and is provided as a sample with the IBM product. Please advise.
    java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
    java.rmi.RemoteException:
    >> SERVER (id=4773e3aa, host=iishost.equ.com) TRACE START:
    >> java.rmi.RemoteException: ; nested exception is:
    javax.ejb.EJBException: com.ascential.asb.agent.config.AgentNotConfiguredException: Server “durin.torolab.ibm.com:31531” does not have an agent configured.
    at com.ascential.asb.agent.config.server.impl.AgentConfigurationImpl.getAgentMetaData(AgentConfigurationImpl.java:339)
    at com.ascential.asb.agent.config.server.impl.AgentConfigurationBean.getAgentMetaData(AgentConfigurationBean.java:257)
    at com.ascential.asb.agent.config.server.EJSRemoteStatelessAgentConfiguration_c53aed69.getAgentMetaData(Unknown Source)
    at com.ascential.asb.agent.config.server._AgentConfigurationRemote_Stub.getAgentMetaData(_AgentConfigurationRemote_Stub.java:809)
    at com.ascential.asb.agent.config.ejb.EJBAgentConfiguration.getAgentMetaData(EJBAgentConfiguration.java:551)
    at com.ascential.rti.design.server.impl.RTIDeployBean.enableProvider(RTIDeployBean.java:885)
    at com.ascential.rti.design.server.impl.RTIDeployBean.enableOperation(RTIDeployBean.java:816)
    at com.ascential.rti.design.server.impl.RTIDeployBean.enableService(RTIDeployBean.java:782)
    at com.ascential.rti.design.server.impl.RTIDeployBean.registerAndStartServices(RTIDeployBean.java:599)
    at com.ascential.rti.design.server.impl.RTIDeployBean.createRuntime(RTIDeployBean.java:377)
    at com.ascential.rti.design.server.impl.RTIDeployBean.deployer(RTIDeployBean.java:177)
    at com.ascential.rti.design.server.impl.RTIDeployBean.deployApplication(RTIDeployBean.java:506)
    at com.ascential.rti.design.server.EJSLocalStatelessRTIDeploy_57d641e8.deployApplication(Unknown Source)
    at com.ascential.rti.design.server.impl.RTIDesignImpl.deployApplication(RTIDesignImpl.java:1694)
    at com.ascential.rti.design.server.impl.RTIDesignBean.deployApplication(RTIDesignBean.java:638)
    at com.ascential.rti.design.server.EJSRemoteStatelessRTIDesign_815b722b.deployApplication(Unknown Source)
    at com.ascential.rti.design.server._EJSRemoteStatelessRTIDesign_815b722b_Tie.deployApplication(_EJSRemoteStatelessRTIDesign_815b722b_Tie.java:964)
    at com.ascential.rti.design.server._EJSRemoteStatelessRTIDesign_815b722b_Tie._invoke(_EJSRemoteStatelessRTIDesign_815b722b_Tie.java:164)
    at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:622)
    at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:475)

    • dsrealtime Says:

      Wow. I missed this... so sorry. I hope you have resolved it by now. From this error it is very hard to tell what might be the issue. Naturally, the ASBAgent needs to be running, but that’s true for any deployment. EJB is an advanced binding. I will assume that you have tried other simple jobs and have them working perfectly with the SOAP over HTTP binding? That is a very important prerequisite. Always deploy jobs first with the SOAP over HTTP binding to make sure they are working as you expect, and to have a simpler test environment that you can validate with tools like Actional Diagnostics or SOAP UI. And always have a simple test Job (ISDInput to Transformer to ISDOutput) to test that your entire environment is valid before moving on to more complex scenarios and Jobs such as QualityStage.
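When running that kind of smoke test against the SOAP over HTTP binding, the first thing to check in each response is whether it carries a SOAP Fault. A minimal sketch of that check for SOAP 1.1 follows; the two sample responses, including the `echoResponse` element and its namespace, are invented for illustration and are not actual ISD output.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def find_fault(response_xml):
    """Return the faultstring if the SOAP 1.1 response contains a Fault, else None."""
    root = ET.fromstring(response_xml.strip())
    fault = root.find(f"{{{SOAP_ENV}}}Body/{{{SOAP_ENV}}}Fault")
    if fault is None:
        return None
    # In SOAP 1.1, faultcode and faultstring are unqualified child elements.
    return fault.findtext("faultstring", default="")


# Invented sample responses, for illustration only.
ok_response = """
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <echoResponse xmlns="http://example.com/test"><value>hello</value></echoResponse>
  </soapenv:Body>
</soapenv:Envelope>
"""

fault_response = """
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <soapenv:Fault>
      <faultcode>soapenv:Server</faultcode>
      <faultstring>Job aborted</faultstring>
    </soapenv:Fault>
  </soapenv:Body>
</soapenv:Envelope>
"""

print(find_fault(ok_response))
print(find_fault(fault_response))
```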

      Ernie

  8. Anita Says:

    Hi,
    I have deployed a job as a web service through Information Services Director.
    But when we expose the service through MDM, it throws the following exception.
    Please help me with this.
    Trial 1-
    javax.ejb.EJBException: nested exception is: javax.transaction.TransactionRolledbackException: CORBA TRANSACTION_ROLLEDBACK 0x0 No; nested exception is:
    org.omg.CORBA.TRANSACTION_ROLLEDBACK: javax.transaction.TransactionRolledbackException: ; nested exception is:
    javax.ejb.EJBException: com.ascential.asb.agent.HandlerException: Job CAC_RealTime_Integration_AVI.1323767452841 Aborted.
    at java.lang.Throwable.<init>(Throwable.java:67)
    at com.ascential.asb.agent.AgentException.<init>(AgentException.java:47)
    at com.ascential.asb.agent.HandlerException.<init>(HandlerException.java:74)
    at com.ascential.asb.agent.HandlerException.<init>(HandlerException.java:40)
    at com.ascential.asb.agent.handler.datastage.PipeReceiver.processNotification(PipeReceiver.java:305)
    at com.ascential.asb.agent.handler.datastage.NotificationListener.processNotification(NotificationListener.java:196)
    at com.ascential.asb.agent.handler.datastage.NotificationListener.run(NotificationListener.java:134)
    vmcid: 0x0 minor code: 0 completed: No

    Trial 2-
    javax.ejb.EJBException: nested exception is: javax.transaction.TransactionRolledbackException: CORBA TRANSACTION_ROLLEDBACK 0x0 No; nested exception is:
    org.omg.CORBA.TRANSACTION_ROLLEDBACK: javax.transaction.TransactionRolledbackException: ; nested exception is:
    javax.ejb.EJBException: com.ascential.asb.agent.HandlerException: DataStage job CAC_RealTime_Integration_AVI.1323767452840 stopped unexpectedly.
    at com.ascential.asb.agent.handler.datastage.PipeReceiver.handleResponses(PipeReceiver.java:183)
    at com.ascential.asb.agent.handler.datastage.PipeReceiver.getResponse(PipeReceiver.java:129)
    at com.ascential.asb.agent.handler.datastage.PipeReceiver.run(PipeReceiver.java:89)
    vmcid: 0x0 minor code: 0 completed: No
    Thanks,
    Anita

