Kicking off Transformation Jobs or functions from the command line is an important feature, but it is equally important to be able to launch such processing via a Web Service or another service-oriented invocation.
Command line utilities are generally text based. They require that you learn the vendor’s proprietary syntax. They may require passwords and user IDs, so you have to worry about clear-text credentials or be sure there is a way to hide them. You probably need to know about shell scripting to take maximum advantage of the command language. You may also need to know things like Job or Map names, the parameter structure, or the administrative naming conventions for the Transformation tooling (Project Names, Folders, etc.). As a developer, you probably need to know the hostname where the Transformation is running, or at least the engine you are connecting to for launching the process.
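To make that list concrete, here is a minimal Python sketch of what a caller has to assemble before it can launch a Job from the command line. The flags shown (`-server`, `-user`, `-password`, `-run`, `-jobstatus`, `-param`) are the dsjob options as I recall them from the 7.x documentation, so treat them as an assumption and check them against your own release. The host, user, project, Job, and parameter names are all hypothetical. The sketch only builds the argument list; it does not run anything.

```python
import shlex


def build_dsjob_command(project, job, params, server=None, user=None, password=None):
    """Assemble an argument list for a dsjob run (flag names are assumed, see above)."""
    cmd = ["dsjob"]
    # Connection details: the caller must know where the engine lives.
    if server:
        cmd += ["-server", server]
    if user:
        cmd += ["-user", user]
    if password:
        cmd += ["-password", password]
    # Run the Job and report its finishing status.
    cmd += ["-run", "-jobstatus"]
    # Every Job parameter travels as a plain name=value string.
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    # The caller must also know the Project and Job names.
    cmd += [project, job]
    return cmd


def redact(cmd):
    """Return a printable command line with the password masked."""
    out = list(cmd)
    for i, token in enumerate(out[:-1]):
        if token == "-password":
            out[i + 1] = "****"
    return shlex.join(out)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    command = build_dsjob_command(
        "FinanceProject",
        "RefreshDatamart",
        {"RunDate": "2006-01-31"},
        server="etlhost01",
        user="etl_user",
        password="secret",
    )
    print(redact(command))
```

Even in this small sketch the caller carries the hostname, the credentials, the Project and Job names, and the parameter format, and has to remember to keep the password out of logs.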
Command line utilities are powerful... there are times, however, when a simpler invocation method is needed. Like when the calling application doesn’t have easy access to scripts or the operating system command line. Or when the skills of the developer(s) establishing the invocation do not include scripting or details of the Transformation tool’s command line syntax. Or when the ultimate location of the transformation function being launched is unknown or moves frequently. Service invocations using industry standards help abstract all of that... Servers can be anywhere, and underlying awareness of the Transformation tooling is hidden from the invoking client. Datatypes for Job Parameters are less painful to handle, and authentication and/or encryption can be handled at the transport layer.
A financial site I’ve been working with uses Web Services to kick off DataStage ETL Jobs from within a portal application. The authors of the portal application are Visual Basic .NET experts. They don’t know the first thing about DataStage, are under deadlines like the rest of us, and needed the ability to start an ETL Job to refresh a datamart as the result of a user pushing a button or choosing an option. Using the RTI module for DataStage (release 7.x), the .NET development team merely consumes the automatically published WSDL definition to include this functionality. Within minutes, they have the functionality included in their application. No scripts to maintain, no complex awareness of DataStage. For all the .NET developers know, the refresh process might be home-grown C++ code. That’s the value of using the SOA industry standards. The .NET developers don’t want to know and they don’t have to know that ultimately, it’s a DataStage Job doing the work. Meanwhile, the skilled DataStage ETL team, who really understands the data, is maintaining the Jobs, the rules, the access to the source and target data structures, and publishing the functionality (as a Web Service, including the WSDL “contract”) for use by the rest of the enterprise.
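For contrast, here is a rough Python sketch of what the client side of such a service call can look like on the wire. The portal team in the story used proxies generated from the WSDL, not hand-built requests, and this is not the actual contract published by RTI. The endpoint URL, the service namespace, the operation name, the parameter name, and the `SOAPAction` convention are all hypothetical stand-ins for whatever the published WSDL defines. The sketch builds a SOAP 1.1 request object but does not send it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(endpoint, service_ns, operation, params):
    """Build (but do not send) an HTTP POST carrying a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The operation element and its children come from the service's WSDL;
    # the names passed in here are hypothetical.
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{service_ns}}}{name}").text = str(value)
    payload = ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # Assumed convention; the real value is whatever the WSDL declares.
            "SOAPAction": f'"{service_ns}/{operation}"',
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint and names, for illustration only.
    request = build_soap_request(
        "http://etl.example.com/services/RefreshDatamart",
        "urn:example:etl",
        "refreshDatamart",
        {"runDate": "2006-01-31"},
    )
    print(request.data.decode("utf-8"))
```

Nothing in the request mentions DataStage, a Project, or a Job name. The client knows an endpoint and an operation, and passing the request to `urllib.request.urlopen` would be the only remaining step.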
Command line control (in DataStage we generally call it the Job Control API or Job Control Command language, a.k.a. dsjob) in transformation tools is great, but increasingly we need simpler methods that exploit the new standards.