ETL vs EAI for Real Time …revisiting an old subject…

Here’s a topic I hear less and less about these days. There are many reasons for this, the most likely being that the lines between these technologies continue to blur, especially now that many of the independent vendors in each space have been absorbed or have morphed into other things. However, it still raises its head every once in a while, and even if classic “ETL” (extraction, transformation, load) and “EAI” (enterprise application integration) now sit under the same tool umbrella, choices will likely still need to be made about which underlying features and capabilities to use.

I thought I’d share some points from a recent conference call with a site that is working through the “ETL vs EAI for Real Time” decision tree. These are some of the major issues to consider (listed in no particular order of priority):

Protocol/medium. MQ Series? JMS? Tibco? Sockets? Named pipes? Sonic? MSMQ? Something else? Many of the EAI tools that still exist have more of these protocols “in the box.” ETL tools do not always. How easy or hard is it to extend your chosen tool?
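Whichever tool you choose, one way to contain the cost of adding a protocol later is to keep the transformation logic ignorant of the transport and put each medium behind a small adapter. The Python sketch below is illustrative only: `MessageSource`, `InMemoryQueueSource`, `NamedPipeSource` and `drain` are hypothetical names, and the in-memory deque merely stands in for a real MQ Series, JMS or MSMQ client (no vendor API is used or implied).

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import BinaryIO, Iterable, List, Optional


class MessageSource(ABC):
    """Hypothetical adapter interface: one subclass per protocol/medium."""

    @abstractmethod
    def receive(self) -> Optional[bytes]:
        """Return the next message payload, or None when nothing is waiting."""


class InMemoryQueueSource(MessageSource):
    """Stand-in for a real queue client (assumption), backed by a local deque."""

    def __init__(self, messages: Iterable[bytes]):
        self._queue = deque(messages)

    def receive(self) -> Optional[bytes]:
        return self._queue.popleft() if self._queue else None


class NamedPipeSource(MessageSource):
    """Reads newline-delimited messages from a binary stream such as an opened pipe."""

    def __init__(self, stream: BinaryIO):
        self._stream = stream

    def receive(self) -> Optional[bytes]:
        line = self._stream.readline()
        return line.rstrip(b"\n") if line else None


def drain(source: MessageSource) -> List[bytes]:
    """Transformation logic sees only MessageSource, never the underlying protocol."""
    out: List[bytes] = []
    while (msg := source.receive()) is not None:
        out.append(msg.upper())  # placeholder "transformation"
    return out
```

Adding a new medium then means writing one more subclass rather than touching every job; whether your ETL or EAI tool lets you plug in something equivalent is exactly the extensibility question above.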

The “shape” of the source. Is it relational in nature (rows and columns), or hierarchical (a tree-like structure)? XML? COBOL with multiple record types, OCCURS clauses and variable-length records? Or fixed format? Both ETL and EAI tools typically handle these, consuming metadata from XML Schema, COBOL copybooks, etc. The exceptions are formats such as SWIFT, EDI and EDIFACT, where EAI-style tools such as IBM WebSphere Transformation Extender typically have a long legacy and built-in metadata import capabilities. The shape of the source is important, but it becomes even more significant in light of the next item.
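To make “multiple record types” concrete, here is a minimal Python sketch of reading a fixed-format file where a two-character prefix decides which layout applies to each line. The `LAYOUTS` table, its field names and offsets are hypothetical; in practice this metadata would be imported from a COBOL copybook. The sketch deliberately ignores EBCDIC, packed decimal (COMP-3) and OCCURS, which is where tool-provided metadata import earns its keep.

```python
from typing import Dict, List, Tuple

# Hypothetical copybook-style layouts: record type -> [(field, start, length)].
# Offsets are 0-based character positions.
LAYOUTS: Dict[str, List[Tuple[str, int, int]]] = {
    "HD": [("rec_type", 0, 2), ("order_id", 2, 6), ("customer", 8, 10)],
    "DT": [("rec_type", 0, 2), ("sku", 2, 5), ("qty", 7, 3)],
}


def parse_record(line: str) -> Dict[str, str]:
    """Pick the layout from the record-type prefix, then slice fixed-width fields."""
    layout = LAYOUTS.get(line[:2])
    if layout is None:
        raise ValueError(f"unknown record type: {line[:2]!r}")
    # Slicing past the end of a short line yields "", so short
    # (variable-length) records are tolerated rather than rejected.
    return {name: line[start:start + length].strip() for name, start, length in layout}


def parse_file(lines: List[str]) -> List[Dict[str, str]]:
    """Parse every non-blank line, each according to its own record type."""
    return [parse_record(line.rstrip("\n")) for line in lines if line.strip()]
```

For example, `parse_record("DTAB123007")` yields `{"rec_type": "DT", "sku": "AB123", "qty": "007"}`, while a line starting with `HD` is sliced with the header layout instead.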

The “shape” of the target. Is it relational (rows and columns)? Tables in an RDBMS? A flat file? Or is it also XML with a complex multi-path hierarchy, COBOL with OCCURS DEPENDING ON, variable-length and multiple record types, or SWIFT, EDI, EDIFACT, HIPAA, etc.? ETL tools are best at relational work. Nearly all can handle hierarchical structures as well, but often at the price of added complexity. If both the source and the target are hierarchical, how many transformations of this type are you doing? Just a few, or 98% of your project?
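The easy direction is hierarchical in, relational out: each child element becomes a row that repeats its parent’s key. A minimal sketch using Python’s standard `xml.etree.ElementTree`, assuming a hypothetical `<orders>/<order>/<line>` document:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical sample document: orders containing order lines.
SAMPLE = """<orders>
  <order id="1">
    <line sku="A1" qty="2"/>
    <line sku="B7" qty="1"/>
  </order>
  <order id="2">
    <line sku="C3" qty="5"/>
  </order>
</orders>"""


def flatten(xml_text: str) -> List[Tuple[str, str, int]]:
    """Denormalize order/line nesting into (order_id, sku, qty) rows."""
    root = ET.fromstring(xml_text)
    rows: List[Tuple[str, str, int]] = []
    for order in root.findall("order"):
        order_id = order.get("id")
        # Each child <line> becomes one row carrying its parent's key.
        for line in order.findall("line"):
            rows.append((order_id, line.get("sku"), int(line.get("qty"))))
    return rows
```

`flatten(SAMPLE)` returns three rows, two for order 1 and one for order 2. The reverse trip (rows back into a deep, multi-path hierarchy, or hierarchy to hierarchy) needs grouping and ordering logic on top of this, and that is where the extra complexity in ETL tooling tends to show up.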

Data volume. ETL tools excel here. Parallelism and grid processing are a given. Managing huge volumes of data (multiple terabytes) in batch is where they are regularly exercised. EAI tools typically do not fare as well in very high-volume scenarios, at least not without hand-crafting the parallelism and other configurations that come out of the box with tools like DataStage.
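Much of that built-in parallelism rests on data partitioning: route rows by key so each partition can be processed independently, then combine the results. The Python sketch below shows hash partitioning feeding a per-partition aggregation; all names are hypothetical. It uses threads only to stay self-contained. Because of Python’s GIL this will not speed up CPU-bound work, whereas an engine like DataStage runs partitions as separate processes, possibly across nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (key, amount)


def hash_partition(rows: Iterable[Row], n: int) -> List[List[Row]]:
    """Route each row by hash of its key, so equal keys always land together."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for key, amount in rows:
        parts[hash(key) % n].append((key, amount))
    return parts


def sum_by_key(part: List[Row]) -> Dict[str, int]:
    """Aggregate one partition independently of all the others."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in part:
        totals[key] += amount
    return dict(totals)


def parallel_sum(rows: Iterable[Row], n: int = 4) -> Dict[str, int]:
    parts = hash_partition(rows, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(sum_by_key, parts))
    merged: Dict[str, int] = {}
    for partial in results:
        merged.update(partial)  # keys are disjoint across partitions, so no collisions
    return merged
```

Hash partitioning on the aggregation key is what makes the final merge trivial: no key can appear in two partitions, so no cross-partition re-aggregation is needed.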

Units of work. Transactional paradigms. Delete from one RDBMS table and write to another. Commit. If either resource fails, roll back everything. Delete from one queue and write to another, or to a relational table. Commit. If either resource fails, roll back. ETL tools can do some of this, and do it well, but are often limited in the queues and databases they support. In EAI tools, these semantics are usually simpler and closer to the “core” of the tool, richer in resource support (which databases, queuing systems, etc.), and often more flexible in how you can react to failures.
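The single-database case (“delete from one table and write to another, commit, otherwise roll back”) can be sketched with Python’s built-in `sqlite3`, where using the connection as a context manager commits on success and rolls back if an exception escapes. The `staging`/`target` tables and helper names are hypothetical. Note what this does not cover: a unit of work spanning a queue and a database involves two resource managers and needs a transaction coordinator (for example XA two-phase commit), which this sketch does not attempt.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical setup: two rows waiting in staging, an empty target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("INSERT INTO staging VALUES (1, 'order-1'), (2, 'order-2')")
    conn.commit()
    return conn


def move_row(conn: sqlite3.Connection, row_id: int, fail: bool = False) -> None:
    """Delete from staging and insert into target as one unit of work."""
    with conn:  # commit if the block completes, roll back if it raises
        row = conn.execute(
            "SELECT id, payload FROM staging WHERE id = ?", (row_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no staging row {row_id}")
        conn.execute("DELETE FROM staging WHERE id = ?", (row_id,))
        if fail:
            raise RuntimeError("simulated failure after the delete")
        conn.execute("INSERT INTO target (id, payload) VALUES (?, ?)", row)


def try_move(conn: sqlite3.Connection, row_id: int, fail: bool = False) -> bool:
    """Return True if the unit of work committed, False if it was rolled back."""
    try:
        move_row(conn, row_id, fail)
        return True
    except (LookupError, RuntimeError):
        return False
```

After `try_move(conn, 1)` succeeds and `try_move(conn, 2, fail=True)` fails, row 1 is in `target` and row 2 is still in `staging`: the failed delete was rolled back rather than leaving the row in neither table.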

Skills. Beauty is in the eye of the beholder, and ease of use and learning curve are no different. Still, ETL tools, while far from being designed for ‘end users,’ have typically appealed to “data professionals” with technical skills but not necessarily hard-core programming skills. EAI tooling usually sits in the more technical domain. There is a lot of grey area and overlap here, but the history is clear: EAI tools have a rich history of providing assured delivery for binary EDI formats running 24x7, and ETL tools have a rich history of providing consumable transformations of large data volumes to support business intelligence initiatives. Enough said.

I recall a great write-up, long ago, from an analyst who concluded that most companies require both technologies because the goals of each are so different. Years later, many companies do have both technologies in their toolbox, so this may be less a purchasing decision than an implementation one. There are no wrong answers here, just answers that might lead to a bumpier road down the line.

Good luck as you review your variables.

Ernie


3 Responses to “ETL vs EAI for Real Time …revisiting an old subject…”

  1. Bill Conniff Says:

    I have been reading your blog articles on ETL and XML and thought you might be interested in some XML tools I developed.

    I am the founder of a very small software startup in the XML space. Here is the URL to my website if you’re interested: http://www.xponentsoftware.com
    You can find details on XMLMax and my company there. There are a lot of XML editors out there, but XMLMax has one distinguishing feature: it will display XML of any size or structure in a treeview, and it does so extremely fast.

    I am beta testing CAX, a caching API for XML that is intended for transforming large XML documents without using XSLT. Details on CAX are also on the website.

    Please don’t hesitate to contact me if you have any questions.

    Regards,

    Bill Conniff
    Founder, Xponent LLC

  2. Felipe Says:

    This is an application that generates a VALID EDIFACT file of type INVOIC: http://www.informaticaautonomos.com/demos/emitirInvoic.php. It may be useful for anyone who wants to generate such files.

    • dsrealtime Says:

      Hi Felipe… Thanks. EDIFACT, EDI, SWIFT, etc. are classic formats. Tools such as WebSphere TX handle these out of the box, as do various legacy tools from the EAI days. I have spoken to sites that use the former ItemField (now part of Informatica) and SeeBeyond (acquired by Sun, and possibly still part of Oracle) to process these. These formats fall into one of those grey areas, where some of the processing might be doable in ETL, but legacy tools that already have “built-in metadata” have an advantage because the initial mapping has already been done for the specific record types that are standard in these systems. Often we see tooling being combined, as in the WebSphere TX (offered as MapStage when combined with DataStage) or ItemField scenarios.

      Ernie

