ETL tools were designed for back-room, nightly batch processing, right? Yes… maybe… I suppose. If you look at their history, most ETL tooling was born in the decision support and data warehousing world, where the biggest challenges were point-in-time refreshes and the loading of vast amounts of information. However, requirements have evolved, missions have changed, and ETL is no longer used only for decision support. Indeed, a certain percentage of sites have never used ETL for data warehousing, even if warehousing is admittedly still a large share of what these tools and technologies are used for. Today, ETL is a great choice for real-time, and it’s safe to say that the tools are now being designed for top-notch real-time functionality. I’d like to just stop using the term “ETL” (or ELT, ETML and some of the other acronyms that have been floating around for years)! It’s not your father’s ETL anymore… [but terms stick, so for now we’ll go with it unless any of you have better suggestions for us and our friends at the analysts 🙂 ].
If not ETL for real time, what else? A lot has already been written on ETL (Extract, Transform, Load) vs. EAI (Enterprise Application Integration), with ETL generally credited with better high-volume abilities, EAI considered better at complex, multi-construct (occurs, record types) sources and targets, and assorted other pros and cons on each side. As I learn more about how to manage this site I’ll create a page with my favorite links on this subject. In many of these comparisons, real-time defaults to the EAI category.
However, this comparison often overlooks what you might call two “soft” issues: the user community (your teammates who will actually be doing the development) and the requirements for metadata management. While there are exceptions, ETL tools “tend” to be used by what I like to refer to as “data professionals.” These are folks who may have formal programming backgrounds, but who gravitated to their role in the enterprise because they understand the business and they know the data. With their initial focus on business intelligence, ETL tools (I know, beauty is in the eye of the beholder) are often more inviting to this type of user. This is not an “end user” by any means, but also not the user who is typically comfortable with C header files, Java types and code snippets. ETL vendors have competed for years on the usability issue. Their success with DBAs and more technical end users is a testament to their appeal.
The other “soft” issue worth noting as ETL moves into “real time” is the support for metadata. No longer is metadata something that people merely pay lip service to. Data lineage and impact analysis, meaning the ability to link a column name to a real-time service, its RDBMS target, its ERwin model AND its business intelligence report, are largely unique to ETL tools. Most EAI-type tools, until recently, could hardly spell metadata, let alone provide impact analysis and data lineage reporting from soup to nuts. This is changing, but deep metadata reporting has been a key component in the data warehousing space (and has thus received massive investment from ETL vendors) for ten years or more.
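For readers who have not seen lineage and impact analysis up close, here is a minimal sketch of the idea in Python. It is not how any particular ETL product stores its metadata; the `LineageGraph` class and every asset name in it (the CRM source column, the `CustomerLookup` service, the `CustomerChurn` report) are made up for illustration. Each asset is a node, each data flow is an edge pointing downstream, impact analysis walks the edges forward, and lineage walks them backward.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy metadata store: nodes are named assets, edges point downstream."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_flow(self, source, target):
        """Record that data (or a definition) flows from source to target."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first walk; returns every asset reachable from start."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}

    def impact(self, asset):
        """Impact analysis: what is affected if this asset changes?"""
        return self._walk(asset, self._downstream)

    def lineage(self, asset):
        """Data lineage: where did this asset's data come from?"""
        return self._walk(asset, self._upstream)


# Hypothetical assets, named only for this example.
graph = LineageGraph()
graph.add_flow("source:crm.contact.email", "service:CustomerLookup")
graph.add_flow("service:CustomerLookup", "rdbms:dw.customer.email")
graph.add_flow("erwin:Customer.email", "rdbms:dw.customer.email")
graph.add_flow("rdbms:dw.customer.email", "report:CustomerChurn")

print(sorted(graph.impact("source:crm.contact.email")))
print(sorted(graph.lineage("report:CustomerChurn")))
```

The first call answers “what breaks if I change this source column?” and the second answers “where did the number on this report come from?” Real tools add a great deal on top (versioning, transformation logic, business definitions), but the underlying question is the same graph walk.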
Data governance, regulatory compliance, and metadata management are on everyone’s minds. We can no longer merely pay lip service to metadata and data lineage for any kind of data integration. SOA and real-time data integration need the deep metadata support provided by ETL tooling as much as business intelligence applications do.
Increasingly, ETL tools and the platforms they operate in are being chosen for real-time data integration because of their support for metadata, and because “data professionals” prefer these tools to their “closer-to-the-code” IDE cousins for development work.