If you look over the last 20 years or so, data warehouses are well known as a fertile place to wreck your career. And yet, in the corporate world it’s only a matter of time before someone says it: “We need to build a data warehouse.”
Keeping up with scope creep is the killer. New sources of data, changes to existing sources, new requirements from consumers of the data warehouse…all these add up to a point where it’s impossible to keep up with them.
The ETL Challenge Is Easy to Underestimate
True story. We had a client who had a bunch of software systems. When we looked across all of them we found the term “customer” had about 56 different permutations. Sometimes it was a column header variation (cust_name vs. customer); sometimes it was a value permutation (Bank of America vs. BofA vs. Bank of A.). This is but one little example of the geometric ETL challenge. The requirements go from zero to problems squared. One problem becomes 2 becomes 4 becomes 16…
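To make the two kinds of permutation concrete, here is a minimal Python sketch of what handling them by hand looks like. Everything in it is an illustrative assumption: the alias tables, the `normalize_record` helper, and the sample rows are hypothetical and are not taken from the client's systems. It covers three spellings of one customer; the real case had about 56.

```python
# Hypothetical alias tables. In a real project these grow with every new source.
COLUMN_ALIASES = {
    "cust_name": "customer",
    "customer": "customer",
}
VALUE_ALIASES = {
    "bofa": "Bank of America",
    "bank of a.": "Bank of America",
    "bank of america": "Bank of America",
}


def normalize_record(record):
    """Map known column-header and customer-value variants to one canonical form."""
    out = {}
    for col, val in record.items():
        # Column header permutation: cust_name vs. customer
        canon_col = COLUMN_ALIASES.get(col.strip().lower(), col)
        # Value permutation: BofA vs. Bank of A. vs. Bank of America
        if canon_col == "customer" and isinstance(val, str):
            val = VALUE_ALIASES.get(val.strip().lower(), val.strip())
        out[canon_col] = val
    return out


# Hypothetical rows from three different source systems.
rows = [
    {"cust_name": "BofA", "balance": 100},
    {"customer": "Bank of A.", "balance": 250},
    {"Customer": "Bank of America", "balance": 75},
]
normalized = [normalize_record(r) for r in rows]
```

Note what the sketch does not do: any spelling missing from the alias tables passes through untouched, so every new source system means another round of edits to the tables by whoever owns this code.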
For some reason, software developers are attracted to ETL problems like moths to a flame. It’s a mistake…every time. Here’s why. They are right. They can solve any given ETL problem quickly with a little bit of code. Over time, a little bit of code turns into a lot of code. And now one and only one person knows how it all works. If you know how to code you should not touch ETL problems with a 10-foot pole. You will be handcuffed to this code for all time.
But There is Another Reason. ETL Tools Have Never Been Better.
You mean like Informatica? No, I do not mean like Informatica. That’s a first-generation ETL tool. That is just coding in a different environment.
What I mean is new ETL tools that are business analyst / data analyst friendly. Call it No-Code or Low-Code ETL. The idea here is to open the aperture of ETL so that it’s not just coders who have a stake in the ETL game. Besides, it’s the operational business analysts chasing the problems down anyway. They best know the meaning of the data.
How Good are ETL Tools?
Let me put it this way. This is a golden opportunity for I.T. departments to get out of the ETL business entirely. I.T.’s job is to deliver the raw data…that’s it. What the business does with it from there is up to them. The tools are entirely there to support this kind of thing. Let’s face it. Microsoft BI, Tableau and friends are eating the world. Let ’em.
Operational users are happy because they are not waiting around for I.T. to get their code thing done. I.T. teams are happy because it gets them out of what is really a bad situation. Frankly, I.T. teams have been stuck between a rock and a hard place on this for over a decade. There are only so many coders a budget can afford. And the business just ends up complaining about the slow ETL turn-around.
Listen, K3 is not the only software out there doing this. All I can recommend is to do some shopping and compare vendors. I think, in the end, you will not be disappointed.
Or, it is possible to keep at it, coding away. It’s job security at least. But really, is this kind of data janitorial work what you had in mind when you got that CS degree?