Data Integrity, Part 2: Do You Validate?
Back in part 1, we noted the three integral tasks involved in maintaining data integrity. That earlier post dealt with duplication control. A later series will be all about data orchestration. Today, let’s discuss data validation. This is more than just auditing the correctness and completeness of the data coming into your organization.
The Problem with Quick ETL Fixes
It’s all too easy for a good coder to throw in a quick ETL fix. It solves the problem fast, but it always throws a wrench into a firm’s data velocity. These little ETL fixes are almost always made outside of a standard software development life cycle (SDLC). Not to mention that the organization’s ETL is now tied to a specific IT person or department. What happens when that developer gets busy on another project? What happens when that developer leaves the company?
Best practice: Choose an ETL tool that is sustainable now and in the future. Our K3 ETL solution and accompanying K3 connectors deliver data integrity and long-term sustainability with no coding required. Allowing non-coding business analysts to drive the majority of ETL functions is key to improving and maintaining overall data velocity.
What You Should Demand of Your Upstream Data
For customer relationship management, enterprise resource planning, business response, or other downstream applications to use input data effectively, you have to extract, transform, and load (ETL) it properly. Because the quality of the data coming out of this process depends on the information going in, validation ensures that incoming data plays by three ground rules (sketched in code after the list):
- There are no unexpected null values.
- The values filling each field are reasonable and within business expectations.
- The data is formatted in a way that can be used by downstream systems and analysis.
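These rules are simple to state, and nearly as simple to express in code. As a rough illustration, here is what the three checks might look like in Python; K3 itself requires no coding, and the field names and business limits below are invented for the example:

```python
from datetime import datetime

# Hypothetical required fields for an incoming order record.
REQUIRED_FIELDS = ["customer_id", "quantity", "order_date"]

def validate_record(record):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []

    # Ground rule 1: no unexpected null values.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"null value in required field '{field}'")

    # Ground rule 2: values are reasonable and within business expectations.
    quantity = record.get("quantity")
    if quantity is not None and not (1 <= quantity <= 10000):
        problems.append(f"quantity {quantity} is outside the expected range")

    # Ground rule 3: data is formatted for downstream systems (ISO dates here).
    order_date = record.get("order_date")
    if order_date is not None:
        try:
            datetime.strptime(order_date, "%Y-%m-%d")
        except ValueError:
            problems.append(f"order_date '{order_date}' is not in ISO format")

    return problems

print(validate_record({"customer_id": 42, "quantity": 3, "order_date": "2024-01-31"}))
# [] (record passes all three rules)
print(validate_record({"customer_id": None, "quantity": 2500, "order_date": "31/01/24"}))
# flags the null customer_id and the non-ISO date
```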
These are the basics of data validation, and your ETL platform needs to be able to solve them. This is exceptionally important because the cost to your business of chasing these data errors is enormous. How many times has your data analysis team worked late into the night because some key process was fed null values? It hurts, but it just does not have to be that way.
Data Validation the K3 Way
K3 takes data validation off Excel sheets and out of the hands of platforms better suited for data storage than for data integration and transformation. K3 validates data at several stages, locations, and timeframes during each phase of the ETL process; each stage is illustrated with a short code sketch after the list:
- Comparing incoming data from all sources whose information will be combined or cross-referenced. If one table formats dates as dd/mm/yy and another as mm/dd/yyyy, K3 normalizes the values to a single format.
- Deciding if the incoming data seems reasonable. If 99 percent of the order quantities fall between 1 and 10 and K3 suddenly encounters one for 2,500, the platform will flag the field for further evaluation. Is it an outlier or a data-input error?
- Checking for completeness. Does the incoming data contain all the columns the downstream application is expecting? Does it contain additional information? K3 performs this task across all contributing databases, even when the data comes from different internal silos and external sources.
- Capturing new and changed data. K3 eliminates extraneous cost and processing time by employing change data capture (CDC) to evaluate only fields whose data has changed since the last time it accessed the table. This streamlines the back end as well: rather than flowing all data downstream, then deduping and processing it in downstream apps, K3 performs these tasks before the data gets there, saving time and space.
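To make these stages concrete, here are a few minimal Python sketches. They illustrate the general techniques rather than K3’s internal, no-code implementation, and every table name, column name, and threshold in them is invented. First, normalizing dates that arrive in different source formats:

```python
from datetime import datetime

# Hypothetical mapping of source tables to the date format each one uses.
SOURCE_DATE_FORMATS = {
    "orders_eu": "%d/%m/%y",   # dd/mm/yy
    "orders_us": "%m/%d/%Y",   # mm/dd/yyyy
}

def normalize_date(raw, source):
    """Parse a source-specific date string and emit one canonical ISO format."""
    parsed = datetime.strptime(raw, SOURCE_DATE_FORMATS[source])
    return parsed.strftime("%Y-%m-%d")

print(normalize_date("31/01/24", "orders_eu"))    # 2024-01-31
print(normalize_date("01/31/2024", "orders_us"))  # 2024-01-31
```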
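Second, a reasonableness check in the spirit of the order-quantity example above: rather than rejecting an unusual value outright, flag it for further evaluation. The cutoff rule here, a multiple of the largest value seen so far, is an assumption chosen for simplicity; K3’s actual evaluation logic may differ:

```python
def flag_suspicious(quantity, history, factor=10):
    """Flag a value far outside everything seen before, for human review."""
    return quantity > max(history) * factor

history = [1, 2, 3, 5, 8, 10, 9, 4, 7, 6]  # nearly all orders fall between 1 and 10
for quantity in (4, 2500):
    verdict = "flag for review" if flag_suspicious(quantity, history) else "ok"
    print(f"quantity={quantity}: {verdict}")
# quantity=4: ok
# quantity=2500: flag for review (outlier or data-input error?)
```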
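Third, the completeness check: compare the columns that actually arrived against the columns the downstream application expects, reporting both what is missing and what is extra:

```python
# Hypothetical set of columns the downstream application depends on.
EXPECTED_COLUMNS = {"order_id", "customer_id", "quantity", "order_date"}

def check_completeness(incoming_columns):
    """Report columns the downstream app needs but lacks, and extras to review."""
    return {
        "missing": EXPECTED_COLUMNS - incoming_columns,
        "unexpected": incoming_columns - EXPECTED_COLUMNS,
    }

print(check_completeness({"order_id", "customer_id", "quantity", "region"}))
# {'missing': {'order_date'}, 'unexpected': {'region'}}
```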
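Finally, the simplest form of change data capture: keep a watermark from the last run and re-evaluate only rows modified since then. Production CDC often reads database transaction logs instead; the last-modified timestamp column below is an assumption that keeps the sketch short:

```python
from datetime import datetime

def changed_rows(rows, last_sync):
    """Return only the rows modified since the previous ETL run."""
    return [row for row in rows if row["modified_at"] > last_sync]

rows = [
    {"order_id": 1, "modified_at": datetime(2024, 1, 15)},
    {"order_id": 2, "modified_at": datetime(2024, 3, 2)},
]
last_sync = datetime(2024, 2, 1)  # watermark saved at the end of the last run
print(changed_rows(rows, last_sync))
# only order_id 2 flows downstream; order 1 is untouched since the last sync
```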