Common Enterprise wide Data Governance Issues #9: Data Migration and ETL projects are Metadata driven

This post is one of a series dealing with common Enterprise Wide Data Governance Issues.  Assess the status of this issue in your Enterprise by clicking here: Data Governance Issue Assessment Process

Too often, Data Migration and ETL projects are built on the basis of Metadata alone, without measuring what the source data fields actually contain. This happens when the IT function builds data ‘pipes’ according to what the metadata says the source fields should contain, and skips the data content analysis, or data profiling, that would reveal what those fields actually hold.
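As a rough illustration of what data profiling means in practice, here is a minimal Python sketch that summarises what a single source field actually holds. The function names and the sample `dob` field are hypothetical, chosen only for illustration; a real project would more likely use a dedicated profiling tool.

```python
from collections import Counter


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile_field(rows, field):
    """Summarise the actual content of one field across all source rows."""
    values = [row.get(field) for row in rows]
    present = [v for v in values if not is_blank(v)]
    return {
        "rows": len(values),                      # total rows inspected
        "blank": len(values) - len(present),      # missing or empty values
        "distinct": len(set(present)),            # distinct populated values
        "types": dict(Counter(type(v).__name__ for v in present)),
    }


# Hypothetical sample: the metadata says "dob" is a date string.
sample = [
    {"dob": "1970-01-01"},
    {"dob": ""},
    {"dob": None},
    {"dob": 19700101},
    {"dob": "1970-01-01"},
]
print(profile_field(sample, "dob"))
```

Even on five rows, the profile shows blanks and a mix of types that the metadata alone would not have revealed.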

Impact:
The IT function turns the ‘tap’ on, the data flows through the ‘pipes’, and the business expresses surprise, followed by denial, when expectations cannot be met due to data quality issues. This is known as the ‘Load and Explode’ approach to data.

Solution:
To prevent ‘Load and Explode’ from undermining the success of your data-dependent projects, agree and apply the following policy:

Before building or purchasing a system that depends on existing data, projects must complete the following process:

  1. Define what data is required.
  2. Define the quality requirements of the required data.
  3. Identify the source of the required data.
  4. Specify the data quality metrics to be captured.
  5. Measure the quality of the available source data.
  6. Understand the implications of the quality of available source data for the proposed system.
  7. If necessary, and if feasible, implement data quality improvement measures to raise the quality to the required level.
  8. Worst case – if the facts tell you data quality is too low and cannot be improved – Cancel the project and save yourself a ton of money!
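
Steps 4 to 6 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: completeness is the single metric measured, and the field name and thresholds are hypothetical. Real quality requirements would cover more dimensions, such as validity, uniqueness and consistency.

```python
def completeness(rows, field):
    """Share of rows where the field is populated (0.0 to 1.0)."""
    if not rows:
        return 0.0
    populated = sum(1 for row in rows if row.get(field) not in (None, ""))
    return populated / len(rows)


def assess(rows, requirements):
    """Compare measured completeness with the agreed minimum for each field."""
    report = {}
    for field, minimum in requirements.items():
        measured = completeness(rows, field)
        report[field] = {
            "measured": measured,
            "required": minimum,
            "ok": measured >= minimum,
        }
    return report


# Hypothetical source extract and an agreed requirement of 90% completeness.
source_rows = [
    {"email": "a@example.com"},
    {"email": ""},
    {"email": "b@example.com"},
    {"email": None},
]
print(assess(source_rows, {"email": 0.9}))
```

A field that fails the check is the trigger for step 7 (improve the data) or, in the worst case, step 8.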

Your experience:
Have you faced the above issue in your organisation, or while working with clients?  What did you do to resolve it?  Please share your experience by posting a comment – Thank you – Ken.

4 thoughts on “Common Enterprise wide Data Governance Issues #9: Data Migration and ETL projects are Metadata driven”

  1. Great post as ever Ken.

    I would just add:

    9. Build in real-time data quality monitoring and alerts to enforce the data quality rules

    The one constant of ETL interfaces is that they change. The people who originally design these data gateways are seldom the same people left to manage them, so there is often a “brain-drain” that takes the real understanding of the logic and coding with it.

    In a personal example, I remember one company was pushing feeds onto a website for a week before they realised 70% of the data was missing because of a code change upstream.

    Both sides, as you say, have to agree the data quality rules required and regularly assess them.

    Really like the series you’ve created here Ken, great advice, look forward to the next installment.

    – Dylan

  2. Thanks for the comment Dylan,

    Your suggested addition is excellent advice – “Build in real-time data quality monitoring and alerts to enforce the data quality rules”

    Rgds Ken

  3. Pingback: Twitted by hlsdk

  4. Pingback: Russian Gas Pipe and Data Governance « Ken O'Connor Data Consultant
