Lego Blocks and Data Quality

Lego Plane

Lego blocks allow the average person to build practically anything, because they come in standard sizes and interconnect with ease.

Having built a model, one may later take it apart and reuse the standard blocks to build other models.  One may do this time and again, giving hours of enjoyment.

Plane carved from wood

By contrast, few people have the skill to carve models from wood.

Once carved, a wooden model is practically impossible to ‘remodel’, and its parts cannot be reused for anything other than firewood.

What has the above ‘common sense’ got to do with data quality?

Imagine trying to build a Lego model using partially moulded Lego blocks. Imagine opening your Lego model kit to discover that some of the pieces were missing. Truly unimaginable.

We in the Data Quality Profession need to educate both Business and IT on the need to create “standard data components” that can be easily interconnected to satisfy the information requirements of the business.

Currently, the focus of the Data Quality Industry is on data “fixing”: remoulding data into parts that are more complete and more usable. I see this continuing for a long time, due to the vast quantity of legacy data. Over time, however, I see the focus moving towards “get it right first time”, with the emphasis on creating completely moulded, standard component parts from the outset.

My interview with Dylan Jones

Dylan Jones of DataQualityPro interviews me about the process I use to assess common Enterprise-wide data issues. Use this process to assess the status of data governance within your organisation or that of a client.

Data Quality Pro interview with Ken O'Connor Data Consultant

Russian Gas Pipe and Data Governance

As you know, Russia supplies gas to many European countries.

Do you know what’s flowing through your critical data pipelines?

Could you imagine Italy purchasing gas from Russia without checking what exactly was flowing through the pipe?  I’m no expert on gas pipelines, but I know that before completing the agreement to purchase the gas, Italy and Russia would have agreed metrics such as:

  • Volume of Gas
  • Calorific value (Energy content)
  • etc.

So what? What else would one expect?  Applied common sense… yes?

Why is it that such common sense is often lacking in Data Migration and Data Population projects?  Why do some Enterprises continue to perform data population of, and ongoing data entry to, critical data repositories without fully understanding the data they are pumping into the repository?

A simple example involves Date of Birth. The business ask the IT function to populate Date of Birth in the new AML / Basel II / CRM / other repository. Some time later, when data population is complete, the business begin to express concerns:

  • “We never realised we had so many customers aged over 100!”
  • “I thought we had more Student customers.”
  • “How come so many of our customers share the same birthday?”
  • “These are not the results we expected.”
  • etc.

Performing data population on the basis of what the source data “should contain”, without analysing what it actually does contain, is known as the ‘Load and Explode’ approach to Data Population. I cover this Enterprise-wide data issue in more detail here.
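To make this concrete, here is a minimal profiling sketch in Python, run against the source extract before population begins, that would surface exactly the Date of Birth concerns above. The file name, column name, and date format are illustrative assumptions, not a real system:

```python
# Profile Date of Birth in the source extract BEFORE loading, so
# surprises surface up front rather than after go-live.
# SOURCE_FILE, DOB_COLUMN and DATE_FORMAT are hypothetical.
import csv
from collections import Counter
from datetime import date, datetime

SOURCE_FILE = "customer_extract.csv"   # hypothetical extract
DOB_COLUMN = "date_of_birth"           # hypothetical column name
DATE_FORMAT = "%Y-%m-%d"               # assumed date format

today = date.today()
total = missing = unparseable = over_100 = 0
dob_counts = Counter()

with open(SOURCE_FILE, newline="") as f:
    for row in csv.DictReader(f):
        total += 1
        raw = (row.get(DOB_COLUMN) or "").strip()
        if not raw:
            missing += 1          # DoB never captured at source
            continue
        try:
            dob = datetime.strptime(raw, DATE_FORMAT).date()
        except ValueError:
            unparseable += 1      # junk that would 'explode' on load
            continue
        dob_counts[dob] += 1
        age = (today - dob).days // 365   # approximate age in years
        if age > 100:
            over_100 += 1         # "so many customers aged over 100!"

print(f"Records:         {total}")
print(f"Missing DoB:     {missing}")
print(f"Unparseable DoB: {unparseable}")
print(f"Aged over 100:   {over_100}")
print("Most common DoB values (default dates cluster here):")
for dob, count in dob_counts.most_common(5):
    print(f"  {dob}: {count}")   # "so many share the same birthday?"
```

A report like this, shared with the business before the load, turns “these are not the results we expected” into an informed decision about cleansing the source first.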

We in the Data Governance and Data Quality industry need to educate the business community on the “common sense” parts of data governance, and on the need to engage Data Governance Professionals to ensure that “Data Quality Common Sense” is actually applied.

Feedback welcome – Ken

Business users’ right to good data plumbing

Jim Harris hosted an excellent debate on his blog centred on Rick Sherman’s quote “Data quality is primarily about context not accuracy.  Accuracy is part of the equation, but only a very small portion.”

From my experience, the bottom line from a business perspective is always that “just enough is good enough.” The challenge for the Data Quality profession is to make the business case for “just enough” to be clearly defined in terms of measurable dimensions.
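To illustrate what “clearly defined in terms of measurable dimensions” might look like in practice, here is a minimal sketch of agreed thresholds gating a data load. The dimension names, threshold values, and function are illustrative assumptions, not a standard:

```python
# A sketch of "just enough, clearly defined": business and IT agree
# measurable thresholds per data quality dimension up front, and a
# candidate load is accepted or rejected against them.
# Dimensions and numbers below are illustrative assumptions.
AGREED_THRESHOLDS = {
    "completeness": 0.98,   # >= 98% of mandatory fields populated
    "validity":     0.99,   # >= 99% of values parse and pass rules
    "timeliness":   0.95,   # >= 95% of records updated this period
}

def assess(measured: dict[str, float]) -> bool:
    """Return True only if every agreed dimension meets its threshold."""
    ok = True
    for dimension, threshold in AGREED_THRESHOLDS.items():
        score = measured.get(dimension, 0.0)
        status = "PASS" if score >= threshold else "FAIL"
        if score < threshold:
            ok = False
        print(f"{dimension:12s} measured={score:.3f} "
              f"required>={threshold:.2f} {status}")
    return ok

# Example: profiling results for a candidate load (illustrative numbers)
if not assess({"completeness": 0.97, "validity": 0.995, "timeliness": 0.96}):
    print("Load rejected: 'just enough' not yet reached.")
```

The point is not the particular numbers: it is that “just enough” stops being a matter of opinion once it is written down as measurable, agreed targets.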

Let me give you a simple analogy. Suppose there is a requirement to water a new lawn. The context is that 500 litres (or liters for our US friends) must be sprayed on the new lawn. One might assume that so long as the water is delivered, the requirement is met…

However, what if:

  • A well had to be dug to provide the water?
  • The water contains contaminants that will kill the new lawn?
  • The hose has many leaks, and loses 5,000 litres in delivering 500 (incurring 10 times the water charges)?
  • etc.

Water sprinkler

Watering a lawn is such an everyday occurrence that one reasonably assumes the required ‘plumbing’ is in place to deliver clean water in a cost-effective manner. Similarly, business people have the right to assume that they can readily access the information they require. Business people have the right to assume that the required ‘plumbing’ is in place to deliver ‘clean’, ‘complete’, ‘accurate’, ‘timely’, ‘relevant’, ‘usable’ information in a cost-effective manner.

Thus we need to split the ‘plumbing’, which should be standard across all applications, from the business-specific, “bespoke by nature” part of data / information management. The business-specific stuff, the ‘context’, the ‘really important stuff’, simply cannot happen if the ‘plumbing’ is not in place.

Atacama Desert

So, which is more important, the context or the accuracy? Which is more important, the chicken or the egg? Does the data plumbing in your Enterprise support a lush lawn of quality information? Or is your data plumbing so inadequate that your business users are left scratching around in a barren desert of information? To learn how to assess the data plumbing in your Enterprise, click here.

He who knows not and knows not that he knows not…

According to the proverb, one should shun “He who knows not and knows not that he knows not”, since “he is a fool”.

As data quality professionals, seeking to earn a crust in challenging times, we do not have the luxury of shunning potential clients.

Most Enterprises know that mission-critical decisions should be based on accurate, timely, reliable information. Unfortunately, too many Enterprises:

  • Know not that accurate, timely, reliable information depends on the quality of the underlying data.
  • Know not that they know not the importance of data quality to their mission-critical decision making.

However, these Enterprises are no fools.  They are successful Enterprises.  They could be even more successful, armed with information that is more accurate, more timely and more reliable.

One of the greatest challenges facing our profession is to inform, teach, and sell the criticality of data quality to those Enterprises who currently “know not and know not that they know not”.