Data is the new oil – what grade is yours?

Bill Bryson’s book “One Summer: America 1927” provides a fascinating insight into the world of aviation in the “roaring 20s”. Aviators were vying to be the first to cross the Atlantic between New York and Paris, a challenge that took many lives, most of them European.

Bryson tells us “The American flyers also had an advantage over their European counterparts that nobody yet understood. They all used aviation fuel from California, which burned more cleanly and gave better mileage. No one knew what made it superior because no one yet understood octane ratings – that would not come until the 1930s – but it was what got most American planes across the ocean while others were lost at sea.”

Once octane ratings were understood, fuel quality was measured and lives were saved.

We’ve all heard that data is the new oil. To benefit from this “new oil”, you must ensure you use “top grade” only. It can make the difference between business success and failure. It is also a prerequisite for regulatory compliance (Solvency II, FATCA, Dodd-Frank, Basel III, BCBS 239 etc.). Thankfully, just as we can measure fuel quality using octane ratings, we can measure data quality using 6 primary dimensions: completeness, validity, accuracy, uniqueness, timeliness and consistency. For more details see my post: Major step forward in Data Quality Measurement.

I also explore this topic in my post Russian Gas Pipe and Data Governance.

What happens in your organisation? Do you measure the quality of your most critical data, or do you fly on a wing and a prayer? Please add your comments below.

Major step forward in Data Quality Measurement

How tall are you?
What is the distance between Paris and Madrid?
How long should one cook a 4.5Kg turkey for – and at what temperature?

Quality data is key to a successful business. To manage data quality, you must measure it.


We can answer the above questions thanks to “standard dimensions”:

Height: Metres / Feet
Distance: Kilometres / Miles
Time: Hours & Minutes
Temperature: Degrees Celsius / Fahrenheit

Life would be impossible without the standard dimensions above, even though the presence of “alternate” standards such as metric vs imperial can cause complexity.

We measure things for a reason. Based on the measurements, we can make decisions and take action. Knowing our neck size enables us to decide which shirt size to choose. Knowing our weight and our waist size may encourage us to exercise more and perhaps eat less.

We measure data quality because poor data quality has a negative business impact that affects the bottom line.  Rectifying data quality issues requires more specific measurement than anecdotal evidence that data quality is “less than satisfactory”.

The great news is that 2013 marked a major step forward in the agreement of standard dimensions for data quality measurement.

In October 2013, following an 18-month consultative process, DAMA UK published a white paper called DAMA UK DQ Dimensions White Paper R3 7.

The white paper lists 6 standard data quality dimensions and provides worked examples. The 6 are:

1. Completeness
2. Uniqueness
3. Timeliness
4. Validity
5. Accuracy
6. Consistency
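
To make the idea of measuring against dimensions concrete, here is a minimal sketch in Python that scores three of the six dimensions as simple ratios. The records, the field names and the email rule are all invented for illustration, and the ratio definitions are my own simplification rather than the wording of the white paper.

```python
# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "bob@example"},
    {"id": 4, "email": "cat@example.com"},
]


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    populated = sum(1 for r in rows if r[field] not in (None, ""))
    return populated / len(rows)


def uniqueness(rows, field):
    """Distinct values of the field divided by the number of rows."""
    return len({r[field] for r in rows}) / len(rows)


def validity(rows, field, rule):
    """Share of populated values that satisfy a format rule."""
    values = [r[field] for r in rows if r[field] not in (None, "")]
    return sum(1 for v in values if rule(v)) / len(values)


def looks_like_email(value):
    # Deliberately crude rule for illustration: a local part, an "@",
    # and a dot somewhere in the domain.
    local, _, domain = value.partition("@")
    return bool(local) and "." in domain


scores = {
    "email completeness": completeness(records, "email"),
    "id uniqueness": uniqueness(records, "id"),
    "email validity": validity(records, "email", looks_like_email),
}
for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

Accuracy, timeliness and consistency need something to compare against (the real world, a deadline, or another data set), so they are left out of this sketch.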

The dimensions are not new. I referred to 5 of them in a blog post in 2009: There is little understanding among senior management of what “Data Quality” means.
The good news is that this white paper pulls together the thinking of many DQ professionals and provides a full explanation of the dimensions. More importantly, it emphasises the criticality of assessing the organisational impact of poor data quality. I include a quote below:

“Examples of organisational impacts could include:
• incorrect or missing email addresses would have a significant impact on any marketing campaigns
• inaccurate personal details may lead to missed sales opportunities or a rise in customer complaints
• goods can get shipped to the wrong locations
• incorrect product measurements can lead to significant transportation issues i.e. the product will not fit into a lorry, alternatively too many lorries may have been ordered for the size of the actual load
Data generally only has value when it supports a business process or organisational decision making.”

I would like to thank DAMA UK for publishing this white paper. I expect to refer to it regularly in my day-to-day work. It will help me build upon my thoughts in my blog post Do you know what’s in the data you’re consuming?

Hopefully regulators worldwide will refer to this paper when considering data quality management requirements.

Some excellent articles / blog posts / videos referring to this whitepaper include:

Nicola Askham – Data Quality Dimensions

3-2-1 Start Measuring Data Quality

Great Data Debate (2) Danger in Dimensions, Kenneth MacKinnon

How do you expect this paper will affect your work? Please share your thoughts. 

The growing demand for food and data provenance

In November 2012, I presented at the Data Management and Information Quality Europe 2012 conference, in London. My presentation was called Do you know what’s in the data you’re consuming.

In the presentation, I compare the data supply chain with the food supply chain.

I believe that data consumers have the right to be provided with facts about the content of the data they are consuming, just as food consumers are provided with facts about the food they are buying. The presentation provides guidelines on how you can improve your data supply chain.

Little did I realise that within 3 months the term “provenance” would be hitting the headlines due to the European horsemeat scandal.

There’s a silver lining in this food scandal for data quality management professionals. As financial regulators increasingly demand evidence of the provenance of the data provided to them, it is now easier for data quality management professionals to explain to their business colleagues and senior management what “data provenance” means, and what it requires.  Retailers, such as Tesco, must have controls in their supply chain that ensure that the food they sell to consumers only contains “what it says on the tin”. Similarly, financial services organisations providing data to financial regulators must have controls in their data supply chain that ensure the quality of the data they provide can be trusted. Regulators are now asking financial services organisations to demonstrate evidence that their data supply chain can be trusted. They require organisations to demonstrate evidence of their data provenance, as applied to their critical or material data.

But what exactly is “data provenance”? The best definition I have seen comes from Michael Brackett in his excellent book “Data Resource Simplexity”.

“Data Provenance is provenance applied to the organisation’s data resource. The data provenance principle states that the source of data, how the data were captured, the meaning of the data when they were first captured, where the data were stored, the path of those data to the current location, how the data were moved along that path, and how those data were altered along that path must be documented to ensure the authenticity of those data and their appropriateness for supporting the business”.
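
As a rough sketch of what documenting those items might look like in practice, the Python below models one provenance record per data element. The class names, field names and sample values are my own assumptions for illustration; they do not come from Brackett’s book.

```python
from dataclasses import dataclass, field


@dataclass
class PathStep:
    """One hop on the path from the original source to the current location."""
    location: str    # where the data were stored at this step
    moved_by: str    # how the data were moved here
    alteration: str  # how the data were altered at this step, if at all


@dataclass
class ProvenanceRecord:
    """Documents the items named in the definition for one data element."""
    element: str
    source: str
    capture_method: str
    original_meaning: str
    path: list[PathStep] = field(default_factory=list)

    def current_location(self) -> str:
        # The last documented step is where the data live now.
        return self.path[-1].location if self.path else self.source

    def undocumented(self) -> list[str]:
        # Names of items left blank, i.e. gaps in the provenance.
        names = ["source", "capture_method", "original_meaning"]
        gaps = [n for n in names if not getattr(self, n).strip()]
        if not self.path:
            gaps.append("path")
        return gaps


# Hypothetical example: the capture method was never written down.
record = ProvenanceRecord(
    element="customer_date_of_birth",
    source="branch account-opening form",
    capture_method="",
    original_meaning="date of birth as stated by the customer",
    path=[
        PathStep("core banking system", "manual data entry", "none"),
        PathStep("data warehouse", "nightly batch load", "reformatted to ISO 8601"),
    ],
)
print(record.current_location())
print(record.undocumented())
```

A record with gaps, like the one above, is exactly what a regulator asking for evidence of provenance would pick up on.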

Enjoy your “beef” burger!

The dog and the frisbee and data quality management

The Wall Street Journal reported it as the “Speech of the year”.

In a speech with the intriguing title “The dog and the frisbee”, Andrew Haldane, the Bank of England Director of Financial Stability, has questioned whether the Emperor (in the form of ever increasing, ever more complex regulations such as Solvency II, Basel III and Dodd-Frank) is naked. He points out that the Basel regulations, which have increased from 30 pages to over 600 pages, completely failed to identify banks that were at risk of collapse, while a simple measure of a bank’s leverage ratio did identify them.

He also points out “Dodd-Frank makes Glass-Steagall look like throat-clearing.” The Glass-Steagall Act of 1933, which separated commercial and investment banking, ran to a mere 37 pages; the Dodd-Frank Act of 2010 ran to 848, and may spawn a further 30,000 pages of detailed rule-making by various agencies.

I recommend you read the speech yourself – his arguments, together with his wit, are superb. I include a brief extract below:

‘In the UK, regulatory reporting was introduced in 1974. Returns could have around 150 entries. In the Bank of England archives is a memo to George Blunden, who was to become Deputy Governor, on these proposed regulatory returns. Blunden’s handwritten comment reads: “I confess that I fear we are in danger of becoming excessively complicated and that if so we may miss the wood from the trees”.

Today, UK banks are required to fill in more than 7,500 separate cells of data – a fifty-fold rise. Forthcoming European legislation will cause a further multiplication. Banks across Europe could in future be required to fill in 30–50,000 data cells spread across 60 different regulatory forms. There will be less risk of regulators missing the wood from the trees, but only because most will have needed to be chopped down.’

Brilliant!

Andrew Haldane is calling for simpler, more basic rules. I agree with him.

I have worked in data management for over 30 years. The challenges I see today are the same challenges that arise time and time again. They are not Solvency II specific, Basel specific, or Dodd-Frank specific. They are universal. They apply to all critical data within all businesses.

The fundamental truth is: “The data is unique, but the data management principles are universal.”

It is time to stop writing specific data management and data quality management requirements into specific legislation. Regulators should co-operate with the data management profession, via independent organisations such as DAMA International, to develop a common-sense universal standard, and put the effort into improving such a standard.

What do you think? I welcome your comments.

The Queen’s Speech and Data Governance

The Queen of England made an historic and welcome visit to Ireland in 2011.  She delivered a memorable speech at the Irish State banquet, in which she said “With the benefit of historical hindsight, we can all see things which we wish had been done differently, or not at all”.

In real life, we cannot change the past.  The same does not apply to data created in the past.  Regulators now expect financial institutions to:

  • Identify data quality mistakes made in the past
  • Correct material mistakes
  • Implement data governance controls to prevent recurrences

I quote from the UK Financial Regulator’s requirement that all deposit holding financial institutions deliver a single customer view (SCV) of deposit holders:  “There may be a number of reasons why SCV data is not 100% accurate. This might be due to defects in the systems used to compile the SCV, but we would expect such defects to be picked up and rectified during the course of the systems’ development.”

Dodd-Frank, Solvency II, FATCA, Basel III and many more regulations all impose similar requirements. Use this checklist to check whether your organisation suffers from any common Enterprise-Wide Data Governance Issues.

What data quality mistakes have you uncovered from the past, and how have you corrected them? I’d love to hear about them.

Do you know what’s in the data you’re consuming?

Standard facts are provided about the food we buy

These days, food packaging includes ingredients and a standard set of nutrition facts.  This is required by law in many countries.

Food consumers have grown accustomed to seeing this information, and now expect it. It enables them to make informed decisions about the food they buy, based on a standard set of facts.

Remarkable as it may seem, data consumers are seldom provided with facts about the data feeding their critical business processes.

Most data consumers assume the data input to their business processes is “right”, or “OK”.  They often assume it is the job of the IT function to ensure the data is “right”.  But only the data consumer knows the intended purpose for which they require the data.  Only the data consumer can decide whether the data available satisfies their specific needs and their specific acceptance criteria. To make an informed choice, data consumers need to be provided with facts about the data content available.

Data Consumers have the right to make informed decisions based on standard data content facts

The IT function, or a data quality function, can and should provide standard “data content facts” about all critical data, such as the facts shown in the example.

In the sample shown, a Marketing Manager wishing to mailshot customers in the 40-59 age range might find that the data content facts satisfy his/her data quality acceptance criteria.

The same data might not satisfy the acceptance criteria for a manager in the Anti Money Laundering (AML) area requesting an ETL process to populate a new AML system.
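
The Python below is a minimal sketch of how such “data content facts” could be produced and then judged by two different consumers. The customer rows, the `id_verified` field and both acceptance thresholds are invented for illustration; they are not taken from the sample referred to above.

```python
from collections import Counter

# Hypothetical customer rows; None marks a missing value.
customers = [
    {"age": 45, "id_verified": True},
    {"age": 52, "id_verified": False},
    {"age": None, "id_verified": False},
    {"age": 33, "id_verified": True},
    {"age": 58, "id_verified": None},
]


def age_band(age):
    if age is None:
        return "unknown"
    if age < 40:
        return "under 40"
    if age <= 59:
        return "40-59"
    return "60 and over"


def content_facts(rows):
    """Summarise the content of a data set, much like a nutrition label."""
    total = len(rows)
    return {
        "records": total,
        "age populated": sum(r["age"] is not None for r in rows) / total,
        "id verified": sum(r["id_verified"] is True for r in rows) / total,
        "age bands": dict(Counter(age_band(r["age"]) for r in rows)),
    }


facts = content_facts(customers)

# Each consumer applies their own acceptance criteria to the same facts.
# Both thresholds are assumptions made up for this example.
marketing_ok = facts["age populated"] >= 0.75
aml_ok = facts["id verified"] >= 0.99

print(facts)
print("Marketing accepts:", marketing_ok)
print("AML accepts:", aml_ok)
```

The facts are the same for everyone; only the decision differs, which is why the decision has to sit with the data consumer.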

Increasing regulation means that organisations must be able to demonstrate the quality and trace the origin of the data they use in critical business processes.

In Europe, Solvency II requires insurance and re-insurance undertakings to demonstrate the data they use for solvency calculations is as complete, appropriate and accurate as required for the intended purpose. Other regulatory requirements such as Dodd-Frank in the USA, Basel III and BCBS 239 are also seeking increasing transparency regarding the quality of data underpinning our financial system.

While regulation may be a strong driving force for providing standard data content facts, an even stronger one is the business benefit to be gained from being informed. Some time ago Gartner research showed that approximately 70% of CRM projects failed. I wonder whether the business owners of those proposed CRM systems were shown data content facts about the data available to populate them?

In years to come, we will look back on those crazy days when data consumers were not shown data content facts about the data they were consuming.

Incomplete loan data puts €8.2billion at risk

Ireland’s leading business newspaper, The Sunday Business Post, reported on 13th Nov 2011 that incomplete loan documentation data could complicate banks’ ability to take security on €8.2billion worth of loans, in the event of a default.  (Click here to see the full article).

Data quality measurement can detect incomplete data

Central bank researchers discovered incomplete data in 78,000 of 688,000 loans surveyed. The researchers were producing a paper for a conference on the Irish mortgage market on October 13 2011. They found 10,094 loans lacked a property identifier, 35,044 had no initial valuation, 15,413 had no valuation date, and 18,628 specified no geographic data.

Similar issues with bad loan data led to greater haircuts for the banks when the National Asset Management Agency (NAMA) transferred billions in assets in 2009 and 2010. In the US, banks have been stopped from pursuing delinquent borrowers where loan data was incomplete or missing.

How could such a situation arise?  How can similar problems be prevented?

Front-line staff are often under pressure to complete a sale, and “sort out the details later” (previously discussed here). Hence even the most robust and rigorous data validation processes often provide a “bypass” facility. This is normal business practice, and perfectly acceptable. In many instances, critical documentation for a loan (or other product) may not be available at the time of data entry. Problems only arise if no one goes back to “sort out the details later”. One or two loans with incomplete data may not pose a major risk, but incomplete data in 10% of a loan book spells serious trouble.

Common-sense data quality management steps can prevent similar problems arising in your organisation. Data validation alone is insufficient. Data quality measurement and ongoing data quality monitoring are required.

In the case study reported above, central bank researchers used data quality measurement to detect the incomplete loan data.  Similar data quality measurement can and should be incorporated into all business critical systems.  Regular monitoring could generate an alert when the % of loans with incomplete data exceeds a threshold – say 2%.  Alternatively, monitoring could generate an alert when the time limit for “sorting the details out later” has been exceeded.
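
For scale, the 78,000 incomplete loans out of 688,000 surveyed work out at roughly 11% of the sample, well above a 2% threshold. The two monitoring rules just described could be sketched in Python as follows. The required fields mirror the four gaps the researchers reported; the field names, the 30-day grace period and the sample loans are assumptions made for illustration.

```python
from datetime import date, timedelta

REQUIRED_FIELDS = ("property_id", "initial_valuation", "valuation_date", "region")
MAX_INCOMPLETE_SHARE = 0.02        # alert threshold: 2% of the loan book
GRACE_PERIOD = timedelta(days=30)  # assumed limit for "sorting out the details later"


def missing_fields(loan):
    return [f for f in REQUIRED_FIELDS if loan.get(f) is None]


def monitor(loans, today):
    """Return alert messages for the two monitoring rules."""
    incomplete = [loan for loan in loans if missing_fields(loan)]
    alerts = []
    share = len(incomplete) / len(loans)
    if share > MAX_INCOMPLETE_SHARE:
        alerts.append(
            f"{share:.1%} of loans are incomplete (limit {MAX_INCOMPLETE_SHARE:.0%})"
        )
    for loan in incomplete:
        if today - loan["entered_on"] > GRACE_PERIOD:
            alerts.append(
                f"loan {loan['loan_id']} overdue for completion: {missing_fields(loan)}"
            )
    return alerts


# Hypothetical loan book of three loans.
loans = [
    {"loan_id": "A1", "entered_on": date(2011, 9, 1), "property_id": "P-1",
     "initial_valuation": 250000, "valuation_date": date(2011, 8, 20), "region": "Dublin"},
    {"loan_id": "A2", "entered_on": date(2011, 9, 5), "property_id": None,
     "initial_valuation": 180000, "valuation_date": None, "region": "Cork"},
    {"loan_id": "A3", "entered_on": date(2011, 10, 10), "property_id": "P-3",
     "initial_valuation": 320000, "valuation_date": date(2011, 10, 1), "region": None},
]

alerts = monitor(loans, today=date(2011, 10, 13))
for alert in alerts:
    print(alert)
```

Loan A3 is incomplete but still within the grace period, so it counts towards the percentage without raising its own alert. That is the “bypass” working as intended, with someone checking that the details do get sorted out.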

This case study highlights the difference between data validation and data quality measurement.  I will deal with this topic in my next post.

Feedback, as always, most welcome.