Data is the new oil – what grade is yours?

Bill Bryson’s book “One Summer: America 1927” provides a fascinating insight into the world of aviation in the Roaring Twenties. Aviators were vying to be the first to fly the Atlantic from New York to Paris, a challenge that cost many lives, most of them European.

Bryson tells us: “The American flyers also had an advantage over their European counterparts that nobody yet understood. They all used aviation fuel from California, which burned more cleanly and gave better mileage. No one knew what made it superior because no one yet understood octane ratings – that would not come until the 1930s – but it was what got most American planes across the ocean while others were lost at sea.”

Once octane ratings were understood, fuel quality was measured and lives were saved.

We’ve all heard that data is the new oil. To benefit from this “new oil”, you must ensure you use “top grade” only. It can make the difference between business success and failure. It is also a prerequisite for regulatory compliance (GDPR, Solvency II, FATCA, Dodd-Frank, Basel III, BCBS 239, etc.). Thankfully, like octane ratings, we know how to measure data quality, using six primary dimensions: completeness, validity, accuracy, uniqueness, timeliness and consistency. For more details see my post Major step forward in Data Quality Measurement.

I also explore this topic in my post Russian Gas Pipe and Data Governance.

What happens in your organisation? Do you measure the quality of your most critical data, or do you fly on a wing and a prayer? Please add your comments below.

Major step forward in Data Quality Measurement

How tall are you?
What is the distance between Paris and Madrid?
How long should one cook a 4.5 kg turkey for – and at what temperature?

Quality data is key to a successful business. To manage data quality, you must measure it – Image courtesy of Pixabay.com

We can answer the above questions thanks to “standard dimensions”:


Height: Metres / Feet
Distance: Kilometres / Miles
Time: Hours & Minutes
Temperature: Degrees Celsius / Fahrenheit

Life would be impossible without the standard dimensions above, even though the presence of alternative standards, such as metric vs imperial, can cause complexity.

We measure things for a reason. Based on the measurements, we can make decisions and take action. Knowing our neck size enables us to decide which shirt size to choose. Knowing our weight and our waist size may encourage us to exercise more and perhaps eat less.

We measure data quality because poor data quality has a negative business impact that affects the bottom line.  Rectifying data quality issues requires more specific measurement than anecdotal evidence that data quality is “less than satisfactory”.

The great news is that 2013 marked a major step forward in the agreement of standard dimensions for data quality measurement.

In October 2013, following an 18-month consultative process, DAMA UK published a white paper called DAMA UK DQ Dimensions White Paper R3 7.

The white paper lists six standard data quality dimensions and provides worked examples (I include a small measurement sketch after the list). The six are:

1. Completeness
2. Uniqueness
3. Timeliness
4. Validity
5. Accuracy
6. Consistency
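
To make the dimensions concrete, here is a minimal sketch (in Python, using a tiny customer table of my own invention) of how four of them – completeness, uniqueness, validity and timeliness – can be turned into measurable checks. Accuracy and consistency are omitted because they typically require comparison against a reference source or a second system, which a self-contained snippet cannot show.

```python
# A minimal sketch of four of the six dimensions as measurable checks.
# The customer records, field names and the 365-day freshness window are
# illustrative assumptions, not definitions from the DAMA UK white paper.
from datetime import date, datetime
import re

customers = [
    {"id": 1, "email": "ann@example.com", "dob": "1975-04-12", "last_updated": "2024-05-01"},
    {"id": 2, "email": "",                "dob": "1990-13-40", "last_updated": "2019-01-15"},
    {"id": 2, "email": "bob@example.com", "dob": "1982-08-03", "last_updated": "2024-04-20"},
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def completeness(rows, field):
    """Percentage of rows in which the field is populated."""
    return 100.0 * sum(1 for r in rows if r.get(field)) / len(rows)

def uniqueness(rows, field):
    """Percentage of rows whose value for the field occurs only once."""
    values = [r[field] for r in rows]
    return 100.0 * sum(1 for v in values if values.count(v) == 1) / len(rows)

def validity(rows, field, is_valid):
    """Percentage of populated values that conform to the stated rule."""
    values = [r[field] for r in rows if r.get(field)]
    return 100.0 * sum(1 for v in values if is_valid(v)) / len(values)

def timeliness(rows, field, max_age_days=365):
    """Percentage of rows refreshed within the agreed time window."""
    today = date.today()
    fresh = sum(
        1 for r in rows
        if (today - datetime.strptime(r[field], "%Y-%m-%d").date()).days <= max_age_days
    )
    return 100.0 * fresh / len(rows)

def valid_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(f"Completeness (email): {completeness(customers, 'email'):.0f}%")
print(f"Uniqueness (id):      {uniqueness(customers, 'id'):.0f}%")
print(f"Validity (dob):       {validity(customers, 'dob', valid_date):.0f}%")
print(f"Validity (email):     {validity(customers, 'email', EMAIL_RE.match):.0f}%")
print(f"Timeliness (updated): {timeliness(customers, 'last_updated'):.0f}%")
```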

The dimensions are not new. I referred to five of them in a 2009 blog post, There is little understanding among senior management of what “Data Quality” means.
The good news is that this white paper pulls together the thinking of many DQ professionals and provides a full explanation of the dimensions. More importantly, it emphasises the criticality of assessing the organisational impact of poor data quality. I include a quote below:

“Examples of organisational impacts could include:
• incorrect or missing email addresses would have a significant impact on any marketing campaigns
• inaccurate personal details may lead to missed sales opportunities or a rise in customer complaints
• goods can get shipped to the wrong locations
• incorrect product measurements can lead to significant transportation issues i.e. the product will not fit into a lorry, alternatively too many lorries may have been ordered for the size of the actual load
Data generally only has value when it supports a business process or organisational decision making.”

I would like to thank DAMA UK for publishing this white paper. I expect to refer to it regularly in my day-to-day work. It will help me build upon my thoughts in my blog post Do you know what’s in the data you’re consuming?

Hopefully regulators worldwide will refer to this paper when considering data quality management requirements.

Some excellent articles, blog posts and videos referring to this white paper include:

Nicola Askham – Data Quality Dimensions

3-2-1 Start Measuring Data Quality

Great Data Debate (2) Danger in Dimensions, Kenneth MacKinnon

How do you expect this paper to affect your work? Please share your thoughts.

The dog and the frisbee and data quality management

The Wall Street Journal reported it as the “Speech of the year“.

In a speech with the intriguing title “The dog and the frisbee“, Andrew Haldane, the Bank of England’s Director of Financial Stability, has questioned whether the Emperor (in the form of ever-increasing, ever more complex regulations such as Solvency II, Basel III and Dodd-Frank) is naked. He points out that the Basel regulations, which have grown from 30 pages to over 600, completely failed to identify the banks that were at risk of collapse, while a simple measure of a bank’s leverage ratio did identify them.

He also points out that “Dodd-Frank makes Glass-Steagall look like throat-clearing.” The Glass-Steagall Act of 1933, which separated commercial and investment banking, ran to a mere 37 pages; the Dodd-Frank Act of 2010 ran to 848 pages, and may spawn a further 30,000 pages of detailed rule-making by various agencies.

I recommend you read the speech yourself – his arguments, together with his wit, are superb. I include a brief extract below:

‘In the UK, regulatory reporting was introduced in 1974. Returns could have around 150 entries. In the Bank of England archives is a memo to George Blunden, who was to become Deputy Governor, on these proposed regulatory returns. Blunden’s handwritten comment reads: “I confess that I fear we are in danger of becoming excessively complicated and that if so we may miss the wood from the trees”.

Today, UK banks are required to fill in more than 7,500 separate cells of data – a fifty-fold rise. Forthcoming European legislation will cause a further multiplication. Banks across Europe could in future be required to fill in 30–50,000 data cells spread across 60 different regulatory forms. There will be less risk of regulators missing the wood from the trees, but only because most will have needed to be chopped down.’

Brilliant!

Andrew Haldane is calling for simpler, more basic rules. I agree with him.

I have worked in data management for over 30 years. The challenges I see today are the same challenges that arise time and time again. They are not Solvency II specific, BASEL specific, or Dodd Frank specific. They are universal. They apply to all critical data within all businesses.

The fundamental truth is: “The data is unique, but the data management principles are universal.”

It is time to stop writing specific data management and data quality management requirements into each piece of legislation.  Regulators should co-operate with the data management profession, via independent organisations such as DAMA International, to develop a common sense universal standard, and put the effort into improving such a standard.

What do you think? I welcome your comments.

Risk data aggregation and risk reporting (BCBS 239) – Board and senior management responsibilities

Post #2 in my series on Data aggregation and reporting principles (BCBS 239) – applied common sense

I was saddened to hear of the death on July 16th of Stephen Covey, author of The Seven Habits of Highly Effective People. I have found the 7 habits very useful in my work as a data consultant.

Two of the habits apply directly to this blog post.

  • Habit 1: Be Proactive
  • Habit 2: Begin with the End in Mind

I imagine the authors of BCBS 239, “Principles for effective risk data aggregation and risk reporting”, are also familiar with the 7 habits, since the principles appear to be based on them.

Habit 1: Be Proactive

Regulatory supervisors expect the board and senior management to “be proactive” in taking responsibility for risk data aggregation and risk reporting.  The following quotes from the document illustrate my point:

Section I. “Overarching governance and infrastructure”

Paragraph 20: “… In particular, a bank’s board and senior management should take ownership of implementing all the risk data aggregation and risk reporting principles and have a strategy to meet them within a timeframe agreed with their supervisors… by 2016 at the latest.”

Paragraph 21. “A bank’s board and senior management should promote the identification, assessment and management of data quality risks as part of its overall risk management framework…. A bank’s board and senior management should review and approve the bank’s group risk data aggregation and risk reporting and ensure that adequate resources are deployed.”

Habit 2: Begin with the End in Mind

I advise my clients to “Begin with the end in mind” – by defining clear, measurable and testable requirements.

The authors of the Basel principles appear to agree.  The board and senior management are the people who must assess the risks faced by the financial institution; therefore, they are the people who must specify the information they want in the risk reports. Don’t take my word for it – the following quotes from the document illustrate my point:

Principle 9: Clarity

Paragraph 53: “As one of the key recipients of risk management reports, the bank’s board is responsible for determining its own risk reporting requirements.”

Paragraph 55: “Senior management is one of the key recipients of risk reports and is also responsible for determining its own risk reporting requirements.”

What is the impact of the above? 

Regulators will expect to see evidence of documented risk reporting requirements, signed off by the board and senior management.

Where are yours?

Data aggregation and reporting principles – applied common sense

Principles for effective risk data aggregation and risk reporting

Basel Consultative Document
Data aggregation and reporting principles (BCBS 239)

Those of you familiar with my blog will know that I am a fan of common sense.

I believe that data quality management requires you to apply common sense principles and processes to your data.  I believe that the same common sense principles apply regardless of the industry you are in.

Your data will be unique, but the common sense questions you must ask yourself will be the same.  They include:

  • What MI reports do we need to run our business?
  • What critical data do we need in our MI reports?
  • Who owns and is responsible for gathering the critical data we need in our MI reports?
  • What should our critical data contain?
  • What metrics do we have to verify our critical data contains what it should?
  • etc…

Click on the image to see a document that lists what I regard as “common sense” data aggregation and reporting principles.  They were published as a consultative document on 26th June 2012 by the Basel Committee on Banking Supervision (BCBS), and the principles are commonly known as BCBS 239. The committee invited comments from interested parties, which are available at http://www.bis.org/publ/bcbs222/comments.htm. I co-operated with a group of fellow independent data professionals to comment, and you may see our comments at http://www.bis.org/publ/bcbs222/idpg.pdf. You may see the final version at http://www.bis.org/publ/bcbs239.pdf. The largest banks in the world (known as Global Systemically Important Banks, or G-SIBs) must comply by January 2016. Other banks, known as Domestic Systemically Important Banks (D-SIBs), must reach compliance three years after the date on which they were so designated, which varies by bank; many received their designation during 2014.

While the document is targeted at risk management within the banking industry, the principles apply to all industries. The document explicitly refers to “Risk data aggregation and risk reporting” – I suggest you ignore the word risk and read it as “data aggregation and reporting principles”.

Over the next while I plan to explore some of the principles proposed in the document, and the practical challenges that arise when one seeks to implement common sense data quality management principles. I welcome your input.  If you have a specific question, let me know – I will do my best to answer it.

Risk data aggregation and risk reporting – Board and senior management responsibilities

BCBS 239 compliance D-Day – Data Quality Risk Checklist

Basel Committee issues “Principles for effective risk data aggregation and risk reporting – final document” (aka BCBS 239)

FSA imposes £2.4 million fine for inadequate risk reporting systems


Do you know what’s in the data you’re consuming?

Standard facts are provided about the food we buy

These days, food packaging includes a list of ingredients and a standard set of nutrition facts.  This is required by law in many countries.

Food consumers have grown accustomed to seeing this information, and now expect it. It enables them to make informed decisions about the food they buy, based on a standard set of facts.

Remarkable as it may seem, data consumers are seldom provided with facts about the data feeding their critical business processes.

Most data consumers assume the data input to their business processes is “right”, or “OK”.  They often assume it is the job of the IT function to ensure the data is “right”.  But only the data consumer knows the intended purpose for which they require the data.  Only the data consumer can decide whether the data available satisfies their specific needs and their specific acceptance criteria. To make an informed choice, data consumers need to be provided with facts about the data content available.

Data Consumers have the right to make informed decisions based on standard data content facts

The IT function, or a data quality function, can, and should, provide standard “data content facts” about all critical data, such as the facts shown in the example.

In the sample shown, a Marketing Manager wishing to mailshot customers in the 40-59 age range might find that the data content facts satisfy his/her data quality acceptance criteria.

The same data might not satisfy the acceptance criteria for a manager in the Anti Money Laundering (AML) area requesting an ETL process to populate a new AML system.
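
To illustrate the idea, here is a minimal sketch of what a “data content facts” label might look like, assuming a small customer table held in pandas. The field names, sample records and the 40-59 age calculation are illustrative assumptions of mine; the point is that the same standard facts can be handed to the Marketing Manager and the AML manager, who will each apply their own acceptance criteria.

```python
# A hypothetical "data content facts" label for a small customer table.
# The records and field names are invented for illustration.
import pandas as pd

customers = pd.DataFrame({
    "customer_id":   [101, 102, 103, 104, 104],
    "date_of_birth": ["1979-02-11", "1984-06-30", None, "1965-09-17", "1965-09-17"],
    "email":         ["a@example.com", None, "c@example", "d@example.com", "d@example.com"],
})

facts = pd.DataFrame({
    "populated_%":     (customers.notna().mean() * 100).round(1),
    "distinct_values": customers.nunique(dropna=True),
})
facts["exact_duplicate_rows"] = int(customers.duplicated().sum())

print("DATA CONTENT FACTS")
print(facts)

# The fact most relevant to the Marketing Manager's 40-59 mailshot:
ages = (pd.Timestamp.today() - pd.to_datetime(customers["date_of_birth"])).dt.days // 365
print("\nCustomers aged 40-59:", int(ages.between(40, 59).sum()), "of", len(customers))
```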

Increasing regulation means that organisations must be able to demonstrate the quality and trace the origin of the data they use in critical business processes.

In Europe, Solvency II requires insurance and re-insurance undertakings to demonstrate that the data they use for solvency calculations is as complete, appropriate and accurate as required for the intended purpose. Other regulatory requirements, such as Dodd-Frank in the USA, Basel III and BCBS 239, also seek increasing transparency regarding the quality of the data underpinning our financial system.

While regulation may be a strong driving force for providing standard data content facts, an even stronger one is the business benefit to be gained from being informed.  Some time ago, Gartner research showed that approximately 70% of CRM projects failed.  I wonder whether the business owners of those CRM systems were shown data content facts about the data available to populate them.

In years to come, we will look back on those crazy days when data consumers were not shown data content facts about the data they were consuming.

How to deal with Gobbledygook requirements

In my last post I had a bit of a rant about the Gobbledygook “eligibility requirements” provided by the UK Financial Services Compensation Scheme.

The reality is that business requirements can come from many places; they are often vague, and often overly complex.  They are often imposed on you from outside, as in the case of regulatory requirements like the UK requirement to deliver a Single Customer View.

So… life can be tough – you have to get on with it, and deal with “less than perfect” requirements.

Well-defined requirements are clear, measurable and testable.  You cannot expect business experts to be experts in defining requirements. As a Data Quality professional, one of your roles is to work with business experts to clarify and simplify their requirements.

Let us look at the “eligibility requirements” provided by the UK Financial Services Compensation Scheme.

In essence, some customer types are eligible for compensation, while others are not.  You must use your “parsing skills” to work through the overly complex rules – to “sort the apples from the oranges”, so to speak.  Start by listing the unique customer types, which include:

  • Sole Trader
  • Credit Union
  • Collective Investment Scheme
  • Trustee of a Collective Investment Scheme
  • Operator of a Collective Investment Scheme

Having done that, you can begin the task of finding out whether you can currently identify these customer types within your data.
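
As a sketch of what that sorting might look like in practice, the following maps each parsed customer type to an eligibility flag and then reports how many of your records can actually be classified. The type codes, the flags and the field names are illustrative assumptions of mine, not the FSCS rules; the interesting output is the “unknown” bucket, which shows where your data cannot yet answer the question.

```python
# A minimal sketch of classifying customers once the rules have been parsed.
# The eligibility flags below are placeholders, not the actual FSCS determinations.
from collections import Counter

ELIGIBILITY_BY_TYPE = {
    "SOLE_TRADER": True,
    "CREDIT_UNION": False,
    "COLLECTIVE_INVESTMENT_SCHEME": False,
    "CIS_TRUSTEE": False,
    "CIS_OPERATOR": False,
}

customers = [
    {"customer_id": "C001", "customer_type": "SOLE_TRADER"},
    {"customer_id": "C002", "customer_type": "CREDIT_UNION"},
    {"customer_id": "C003", "customer_type": None},           # type never captured
    {"customer_id": "C004", "customer_type": "PARTNERSHIP"},  # type not in the mapping
]

def classify(customer):
    """Return 'eligible', 'not eligible' or 'unknown' for a single record."""
    ctype = customer.get("customer_type")
    if ctype not in ELIGIBILITY_BY_TYPE:
        return "unknown"  # the data cannot yet answer the eligibility question
    return "eligible" if ELIGIBILITY_BY_TYPE[ctype] else "not eligible"

summary = Counter(classify(c) for c in customers)
print(summary)  # e.g. Counter({'unknown': 2, 'eligible': 1, 'not eligible': 1})
```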

The above is just a starting point; I hope it helps.

Feedback welcome, as always.

The Ryanair Data Entry Model

I was prompted to write about the “Ryanair Data Entry Model” by an excellent post by Winston Chen on “How to measure Data Accuracy”.

Winston highlights the data quality challenge posed by incorrect data captured at the point of entry.  He illustrates one cause: the use of default drop-down selection options. He cites an example of a Canadian law enforcement agency that saw a disproportionately high occurrence of “pick pocketing” within its crime statistics.  Further investigation revealed that “pick pocketing” was the first option in a drop-down selection of crime types.

Winston provides excellent suggestions on how to identify and prevent this source of data quality problems.  Dylan Jones of Dataqualitypro.com and others have added further great tips in the comments.

I believe you need to make Data Quality “matter” to the person entering the data – hence I recommend the use of what I call the “Ryanair Data Entry Model”.   This is the data entry model now used by most low cost airlines. As passengers, we are required to enter our own data. We take care to ensure that each piece of information we enter is correct – because it matters to us.  The same applies when we make any online purchase.

With Ryanair, it is impossible to enter an invalid date (e.g. 30 February), but it is easy to enter the “wrong” date for our needs. For example, we may wish to fly on a Sunday, but by mistake we could enter the date for the Monday.

We ensure that we select the correct number of bags, since each one costs us money. We try to avoid having to pay for insurance, despite Ryanair’s best efforts to force it on us.

It may not be easy to make data entry “matter” to the people performing it in your organisation – but this is what you must do if you wish to “stop the rot” and prevent data quality problems “at source”. To succeed, you must measure data quality at the point of entry and provide immediate feedback to the data entry person (helping them to get it right first time). Where possible, you should include data entry quality in a person’s performance review – reward good data quality, and withhold reward for poor data quality.
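
As a small illustration of “right first time” feedback at the point of entry, here is a sketch of a date check in the Ryanair spirit: an invalid date is rejected immediately with a message the person can act on, while a valid date still prompts for confirmation, because only the person entering it knows whether the Sunday or the Monday is what they actually need. The date format and the messages are illustrative assumptions.

```python
# A minimal sketch of point-of-entry validation with immediate feedback.
# The DD/MM/YYYY format and the messages are illustrative assumptions.
from datetime import datetime

def validate_departure_date(raw):
    """Return (accepted, feedback) for a departure date entered as DD/MM/YYYY."""
    try:
        entered = datetime.strptime(raw, "%d/%m/%Y").date()
    except ValueError:
        return False, "That is not a real date (e.g. 30/02 does not exist) - please re-enter."
    if entered < datetime.today().date():
        return False, "The departure date is in the past - please re-enter."
    # Valid, but possibly still "wrong" for the traveller's needs - only they can confirm.
    return True, f"You selected a {entered.strftime('%A')}. Please confirm this is the day you want."

for attempt in ["30/02/2030", "15/06/2030"]:
    accepted, feedback = validate_departure_date(attempt)
    print(attempt, "->", "ACCEPTED" if accepted else "REJECTED", "|", feedback)
```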

Poor quality data entered at source is a common Data Governance issue, which I discuss further here.

Have you encountered examples of poor data quality entered at source?  Have you succeeded in identifying and preventing this problem? Please share your success (and horror !) stories.

Solvency II mandates Data Governance

Welcome to part 3 of Solvency II Standards for Data Quality – common sense standards for all businesses.

Regardless of the industry you work in, you make critical business decisions based on the information available to you.  You would like to believe the information is accurate.  I suggest the CEIOPS standards for “Accuracy” apply to your business, and your industry, just as much as they apply to the insurance industry.  I would welcome your feedback…

The CEIOPS (now renamed EIOPA) advice makes it clear that Solvency II requires you to have Data Governance in place (which CEIOPS / EIOPA refers to as “internal systems and procedures”).  The following sections of the document illustrate this:

3.32 In order to ensure on a continuous basis a sufficient quality of the data used in the valuation of technical provisions, the undertaking should have in place internal systems and procedures covering the following areas:

• Data quality management;

• Internal processes on the identification, collection, and processing of data; and

• The role of internal/external auditors and the actuarial function.

3.1.4.1 Data quality management – Internal processes

3.33 Data quality management is a continuous process that should comprise the following steps:

a) Definition of the data;

b) Assessment of the quality of data;

c) Resolution of the material problems identified;

d) Monitoring data quality.
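
As a sketch of how those four steps might hang together in practice, the following treats a single assumed data item – a “sum insured” field – as the example: the rule is the definition (a), the pass rate is the assessment (b), the issue log feeds resolution (c), and recording the measure over time supports monitoring (d). The field, values and rule are illustrative assumptions of mine, not part of the CEIOPS text.

```python
# A minimal sketch of the four-step data quality management cycle.
# The policies, the field and the rule are illustrative assumptions.
policies = [
    {"policy_id": "P1", "sum_insured": 250_000},
    {"policy_id": "P2", "sum_insured": None},      # missing value
    {"policy_id": "P3", "sum_insured": -10_000},   # implausible value
]

# a) Definition of the data: what must the field contain?
def sum_insured_is_valid(policy):
    value = policy["sum_insured"]
    return value is not None and value > 0

# b) Assessment of the quality of data
failures = [p for p in policies if not sum_insured_is_valid(p)]
pass_rate = 100.0 * (len(policies) - len(failures)) / len(policies)

# c) Resolution of the material problems identified (here: route issues to a data owner)
issue_log = [{"policy_id": p["policy_id"], "issue": "sum_insured missing or non-positive"}
             for p in failures]

# d) Monitoring data quality: record the measure so the trend can be tracked over time
print(f"sum_insured pass rate: {pass_rate:.0f}%  ({len(issue_log)} issues logged for resolution)")
```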

I will explore the above further in my next post.  Meanwhile, what Data Quality Management processes do you have in place?  Do you suffer from common Enterprise-Wide Data Governance Issues?

What does complete, appropriate and accurate mean?

Welcome to part 2 of Solvency II Standards for Data Quality – common sense standards for all businesses.

The Solvency II Standards for Data Quality run to 22 pages and provide an excellent substitute for counting sheep if you suffer from insomnia. They are published by the Committee of European Insurance and Occupational Pensions Supervisors (CEIOPS), now renamed EIOPA.

Solvency II Data Quality Standards – not as page-turning as a Dan Brown novel

I accept that Data Quality Standards cannot aspire to be as page-turning as a Dan Brown novel – but plainer English would help.

Anyway – enough  complaining.  As mentioned in part 1, the standards require insurance companies to provide evidence that their Solvency II submissions are based on data that is “as complete, appropriate, and accurate as possible”.  In this post, I will explore what the regulator means by “complete”, “appropriate” and “accurate”.  I will look at the terms in the context of data quality for Solvency II, and will highlight how the same common sense standards apply to all organisations.

APPROPRIATE: “Data is considered appropriate if it is suitable for the intended purpose” (page 19, paragraph 3.62).

Insurance companies must ensure they can provide for insurance claims. Hence, to be “appropriate”, the data must relate to the risks covered, and to the value of the capital they hold to cover potential claims.  Insurance industry knowledge is required to identify the “appropriate” data, just as auto industry knowledge is required to identify data “appropriate” to the auto industry, and so on.

COMPLETE: (This one is pretty heavy, but I will include it verbatim, and then seek to simplify – all comments, contributions and dissenting opinions welcome) (page 19, paragraph 3.64)

“Data is considered to be complete if:

  • it allows for the recognition of all the main homogeneous risk groups within the liability portfolio;
  • it has sufficient granularity to allow for the identification of trends and to the full understanding of the behaviour of the underlying risks; and
  • if sufficient historical information is available.”

As I see it, there must be enough data, at a low enough level of detail, to provide a realistic picture of the main types of risks covered. Enough historical data is also required, since the history of past claims provides a basis for estimating the scale of future claims.

As with the term “appropriate”, I believe that insurance industry knowledge is required to identify the data needed to ensure that the data set is “complete”.
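
As a sketch of how that reading of “complete” could be tested, the following groups claims by risk group and checks whether each group carries enough years of history. The risk groups, the five-year threshold and the sample records are illustrative assumptions of mine – the real threshold is an actuarial judgement, since the standard only asks for “sufficient” historical information.

```python
# A minimal sketch of a completeness check: enough history per main risk group.
# Risk groups, threshold and sample claims are illustrative assumptions.
from collections import defaultdict

MIN_YEARS_OF_HISTORY = 5  # an assumed threshold - in reality an actuarial judgement

claims = (
    [{"risk_group": "MOTOR", "year": y} for y in range(2005, 2013)]   # 8 years of history
    + [{"risk_group": "PROPERTY", "year": y} for y in (2011, 2012)]   # only 2 years
)

years_by_group = defaultdict(set)
for claim in claims:
    years_by_group[claim["risk_group"]].add(claim["year"])

for group, years in sorted(years_by_group.items()):
    status = "sufficient" if len(years) >= MIN_YEARS_OF_HISTORY else "INSUFFICIENT"
    print(f"{group:10s} {len(years)} years of claims history -> {status}")
```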

ACCURATE: I believe this one is “pure common sense”, and applies to all organisations, across all industries. (page 19, paragraph 3.66)

Data is considered accurate if:

  • it is free from material mistakes, errors and omissions;
  • the recording of information is adequate, performed in a timely manner and is kept consistent across time;
  • a high level of confidence is placed on the data; and
  • the undertaking must be able to demonstrate that it recognises the data set as credible by using it throughout the undertaking’s operations and decision-making processes.

Update – In October 2013, following an 18-month consultative process, DAMA UK published a white paper explaining six primary data quality dimensions:

1. Completeness
2. Uniqueness
3. Timeliness
4. Validity
5. Accuracy
6. Consistency

For more details, see my blog post Major step forward in Data Quality Measurement.