The Ryanair Data Entry Model

I was prompted to write about the “Ryanair Data Entry Model” by an excellent post by Winston Chen on “How to measure Data Accuracy”.

Winston highlights the data quality challenge posed by incorrect data captured at the point of entry.  He illustrates one cause: the use of default drop-down selection options. He cites the example of a Canadian law enforcement agency that saw a disproportionately high occurrence of “pick pocketing” within its crime statistics.  Further investigation revealed that “pick pocketing” was the first option in a drop-down selection of crime types.

Winston provides excellent suggestions on how to identify and prevent this source of data quality problems.  Dylan Jones of Dataqualitypro.com and others have added further great tips in the comments.

I believe you need to make Data Quality “matter” to the person entering the data – hence I recommend the use of what I call the “Ryanair Data Entry Model”.   This is the data entry model now used by most low-cost airlines. As passengers, we are required to enter our own data. We take care to ensure that each piece of information we enter is correct – because it matters to us.  The same applies when we make any online purchase.

With Ryanair, it is impossible to enter an invalid date (e.g. 30 Feb), but it is easy to enter the “wrong date” for our needs. For example, we may wish to fly on a Sunday, but by mistake we could enter Monday’s date.
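A minimal sketch (in Python, using only the standard library) of why validity checks catch the first kind of error but not the second:

```python
from datetime import date

def parse_travel_date(year: int, month: int, day: int) -> date:
    # date() rejects impossible dates such as 30 February automatically.
    return date(year, month, day)

# An invalid date raises ValueError -> the form can reject it immediately.
try:
    parse_travel_date(2024, 2, 30)
except ValueError:
    print("invalid date rejected")

# But a *valid* date that is simply wrong for the traveller
# (Monday instead of Sunday) passes every validity check.
d = parse_travel_date(2024, 6, 10)  # 10 June 2024 is a Monday
print(d.strftime("%A"))  # prints "Monday"
```

Only the person entering the data knows which Sunday they meant – which is why it has to matter to them.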

We ensure that we select the correct number of bags, since each one costs us money. We try to avoid having to pay for insurance, despite Ryanair’s best efforts to force it on us.

It may not be easy to make data entry “matter” to the people performing it in your organisation – but this is what you must do if you wish to “stop the rot” and prevent data quality problems “at source”. To succeed, you must measure data quality at the point of entry and provide immediate feedback to the data entry person, helping them to get it right first time. Where possible, you should include data entry quality in a person’s performance review – reward good data quality, and withhold reward for poor data quality.
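As a rough sketch of point-of-entry feedback – the field names and rules below are illustrative assumptions, not taken from any particular system:

```python
# Hypothetical point-of-entry check: validate a record as it is keyed in
# and hand the problems straight back to the person entering the data.
def validate_entry(record: dict) -> list[str]:
    problems = []
    if not record.get("surname", "").strip():
        problems.append("Surname is required")
    age = record.get("age")
    if age is not None and not (0 <= age <= 120):
        problems.append("Age must be between 0 and 120")
    return problems

feedback = validate_entry({"surname": "", "age": 130})
print(feedback)  # both problems reported back immediately
```

The same rule set can feed a per-person data entry quality metric for the performance reviews mentioned above.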

Poor quality data entered at source is a common Data Governance issue, which I discuss further here:

Have you encountered examples of poor data quality entered at source?  Have you succeeded in identifying and preventing this problem? Please share your success (and horror!) stories.

Solvency II mandates Data Governance

Welcome to part 3 of Solvency II Standards for Data Quality – common sense standards for all businesses.

Regardless of the industry you work in, you make critical business decisions based on the information available to you.  You would like to believe the information is accurate.  I suggest the CEIOPS standards for “Accuracy” apply to your business, and your industry, just as much as they apply to the insurance industry.  I would welcome your feedback…

The CEIOPS (now renamed EIOPA) advice makes it clear that Solvency II requires you to have Data Governance in place (which CEIOPS / EIOPA refers to as “internal systems and procedures”).   The following sections of the document make this clear:

3.32 In order to ensure on a continuous basis a sufficient quality of the data used in the valuation of technical provisions, the undertaking should have in place internal systems and procedures covering the following areas:

• Data quality management;

• Internal processes on the identification, collection, and processing of data; and

• The role of internal/external auditors and the actuarial function.

3.1.4.1 Data quality management – Internal processes

3.33 Data quality management is a continuous process that should comprise the following steps:

a) Definition of the data;

b) Assessment of the quality of data;

c) Resolution of the material problems identified;

d) Monitoring data quality.
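A minimal sketch of how the four steps might look in practice – the rule definitions, field names and sample data below are illustrative assumptions, not part of the CEIOPS advice:

```python
# (a) Definition of the data: each field paired with its quality rule.
rules = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "policy_id": lambda v: isinstance(v, str) and v.startswith("POL"),
}

def assess(records):
    # (b) Assessment: count rule failures per field.
    failures = {name: 0 for name in rules}
    for rec in records:
        for name, rule in rules.items():
            if not rule(rec.get(name)):
                failures[name] += 1
    return failures

def resolve(records):
    # (c) Resolution: here, simply quarantine failing records for review.
    return [r for r in records if all(rule(r.get(n)) for n, rule in rules.items())]

data = [{"age": 45, "policy_id": "POL001"}, {"age": 250, "policy_id": "X9"}]
print(assess(data))        # (d) Monitoring: re-run the assessment on a schedule
clean = resolve(data)
print(len(clean))          # one record passes all rules
```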

I will explore the above further in my next post.  Meanwhile, what Data Quality Management processes do you have in place?  Do you suffer from common Enterprise-Wide Data Governance Issues?

What does complete, appropriate and accurate mean?

Welcome to part 2 of Solvency II Standards for Data Quality – common sense standards for all businesses.

The Solvency II Standards for Data Quality run to 22 pages and provide an excellent substitute for counting sheep if you suffer from insomnia. They are published by the Committee of European Insurance and Occupational Pensions Supervisors (CEIOPS) (now renamed EIOPA).

Solvency II Data Quality Standards – not as page-turning as a Dan Brown novel

I accept that Data Quality Standards cannot aspire to be as page-turning as a Dan Brown novel – but plainer English would help.

Anyway – enough complaining.  As mentioned in part 1, the standards require insurance companies to provide evidence that their Solvency II submissions are based on data that is “as complete, appropriate, and accurate as possible”.  In this post, I will explore what the regulator means by “complete”, “appropriate” and “accurate”.  I will look at the terms in the context of data quality for Solvency II, and will highlight how the same common sense standards apply to all organisations.

APPROPRIATE: “Data is considered appropriate if it is suitable for the intended purpose” (page 19, paragraph 3.62).

Insurance companies must ensure they can provide for insurance claims. Hence, to be “appropriate”, the data must relate to the risks covered, and to the value of the capital held to cover potential claims.  Insurance industry knowledge is required to identify the “appropriate” data, just as auto industry knowledge is required to identify data “appropriate” to the auto industry, and so on.

COMPLETE: (This one is pretty heavy, but I will include it verbatim, and then seek to simplify – all comments, contributions and dissenting opinions welcome) (page 19, paragraph 3.64)

“Data is considered to be complete if:

  • it allows for the recognition of all the main homogeneous risk groups within the liability portfolio;
  • it has sufficient granularity to allow for the identification of trends and to the full understanding of the behaviour of the underlying risks; and
  • if sufficient historical information is available.”

As I see it, there must be enough data, at a low enough level of detail, to provide a realistic picture of the main types of risks covered. Enough historical data is also required, since the history of past claims provides a basis for estimating the scale of future claims.

As with the term “Appropriate”, I believe that insurance industry knowledge is required to identify the data needed to ensure that data is “complete”.

ACCURATE: I believe this one is “pure common sense”, and applies to all organisations, across all industries. (page 19, paragraph 3.66)

Data is considered accurate if:

  • it is free from material mistakes, errors and omissions;
  • the recording of information is adequate, performed in a timely manner and is kept consistent across time;
  • a high level of confidence is placed on the data; and
  • the undertaking must be able to demonstrate that it recognises the data set as credible by using it throughout the undertakings operations and decision-making processes.

Update – In October 2013, following an 18 month consultative process, DAMA UK published a white paper explaining 6 primary data quality dimensions.

1. Completeness
2. Uniqueness
3. Timeliness
4. Validity
5. Accuracy
6. Consistency

For more details see my blog post, Major step forward in Data Quality Measurement
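As a rough illustration of how three of these six dimensions might be measured on a single column – the sample values and the validity rule are my own assumptions:

```python
# Illustrative measurement of Completeness, Uniqueness and Validity
# on one hypothetical column of reference codes.
values = ["A123", "A124", None, "A124", "bad!"]

non_null = [v for v in values if v is not None]
completeness = len(non_null) / len(values)        # proportion populated
uniqueness = len(set(non_null)) / len(non_null)   # distinct vs total populated
validity = sum(v.startswith("A") for v in non_null) / len(non_null)  # assumed rule

print(completeness, uniqueness, validity)  # 0.8 0.75 0.75
```

Timeliness, Accuracy and Consistency typically need reference data or timestamps to measure, so they are omitted from this sketch.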


Common Enterprise wide Data Governance Issues – #12. No Enterprise wide Data Dictionary.

This post is one of a series dealing with common Enterprise Wide Data Governance Issues.    Assess the status of this issue in your Enterprise by clicking here:  Data Governance Issue Assessment Process


An excellent series of blog posts from Phil Wright (Balanced approach to scoring data quality) prompted me to restart this series.  Phil tells us that in his organisation, “a large amount of time and effort has been applied to ensure that the business community has a definitive business glossary, containing all the terminology and business rules that they use within their reporting and business processes. This has been published, and highly praised, throughout the organisation.” I wish other organisations were like Phil’s.

Not only do some organisations lack “a definitive business glossary” as Phil describes above, complete with business rules – some have no Enterprise wide Data Dictionary at all.  What is worse, there is no appreciation within senior management of the need for an Enterprise wide Data Dictionary (and therefore no budget to develop one).

Impact(s):

  • No business definition, or contradictory business definitions, of the intended content of critical fields.
  • Over-dependence on a small number of staff with detailed knowledge of some databases.
  • Incorrect or non-ideal sources of required data are identified – because the source of required data is determined by personnel with expertise in specific systems only.
  • New projects, dependent on existing data, are left ‘flying blind’.  The impact is similar to landing in a foreign city with no map and without speaking the language.
  • Repeated re-invention of the wheel and duplication of work, with associated costs.

Solution:

CIO to define and implement the following policy (in addition to the policies listed for Data Governance Issue #10):

  • An Enterprise wide Data Dictionary will be developed covering critical Enterprise wide data, in accordance with industry best practice.
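As a rough sketch, a single Data Dictionary entry might capture attributes such as the following – every attribute name and value here is an illustrative assumption, not a prescribed standard:

```python
# Hypothetical sketch of one Enterprise Data Dictionary entry.
entry = {
    "field": "date_of_birth",
    "business_definition": "Customer's date of birth, as evidenced at account opening",
    "source_system": "CRM",            # the agreed authoritative source
    "format": "YYYY-MM-DD",
    "validation_rule": "must not be in the future; implied age must be <= 120",
    "owner": "Head of Customer Data",  # accountable data owner
}

def lookup(dictionary, field):
    # New projects query the dictionary instead of guessing the source.
    return dictionary.get(field, {}).get("source_system")

print(lookup({"date_of_birth": entry}, "date_of_birth"))  # prints "CRM"
```

The point is the lookup: with such a dictionary, a new project locates the authoritative source of a field in seconds, instead of depending on the handful of staff who know the databases.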

Does your organisation have an “Enterprise wide Data Dictionary” – if so, how did you achieve it?  If not, how do new projects that depend on existing data begin the process of locating that data?  Please share your experience.

My interview with Dylan Jones

Dylan Jones of DataQualityPro interviews me about the process I use to assess common Enterprise wide data issues. Use this process to assess the status of data governance within your organisation or that of a client.

Data Quality Pro interview with Ken O'Connor Data Consultant

Russian Gas Pipe and Data Governance

As you know, Russia supplies gas to many European countries.

Do you know what’s in your critical data pipelines?

Could you imagine Italy purchasing gas from Russia without checking what exactly was flowing through the pipe?  I’m no expert on gas pipelines, but I know that before completing the agreement to purchase the gas, Italy and Russia would have agreed metrics such as:

  • Volume of Gas
  • Calorific value (Energy content)
  • etc.

So what? What else would one expect?  Applied common sense… yes?

Why is it that such common sense is often lacking in Data Migration and Data Population projects?  Why do some Enterprises continue to perform data population of, and ongoing data entry to, critical data repositories without fully understanding the data they are pumping into the repository?

A simple example involves Date of Birth.  The business asks the IT function to populate Date of Birth in the new AML / Basel II / CRM / other repository. Some time later, when data population is complete, the business begins to express concerns:

  • “We never realised we had so many customers aged over 100!”
  • “I thought we had more student customers.”
  • “How come so many of our customers share the same birthday?”
  • “These are not the results we expected.”
  • etc.

Performing data population on the basis of what the source data “should contain”, without analysing what it actually does contain, is known as the ‘Load and Explode’ approach to data population.  I cover this Enterprise Wide Data Issue in more detail here.
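To illustrate, a little pre-load profiling of Date of Birth – on made-up sample data, with assumed thresholds – would surface the surprises above before population rather than after:

```python
# Hypothetical pre-load profiling of Date of Birth in the source system.
# The sample data, reference date and thresholds are my own assumptions.
from datetime import date
from collections import Counter

source_dobs = [date(1900, 1, 1), date(1900, 1, 1), date(1985, 6, 3),
               date(1900, 1, 1), date(2001, 11, 20)]

today = date(2024, 1, 1)
ages = [(today - d).days // 365 for d in source_dobs]

over_100 = sum(a > 100 for a in ages)
most_common_dob, count = Counter(source_dobs).most_common(1)[0]

print(f"{over_100} of {len(ages)} customers aged over 100")
print(f"most common DOB {most_common_dob} occurs {count} times")  # a system default?
```

A cluster of centenarians sharing 1 January 1900 is the classic signature of a default value keyed in when the real date of birth was unknown – exactly the kind of finding the business should see before the load, not after.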

We in the “Data Governance”, “Data Quality” industry need to educate the business community on the “common sense” parts of data governance, and the need to engage “Data Governance Professionals”  to ensure that “Data Quality Common Sense” is actually applied.

Feedback welcome – Ken