I was prompted to write about the “Ryanair Data Entry Model” by an excellent post by Winston Chen on “How to measure Data Accuracy”.
Winston highlights the data quality challenge posed by incorrect data captured at the point of entry. He illustrates one cause as the use of default drop-down selection options. He cites an example of a Canadian law enforcement agency that saw a disproportionately high occurrence of “pick pocketing” within crime statistics. Further investigation revealed that “pick pocketing” was the first option in a drop-down selection of crime types.
Winston provides excellent suggestions on how to identify and prevent this source of data quality problems. Dylan Jones of Dataqualitypro.com and others have added further great tips in the comments.
I believe you need to make Data Quality “matter” to the person entering the data – hence I recommend the use of what I call the “Ryanair Data Entry Model”. This is the data entry model now used by most low-cost airlines. As passengers, we are required to enter our own data. We take care to ensure that each piece of information we enter is correct – because it matters to us. The same applies when we make any online purchase.
With Ryanair, it is impossible to enter an invalid date (e.g. 30 Feb), but it is easy to enter the “wrong date” for our needs. For example, we may wish to fly on a Sunday, but by mistake we could enter the date for Monday.
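The difference between an invalid date and a wrong date can be sketched in a few lines of Python. This is only an illustration, not how Ryanair's booking system works: the function names `parse_travel_date` and `confirmation_prompt` are hypothetical. The first rejects dates that do not exist; the second echoes the weekday back so that the person entering the data has a chance to spot a valid date that is not the one they meant.

```python
from datetime import date

# Fixed English names, indexed by date.weekday() (Monday is 0).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

def parse_travel_date(year, month, day):
    """Return a date, or None if the values are not a real calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        # e.g. 30 February: an *invalid* date, which software can reject.
        return None

def confirmation_prompt(travel_date):
    """Echo the weekday back so a valid-but-wrong date can be noticed."""
    weekday = WEEKDAYS[travel_date.weekday()]
    return f"You selected {weekday} {travel_date.isoformat()}. Is this correct?"
```

Validation catches 30 Feb; only the person who cares about the trip can catch Monday entered in place of Sunday.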
We ensure that we select the correct number of bags, since each one costs us money. We try to avoid having to pay for insurance, despite Ryanair’s best efforts to force it on us.
It may not be easy to make data entry “matter” to the people performing it in your organisation – but this is what you must do if you wish to “stop the rot” and prevent data quality problems “at source”. To succeed, you must measure data quality at the point of entry and provide immediate feedback to the person entering the data, helping them to get it right first time. Where possible, you should include data entry quality in a person’s performance review – reward for good data quality, and lack of reward for poor data quality.
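One simple way to measure quality at the point of entry is to watch how often the default drop-down option is chosen, as in the “pick pocketing” example. The Python sketch below is a rough illustration only: the function names, the `expected_share` figure and the `tolerance` threshold are all assumptions you would need to set from your own knowledge of the data.

```python
from collections import Counter

def default_option_share(entries, default_option):
    """Fraction of entries that carry the default drop-down value."""
    if not entries:
        return 0.0
    return Counter(entries)[default_option] / len(entries)

def looks_suspicious(entries, default_option, expected_share, tolerance=0.10):
    """Flag when the default is chosen far more often than expected.

    expected_share and tolerance are assumed values, not derived here.
    """
    share = default_option_share(entries, default_option)
    return share > expected_share + tolerance
```

A flag like this does not prove that the data is wrong; it tells you where to look, and gives you something concrete to feed back to the people entering the data.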
Poor quality data entered at source is a common Data Governance issue, which I discuss further here:
Have you encountered examples of poor data quality entered at source? Have you succeeded in identifying and preventing this problem? Please share your success (and horror!) stories.