If Carlsberg did Data Governance

In this part of the world, we’re treated to wonderful TV ads from Carlsberg, with the theme “If Carlsberg did…, it would probably be the best… in the world”. One of my favourites is “If Carlsberg did Haircuts”.

Carlsberg TV Ad – If Carlsberg did haircuts

This led me to think, what if Carlsberg did Data Governance?
Picture the scene… Your CEO is looking for a new report, and she has tasked you with providing it by close of business tomorrow! Where do you start?

In steps the “Data Waiter”, who presents you with a complete menu of the data in your organisation, suggests the data required for the new report and then prompts you with friendly questions like:

  • How would you like your data sliced?
    Would you like it by Geography, or Business line? Perhaps by Product, or by Customer Type?
  • What time period would you like your data to cover?
    For a light snack, I can recommend a “Point in Time” snapshot. Or perhaps you would like to include the past month? I can recommend the house special, which is a “Trend over time” for the past year.
  • How up to date would you like your data to be?
    The early-bird menu has some lovely data we captured 2 years ago. For a $10 supplement you can have data from 1 year ago. On the a la carte menu, you can choose from a wide range, from 6 months old to near-real-time.
  • How often would you like your data?
    Would you prefer a once-off or perhaps a weekly / monthly data extract? We do a lovely daily extract, or perhaps you would like real-time data-streaming?
  • What level of trust does your CEO need in the report you’re preparing?
    The early-bird menu has a fresh slice of big data. It’s a beautiful visualisation that looks really pretty – your CEO will love it. I’ve been assured that the data was so big that there’s no need to worry about its quality. (Editor’s note: beware of Big Data bullshit – look up “Veracity”, the critical but often overlooked fourth “V” of Big Data.)
    If your CEO needs a higher level of trust in your report, we have a complete selection of data that we’ve traced from data entry to our own reporting data warehouse, complete with data quality metrics along the data supply chain.

Having selected the data you need, the data waiter scans your retina, confirms you have appropriate access authority, and then delivers the data to your preferred location. You prepare your report and your CEO is so delighted that she promotes you to the senior management team… Happy days! Scene ends.

What services would you like from “The best Data Governance in the world”?

For more about “Trust in data”, see my blog post “The growing demand for food and data provenance”.

This article originally appeared on LinkedIn Pulse

Santa’s secret tips for a successful “Secret Santa”

If you’ve ever organised a “Secret Santa”, you’ll know that “data quality” is critical to its smooth running. Santa is the acknowledged world leader in data quality management, given his success managing the names and addresses of billions of children worldwide. He coined the data quality industry motto “Make a list, then check it twice”, which is a Critical Success Factor (CSF) for his “Naughty” and “Nice” segmentation process.

Santa has kindly shared some of his secret tips… In risk management terms, he tells us that we need to “manage the risk that the critical data required for the success of the (Secret Santa) programme is not fit for purpose”.

He suggests that we apply 4 of his 6 favourite data quality dimensions:

  1. Completeness: Ensure you put a name on your gift
  2. Accuracy: Ensure you put the correct (accurate) name on your gift (check against the slip of paper you pulled out)
  3. Uniqueness: Ensure you put First Name and Surname on your gift (just in case there are two Johns, or Marvins or Oprahs)
  4. Timeliness: Ensure you deliver your gift, with its associated critical data, to the secret santa organiser in good time
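
For fun, here is a minimal sketch of Santa’s four checks in Python. The gift record structure, field names and deadline are my own illustrative assumptions, not anything Santa actually shared.

```python
from datetime import date

# Illustrative assumptions: a simple gift record and a delivery deadline.
DEADLINE = date(2018, 12, 20)

def check_gift(gift, all_gift_names):
    """Apply the four data quality dimensions to one Secret Santa gift record."""
    issues = []
    full_name = f"{gift.get('first_name', '')} {gift.get('surname', '')}".strip()

    # 1. Completeness: a name is on the gift
    if not (gift.get("first_name") and gift.get("surname")):
        issues.append("Completeness: first name and surname are not both on the gift")

    # 2. Accuracy: the name matches the slip of paper pulled out of the hat
    if full_name != gift.get("drawn_slip"):
        issues.append("Accuracy: the name does not match the drawn slip")

    # 3. Uniqueness: first name plus surname should identify exactly one recipient
    if all_gift_names.count(full_name) > 1:
        issues.append("Uniqueness: more than one gift carries this full name")

    # 4. Timeliness: the gift reaches the organiser in good time
    if gift.get("delivered_on", date.max) > DEADLINE:
        issues.append("Timeliness: the gift arrived after the deadline")

    return issues

gift = {"first_name": "Marvin", "surname": "Smith", "drawn_slip": "Marvin Smith",
        "delivered_on": date(2018, 12, 14)}
print(check_gift(gift, ["Marvin Smith"]))  # [] means the list was made and checked twice
```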

Update December 2018 – in response to GDPR
Much has changed in the four years since Santa shared the tips above. Santa has asked me personally to update this post to let people know that he welcomes the strengthening of data protection laws that came into effect with the GDPR on May 25th this year.

Santa Claus has always respected the privacy and security of the personal data of the billions of children he has delivered gifts to throughout history. He never has, and never will, share personal data with anyone other than the elves, cats, dogs, birds and other animals that keep him informed throughout the year on how boys and girls are behaving.  

He personally maintains the naughty and nice list, without the use of Artificial Intelligence, Machine Learning or any other algorithmic processing. 

Data privacy professionals across the world have been asking Santa Claus how he manages his data privacy processing. Santa’s response: “It’s all part of the magic of Christmas” 

#IBelieve, #Humour

Ken

Opportunity to apply lessons learnt in my new job

This week I started a new job as Head of Customer Information at Bank of Ireland in Dublin. I am excited at the prospect of applying the lessons I have learnt for the benefit of our customers.

I would like to take this opportunity to thank my fellow data management professionals worldwide for generously sharing their experience with me. I started to write this blog in 2009. My objective was to “Share my experience and seek to learn from the experience of others”. I have certainly learnt from the experience of others, and I hope to continue to do so.

The opinions I express on this blog will continue to be my own. I look forward to continuing to hear yours.

The link between horse meat in beef burgers and data quality management

It was reported today that horse DNA was detected in tests performed on frozen “beef” burgers sold in a number of UK and Irish supermarkets. This has come as a shock to consumers, who assumed that quality controls were in place to ensure that food contains only what it says on the label. It appears the quality controls did not include a specific test for the presence of horse meat.

Photograph: Matt Cardy/Getty Images

In an earlier blog post, I asked “Do you know what’s in the data you’re consuming?” In that post, I proposed that, as data consumers, we have the right to expect facts about the business critical data we consume – just as food consumers are provided with nutritional facts. Today’s news reminds us to be clear about the data quality facts we ask for. 

The old adage applies: “If you don’t measure, you can’t manage”.

The dog and the frisbee and data quality management

The Wall Street Journal reported it as the “Speech of the year“.

In a speech with the intriguing title “The dog and the frisbee“, Andrew Haldane, the Bank of England’s Director of Financial Stability, has questioned whether the Emperor (in the form of ever-increasing, ever more complex regulations such as Solvency II, BASEL III and Dodd-Frank) is naked. He points out that the BASEL regulations, which have grown from 30 pages to over 600, completely failed to identify the banks that were at risk of collapse, while a simple measure of a bank’s leverage ratio did identify them.

He also points out that “Dodd-Frank makes Glass-Steagall look like throat-clearing.” The Glass-Steagall Act of 1933, which separated commercial and investment banking, ran to a mere 37 pages; the Dodd-Frank Act of 2010 ran to 848 pages, and may spawn a further 30,000 pages of detailed rule-making by various agencies.

I recommend you read the speech yourself – his arguments, together with his wit, are superb. I include a brief extract below:

‘In the UK, regulatory reporting was introduced in 1974. Returns could have around 150 entries. In the Bank of England archives is a memo to George Blunden, who was to become Deputy Governor, on these proposed regulatory returns. Blunden’s handwritten comment reads: “I confess that I fear we are in danger of becoming excessively complicated and that if so we may miss the wood from the trees”.

Today, UK banks are required to fill in more than 7,500 separate cells of data – a fifty-fold rise. Forthcoming European legislation will cause a further multiplication. Banks across Europe could in future be required to fill in 30–50,000 data cells spread across 60 different regulatory forms. There will be less risk of regulators missing the wood from the trees, but only because most will have needed to be chopped down.’

Brilliant!

Andrew Haldane is calling for simpler, more basic rules. I agree with him.

I have worked in data management for over 30 years. The challenges I see today are the same challenges that arise time and time again. They are not Solvency II specific, BASEL specific, or Dodd Frank specific. They are universal. They apply to all critical data within all businesses.

The fundamental truth is: “The data is unique, but the data management principles are universal”.

It is time to stop writing specific data management and data quality management requirements into individual pieces of legislation. Regulators should co-operate with the data management profession, via independent organisations such as DAMA International, to develop a common-sense universal standard, and put the effort into improving that standard.

What do you think? I welcome your comments.

Data Governance – Did you drop something?

Welcome to part 5 of Solvency II Standards for Data Quality – common sense standards for all businesses.

Solvency II Data Quality – Is your data complete?

I suspect C-level management worldwide believe their organisation has controls in place to ensure the data on which they base their critical decisions is “complete”. It’s “applied common sense”.

Therefore, C-level management would be quite happy with the Solvency II data quality requirement that states: “No relevant data available is excluded from consideration without justification (completeness)” (Ref: CP 56 paragraph 5.181).

So… what could go wrong?

In this post, I discuss one process at high risk of inadvertently excluding relevant data – the “Data Extraction” process.

“Data Extraction” is part of one of the most common data processes in business: “Extract, Transform, Load”, or ETL for short. Data required by one business area (e.g. regulatory reporting) is present in different (source) systems. The source systems are often operational systems. Data is commonly “extracted” from operational systems and fed into informational systems (which I refer to as “End of Food Chain” systems).

If the data extraction can be demonstrated to be a complete copy, there is no risk of inadvertently omitting relevant data. In my experience, few data extractions are complete copies.

In most instances, data extractions are “selective”.  In the insurance industry for example, the selection may be done based on product type, or perhaps policy status.  This is perfectly acceptable – so long as any “excluded data” is justified.

Over time, new products may be added to the operational system(s). There is a risk that the data extraction process is not updated, so the new products are inadvertently excluded and never make it to the “end of food chain” informational system (CRM, BI, Solvency II, Anti-Money Laundering, etc.).

So… what can be done to manage this risk?

I propose a “Universal Data Governance Principle” – namely: “Within the data extraction process, the decision to EXCLUDE data is as important as the decision to INCLUDE data.”

To implement the principle, all data extractions (regardless of industry) should include the following controls (a minimal sketch follows the list):

  1. Record the total population of the source data
  2. Profile the source data on the selection field (e.g. product type)
  3. Maintain an inclusion list (e.g. product types to be included)
  4. Maintain an exclusion list (e.g. product types to be excluded), with documented justification
  5. Generate an alert when a value is found in the selection field that is in NEITHER list (e.g. a new product type)
  6. Monitor the control regularly to verify it is working
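
Here is a minimal sketch of these controls in Python, assuming a list of source records with a ‘product_type’ selection field. The field name, product values and justifications are illustrative assumptions, not a prescription for any particular system.

```python
# Illustrative inclusion and exclusion lists (assumption: selection is by product type).
INCLUDE = {"MOTOR", "HOME"}                                        # 3. inclusion list
EXCLUDE = {"LEGACY_PENSION": "Closed book, reported separately"}   # 4. exclusion list, justified

def extract(source_records):
    total = len(source_records)                                    # 1. total population
    profile, extracted, alerts = {}, [], []

    for record in source_records:
        product = record["product_type"]
        profile[product] = profile.get(product, 0) + 1             # 2. profile on the selection field

        if product in INCLUDE:
            extracted.append(record)
        elif product in EXCLUDE:
            pass                                                   # excluded, with documented justification
        else:                                                      # 5. value in neither list
            alerts.append(f"Unrecognised product_type '{product}': review before the next run")

    # 6. reconciliation figures to monitor regularly
    print(f"Source records: {total}, extracted: {len(extracted)}, "
          f"justifiably excluded: {total - len(extracted) - len(alerts)}, alerts: {len(alerts)}")
    return extracted, alerts
```

The important design choice is the final branch: a value that appears in neither list is surfaced for a decision, rather than silently dropped.
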
So, ask yourself: can you demonstrate that your data extractions do not overlook anything? Can you demonstrate that “No relevant data available is excluded from consideration without justification (completeness)”?
Feedback welcome – as always.

FSA SII progress review findings – More Data Governance required

February 2011 – UK Financial Services Authority publishes findings of their Solvency II Internal Model Approval Process (IMAP) thematic review. 

Worrying, but not surprising, is the finding that data management, data quality and data governance are the areas requiring most attention. I include specific paragraphs below:

3.2 Data management appeared to be one area where firms still have comparatively more to do to achieve the likely Solvency II requirements.

3.15 Data quality: Few firms provided sufficient evidence to show that data used in their internal model was accurate, complete and appropriate.

6.10 We witnessed little challenge or discussion on data quality at board level. We expect issues and reporting on data governance to find a regular place within board and committee discussions. Firms need to ensure that adequate and up-to-date quality management information is produced. It is important that the board has the necessary skills to ask probing questions.

See the full report at:

http://www.fsa.gov.uk/pubs/international/imap_final.pdf

Know your data

You must know your data.

Do you know what’s in your data box of chocolates?

You must know where it is, what it should contain and what it actually contains.

When your data does not contain what it should, you must have a process for correcting it.
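
In practice, “knowing what your data should contain and what it actually contains” can start with something as simple as the sketch below; the field name, expected values and records are illustrative assumptions.

```python
# Compare what a field should contain with what it actually contains.
EXPECTED_COUNTRY_CODES = {"IE", "GB", "FR", "DE"}        # what the field should contain

records = [{"country": "IE"}, {"country": "XX"}, {"country": ""}, {"country": "GB"}]

actual = [r.get("country") for r in records]             # what the field actually contains
needs_correction = [v for v in actual if v not in EXPECTED_COUNTRY_CODES]

print(f"{len(needs_correction)} of {len(actual)} values need correction: {needs_correction}")
```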

CEOs, CFOs and CROs often take the above as “given”. They make business-critical decisions using information derived from data within their organisation. After all, it’s applied common sense.

For the insurance industry, Solvency II requires evidence that you are applying common sense.

If you operate in the EU market or process the personal data of EU data subjects, you must comply with the EU General Data Protection Regulation (GDPR) or face severe fines. To comply, you must “know your (personal) data” and how you manage it.

In my experience, data is like a box of chocolates “You never know what you’re gonna get.”

Do you know your data?

Charter of Data Consumer rights and responsibilities

Time for a charter of Data Consumer rights and responsibilities

There are many rights enshrined in law that benefit all of us. One example is the UN’s Universal Declaration of Human Rights. Another example is the “Consumer Rights” protection most countries enforce to guarantee us, the buying public, the right to expect goods and services that are of good quality and “fit for purpose”. As buyers of goods and services, we also have responsibilities. If you or I buy a “Rolex watch” for $10 from a casual street vendor, we cannot claim consumer protection rights if the watch stops working within a week. “Let the buyer beware”, or “Caveat Emptor”, is the common-sense responsibility that we, as consumers, must observe.

I have previously written about business users’ right to expect good data plumbing. Business users (of data) also have responsibilities. I believe it’s time to agree a charter of rights and responsibilities for them. Business users of data are “Data Consumers”: people who use data to perform their work, whatever that work may be. Data Consumers make decisions based on the data or information available to them. Examples range from a doctor prescribing medication based on the information in a patient’s health records, to a multi-national chief executive deciding to buy a business based on the performance figures available, to an actuary developing an internal model to determine Solvency II Capital Requirements.

What rights and responsibilities should data consumers have?

Here’s my starter set:

  • The right to expect data that is “fit for purpose”, data that is complete, appropriate and accurate.
  • The responsibility to define what “fit for purpose” data means to them.
  • The right to expect guidance and assistance in defining what constitutes complete, appropriate and accurate data for them.
  • The responsibility to explain the impact that “sub-standard” data would have on the work they do.
  • The right to be informed of the actual quality of the data they use.
  • The right to expect controls in place that verify the quality of the data they use meets the standard they require.

What do you think? Please feed back your suggestions.

The Ryanair Data Entry Model

I was prompted to write about the “Ryanair Data Entry Model” by an excellent post by Winston Chen on “How to measure Data Accuracy”.

Winston highlights the data quality challenge posed by incorrect data captured at the point of entry. One cause he illustrates is the use of default options in drop-down selections. He cites the example of a Canadian law enforcement agency that saw a disproportionately high occurrence of “pick pocketing” in its crime statistics. Further investigation revealed that “pick pocketing” was the first option in a drop-down selection of crime types.

Winston provides excellent suggestions on how to identify and prevent this source of data quality problems.  Dylan Jones of Dataqualitypro.com and others have added further great tips in the comments.

I believe you need to make Data Quality “matter” to the person entering the data – hence I recommend the use of what I call the “Ryanair Data Entry Model”.   This is the data entry model now used by most low cost airlines. As passengers, we are required to enter our own data. We take care to ensure that each piece of information we enter is correct – because it matters to us.  The same applies when we make any online purchase.

With Ryanair, it is impossible to enter an invalid date (e.g. 30 Feb), but it is easy to enter the “wrong” date for our needs. For example, we may wish to fly on a Sunday, but by mistake we could enter the date for the Monday.

We ensure that we select the correct number of bags, since each one costs us money. We try to avoid having to pay for insurance, despite Ryanair’s best efforts to force it on us.

It may not be easy to make data entry “matter” to the people performing it in your organisation, but this is what you must do if you wish to “stop the rot” and prevent data quality problems at source. To succeed, you must measure data quality at the point of entry and provide immediate feedback to the person entering the data, helping them to get it right first time. Where possible, you should include data entry quality in a person’s performance review: reward good data quality, and withhold reward for poor data quality.
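
To make the idea concrete, here is a minimal sketch of measuring quality at the point of entry with immediate feedback. The field names and rules are illustrative assumptions, not Ryanair’s actual checks.

```python
from datetime import date

def validate_booking(entry):
    """Return immediate feedback for the person entering the data."""
    feedback = []

    # Validity: impossible values (e.g. 30 February) are rejected outright...
    try:
        travel_date = date.fromisoformat(entry.get("travel_date", ""))
    except ValueError:
        return ["That travel date does not exist. Please re-enter it."]

    # ...but valid values can still be wrong for the traveller's needs, so prompt
    # the one person who knows the intent to confirm them.
    if travel_date.weekday() == 0:  # Monday
        feedback.append("You have chosen a Monday. Did you mean the Sunday before?")

    if entry.get("bags", 0) > 2:
        feedback.append(f"You have selected {entry['bags']} bags. Please confirm.")

    return feedback

# Feedback appears while the person who knows the right answer is still at the screen.
print(validate_booking({"travel_date": "2012-10-15", "bags": 3}))
```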

Poor quality data entered at source is a common Data Governance issue, which I discuss further here:

Have you encountered examples of poor data quality entered at source? Have you succeeded in identifying and preventing this problem? Please share your success (and horror!) stories.