The growing demand for food and data provenance

In November 2012, I presented at the Data Management and Information Quality Europe 2012 conference in London. My presentation was called “Do you know what’s in the data you’re consuming?”

In the presentation, I compared the data supply chain with the food supply chain.

I believe that data consumers have the right to be provided with facts about the content of the data they are consuming, just as food consumers are provided with facts about the food they are buying. The presentation provides guidelines on how you can improve your data supply chain.

Little did I realise that within 3 months the term “provenance” would be hitting the headlines due to the European horsemeat scandal.

There’s a silver lining in this food scandal for data quality management professionals. As financial regulators increasingly demand evidence of the provenance of the data provided to them, it is now easier for data quality management professionals to explain to their business colleagues and senior management what “data provenance” means, and what it requires. Retailers, such as Tesco, must have controls in their supply chain to ensure that the food they sell to consumers contains only “what it says on the tin”. Similarly, financial services organisations providing data to financial regulators must have controls in their data supply chain that ensure the quality of the data they provide can be trusted. Regulators are now asking these organisations to demonstrate evidence of their data provenance, as applied to their critical or material data.

But what exactly is “data provenance”? The best definition I have seen comes from Michael Brackett in his excellent book “Data Resource Simplexity”.

“Data Provenance is provenance applied to the organisation’s data resource. The data provenance principle states that the source of data, how the data were captured, the meaning of the data when they were first captured, where the data were stored, the path of those data to the current location, how the data were moved along that path, and how those data were altered along that path must be documented to ensure the authenticity of those data and their appropriateness for supporting the business”.
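As a rough, non-authoritative sketch of how those elements might be recorded in practice — the class names, field names and example values below are mine, not Brackett’s — a provenance record for a single data element could look something like this:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class LineageEvent:
    """One hop in the data's journey: where it moved, how, and what changed."""
    moved_from: str
    moved_to: str
    moved_by: str            # e.g. the name of the ETL job that moved it
    alterations: List[str]   # transformations applied on this hop
    occurred_at: datetime

@dataclass
class ProvenanceRecord:
    """Documents the elements of data provenance for one data element."""
    data_element: str
    original_source: str     # the source of the data
    capture_method: str      # how the data were captured
    meaning_at_capture: str  # what the data meant when first captured
    current_location: str    # where the data are stored now
    lineage: List[LineageEvent] = field(default_factory=list)

# Illustrative example: a date of birth keyed at a branch, later copied to an AML system
dob_provenance = ProvenanceRecord(
    data_element="customer.date_of_birth",
    original_source="Branch account-opening form",
    capture_method="Keyed by branch staff from photo ID",
    meaning_at_capture="Customer's date of birth as shown on the ID document",
    current_location="AML_DB.customer.dob",
    lineage=[
        LineageEvent(
            moved_from="CORE_BANKING.customer",
            moved_to="AML_DB.customer",
            moved_by="nightly_customer_etl",
            alterations=["Reformatted DD/MM/YYYY to ISO 8601"],
            occurred_at=datetime(2013, 2, 1, 2, 30),
        )
    ],
)
```

The particular structure matters less than the principle: every element in Brackett’s definition becomes something you can document, query, and show to a regulator.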

Enjoy your “beef” burger!

Do you know what’s in the data you’re consuming?

Standard facts are provided about the food we buy

These days, food packaging includes ingredients and a standard set of nutrition facts.  This is required by law in many countries.

Food consumers have grown accustomed to seeing this information, and now expect it. It enables them to make informed decisions about the food they buy, based on a standard set of facts.

Remarkable as it may seem, data consumers are seldom provided with facts about the data feeding their critical business processes.

Most data consumers assume the data input to their business processes is “right”, or “OK”.  They often assume it is the job of the IT function to ensure the data is “right”.  But only the data consumer knows the intended purpose for which they require the data.  Only the data consumer can decide whether the data available satisfies their specific needs and their specific acceptance criteria. To make an informed choice, data consumers need to be provided with facts about the data content available.

Data Consumers have the right to make informed decisions based on standard data content facts

The IT function, or a data quality function, can, and should, provide standard “data content facts” about all critical data, such as the facts shown in the example.

In the sample shown, a Marketing Manager wishing to mailshot customers in the 40-59 age range might find that the data content facts satisfy his/her data quality acceptance criteria.

The same data might not satisfy the acceptance criteria for a manager in the Anti Money Laundering (AML) area requesting an ETL process to populate a new AML system.
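As a rough illustration of what such facts might look like (the file name, column name and age bands below are invented for the example; they are not a standard), a short profiling script can generate them directly from an extract:

```python
import csv
from datetime import date

def data_content_facts(rows, dob_field="date_of_birth"):
    """Produce nutrition-label style facts about a customer extract."""
    today = date.today()
    total = len(rows)
    dobs = [date.fromisoformat(r[dob_field]) for r in rows if r.get(dob_field)]
    ages = [(today - d).days // 365 for d in dobs]

    def pct(n):
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "records": total,
        "dob_populated_pct": pct(len(dobs)),
        "aged_40_to_59_pct": pct(sum(40 <= a <= 59 for a in ages)),
        "aged_over_100_pct": pct(sum(a > 100 for a in ages)),
    }

with open("customer_extract.csv", newline="") as f:
    print(data_content_facts(list(csv.DictReader(f))))
```

The same facts could satisfy the marketing manager’s acceptance criteria and fail the AML manager’s; the value is that each consumer can judge the data against their own intended purpose.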

Increasing regulation means that organisations must be able to demonstrate the quality and trace the origin of the data they use in critical business processes.

In Europe, Solvency II requires insurance and re-insurance undertakings to demonstrate that the data they use for solvency calculations is as complete, appropriate and accurate as required for the intended purpose. Other regulatory requirements, such as Dodd-Frank in the USA, BASEL III and BCBS 239, are also seeking increasing transparency regarding the quality of the data underpinning our financial system.

While regulation may be a strong driving force for providing standard data content facts, an even stronger one is the business benefit to be gained from being informed. Some time ago, Gartner research showed that approximately 70% of CRM projects failed. I wonder whether the business owners of those proposed CRM systems were ever shown data content facts about the data available to populate them.

In years to come, we will look back on those crazy days when data consumers were not shown data content facts about the data they were consuming.

Common Enterprise wide Data Governance Issues – #14. No Enterprise wide Data Model

I was reading David Loshin’s excellent post How Do You Know What Data is Master Data? and I thought “I know – I’ve covered that question in my blog” – but I hadn’t.  So here it is.

Your “Enterprise Wide Data Model” tells you what data is Master Data.

Unfortunately, most organisations lack an Enterprise Wide Data Model. Worse still, there is often little appreciation among senior management of the need for an Enterprise wide Data Model.

Impact:
The absence of an Enterprise wide Data Model makes it difficult for even technical experts to locate data. The data model would distinguish between Master Data and replicas, and would clarify whether the data in the model is currently in place or planned for. Without an Enterprise wide Data Model, data dependent projects (e.g. BASEL II, Anti Money Laundering, Solvency II) must locate data (especially Master Data) from first principles, and face the risk of not finding the data, or of identifying inappropriate sources. New projects dependent on existing data take longer than necessary to complete, and face a serious risk of failure.

Solution:
The CIO should define and implement the following Data policy:

An Enterprise wide Data Model will be developed covering critical Enterprise wide data, in accordance with industry best practice.

Time to sing from the same hymn sheet

One notable exception to the norm:
This is not a plug for IBM… merely an observation based on my experience.

I worked in an IBM development lab in Dublin during the 1990s. At that time, IBM developed a “Financial Services Data Model” (FSDM), and Dublin was IBM’s “FSDM centre of excellence”. BASEL II turned FSDM into an “overnight success” – ten years after it was developed. Organisations that had adopted IBM’s FSDM found it relatively easy to locate the data required by their BASEL II compliance programme.

I foresee a future in which all financial services organisations, including the financial regulator(s), will use the same data model. “Singing from the same hymn sheet” will make communication far simpler, and less open to misinterpretation.

The lack of an Enterprise Wide Data Model is just one of the many data governance issues that affect organisations today.  Assess the status of this issue in your Enterprise by clicking here:  Data Governance Issue Assessment Process

Does your organisation have an “Enterprise wide Data Model” – if so, how did you achieve it?  Did you build it from scratch, or start with a vendor supplied model? Please share your experience.


Achieving Regulatory Compliance – the devil is in the data

I will be sharing my experience and ideas on “Achieving Regulatory Compliance – the devil is in the data” at an IDQ Seminar Series event in Dublin next month.  I would like you to help me prepare.

I would like you to share your past experience with me, your ideas on the current situation, and most important, your view of the future.

Is Regulatory Compliance a mere box ticking exercise?

What industry do you work in?

Is regulation increasing in your industry?

Is regulation merely a box ticking exercise? Does the regulator simply accept what you say?

What role does data quality play?

What role does data governance play?

My initial thoughts are as follows:

  • Regulation is increasing across all industries
    e.g. Within Financial Services, the list includes:

    • Solvency II
    • BASEL II
    • Anti Money Laundering (AML)
    • Anti Terrorist Financing (ATF)
    • Sarbanes Oxley (SOX)
    • MiFID
  • Regulatory compliance is often seen as a box ticking exercise, since it is physically impossible for the regulator to check all the information provided.
  • Regulators will increasingly seek to challenge, audit and query the Data Governance processes used to gather the information, and critically the controls applied within those processes.  (I have written a series of posts on common Data Governance Issues – see Data Governance Issue Assessment Process)

I hope to write a number of posts expanding on the above ideas.  My argument is that “To achieve Regulatory Compliance, the devil is very definitely in the data, but the evidence is in the Data Governance process”.

Whether you agree, or disagree, I would like to hear from you.

Plug and Play Data – The future for Data Quality

The excellent IAIDQ World Quality Day webinar looked at what the Data Quality landscape might be like in five years’ time, in 2014. This got me thinking. Dylan Jones’ excellent article on The perils of procrastination made me think some more…

Plug and Play Data

I believe that we data quality professionals need a paradigm shift in the way we think about data. We need to make “Get data right first time” and “Data Quality by Design” such no-brainers that procrastination is not an option. We need to promote a vision of the future in which all data is reusable and interchangeable – a world of “Plug and Play Data”.

Everybody, even senior business management, understands the concepts of “plug and play” and reusable play blocks. For “plug and play” to succeed, interconnecting parts must be complete, fully moulded, and must conform to clearly defined standards. Hence “plug and play data” must be complete, fully populated, and must conform to clearly defined standards (business rules).

How can organisations “get it right first time” and create “plug and play data”?
It is now relatively simple to invoke cloud-based verification from any part of a system through which data enters.

For example, when opening a new “Student” bank account, cloud-based verification might prompt the bank assistant with a message like: “Mr. Jones’ date of birth suggests he is 48 years old. Is his date of birth correct? Is a ‘Student Account’ appropriate for Mr. Jones?”
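Here is a minimal sketch of that kind of point-of-entry check. The product code, the age threshold and the fact that the rule runs locally (rather than calling a real cloud verification service) are all assumptions made purely for illustration:

```python
from datetime import date

STUDENT_MAX_AGE = 30  # illustrative threshold, not a real product rule

def verify_new_account(product_code: str, date_of_birth: date) -> list[str]:
    """Return prompts to show the bank assistant before the account is created.
    In practice this logic might sit behind a cloud-based verification API;
    the local function here is a stand-in for that call."""
    prompts = []
    age = (date.today() - date_of_birth).days // 365
    if age < 0 or age > 120:
        prompts.append(f"The date of birth implies an age of {age}. Is it correct?")
    if product_code == "STUDENT" and age > STUDENT_MAX_AGE:
        prompts.append(
            f"The customer's date of birth suggests they are {age} years old. "
            "Is a Student Account appropriate?"
        )
    return prompts

# An applicant born in 1961 applying for a Student Account
for prompt in verify_new_account("STUDENT", date(1961, 5, 14)):
    print(prompt)
```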

In conclusion:

We Data Quality Professionals need to educate both Business and IT on the need for, and the benefits of, “plug and play data”. We need to explain to senior management that data is no longer needed or used by only one application. We need to explain that even tactical solutions within Lines of Business need to consider Enterprise demands for data, such as:

  1. Data feeds into regulatory systems (e.g. Anti Money Laundering, BASEL II, Solvency II)
  2. Access from or data feed into CRM system
  3. Access from or data feed into Business Intelligence system
  4. Ad hoc provision of data to satisfy regulatory requests
  5. Increasingly – feeds to and from other organisations in the supply chain
  6. Ultimate replacement of application with newer generation system

We must educate the business on the increasingly dynamic information requirements of the Enterprise – which can only be satisfied by getting data “right first time” and by creating “plug and play data” that can be easily reused and interconnected.

What do you think?

Common Enterprise wide Data Governance Issues #11: No ownership of Cross Business Unit business rules

This post is one of a series dealing with common Enterprise Wide Data Governance Issues.  Assess the status of this issue in your Enterprise by clicking here:  Data Governance Issue Assessment Process

Business Units often disagree

I'm right, he's wrong!

Different Business Units sometimes use different business rules to perform the same task.

Within retail banking, for example, Business Unit A might use “Account Type” to distinguish personal accounts from business accounts, while Business Unit B might use “Account Fee Rate”.
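A toy example of how the two rules can classify the very same account differently (the type codes and the fee-rate threshold are invented for illustration):

```python
# Two Business Units, two different rules for "is this a business account?"

def is_business_account_unit_a(account: dict) -> bool:
    # Business Unit A relies on the Account Type code
    return account["account_type"] in {"BUS", "SME", "CORP"}

def is_business_account_unit_b(account: dict) -> bool:
    # Business Unit B infers it from the fee rate charged
    return account["account_fee_rate"] >= 0.50  # invented threshold

account = {"account_number": "12345678", "account_type": "PERS", "account_fee_rate": 0.75}

print(is_business_account_unit_a(account))  # False - a personal account to Unit A
print(is_business_account_unit_b(account))  # True  - a business account to Unit B
```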


Impact(s) can include:

  1. Undercharging of Business Accounts mistakenly identified as Personal Accounts, resulting in loss of revenue.
  2. Overcharging of Personal Accounts mistakenly identified as Business Accounts, which could lead to a fine or other sanctions from the Financial Regulator.
  3. Anti Money Laundering (AML) system generates false alerts on Business Accounts mistakenly identified as Personal Accounts.
  4. AML system fails to generate alert on suspicious activity (e.g. large cash lodgements) on a personal account misidentified as a Business Account, which could lead to a regulatory fine.
  5. Projects dependent on existing data (e.g. AML, CRM, BI) discover that the business rules they require are inconsistent.

Solution:
Agree and implement the following Policy:  (in addition to the policies listed for Data Governance Issue #10)

  • Responsibility for resolving cross business unit business rule discrepancies lies with the Enterprise Data Architect.

For further details on Business rules – see Business Rules Case Study.

Your experience:
Have you faced a situation in which different business units use different business rules?   Please share your experience by posting a comment – Thank you – Ken.

Russian Gas Pipe and Data Governance

As you know, Russia supplies gas to many European countries.

What's flowing through your critical data pipelines?

Do you know what’s in your critical data pipelines?

Could you imagine Italy purchasing gas from Russia without checking what exactly was flowing through the pipe?  I’m no expert on gas pipelines, but I know that before completing the agreement to purchase the gas, Italy and Russia would have agreed metrics such as:

  • Volume of Gas
  • Calorific value (Energy content)
  • etc.

So what? What else would one expect?  Applied common sense… yes?

Why is it that such common sense is often lacking in Data Migration and Data Population projects?  Why do some Enterprises continue to perform data population of, and ongoing data entry to, critical data repositories without fully understanding the data they are pumping into the repository?

A simple example involves Date of Birth.  The business ask the IT function to populate Date of Birth in the new AML / BASEL II / CRM / other repository. Some time later, when data population is complete, the business begin to express concerns:

  • “We never realised we had so many customers aged over 100???”
  • “I thought we had more Student customers”
  • “How come so many of our customers share the same birthday?”
  • “These are not the results we expected”
  • etc.

Performing data population on the basis of what the source data “should contain”, without analysing what exactly it does contain, is known as the “Load and Explode” approach to Data Population. I cover this Enterprise Wide Data Issue in more detail here.
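By contrast, a rough sketch of the kind of pre-load check that avoids “Load and Explode” might look like this (the thresholds are assumptions; real acceptance criteria should come from the business consumers of the target repository):

```python
from datetime import date

def pre_load_checks(dobs: list[date], max_pct_over_100: float = 0.1) -> None:
    """Fail fast before population, instead of 'exploding' afterwards."""
    if not dobs:
        raise ValueError("No dates of birth supplied - nothing to profile.")
    today = date.today()
    ages = [(today - d).days // 365 for d in dobs]

    pct_over_100 = 100.0 * sum(a > 100 for a in ages) / len(ages)
    if pct_over_100 > max_pct_over_100:
        raise ValueError(
            f"{pct_over_100:.1f}% of customers appear to be over 100 - "
            "investigate the source data before loading."
        )

    future_dobs = sum(d > today for d in dobs)
    if future_dobs:
        raise ValueError(f"{future_dobs} dates of birth are in the future.")
```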

We in the “Data Governance” and “Data Quality” industry need to educate the business community on the “common sense” parts of data governance, and on the need to engage “Data Governance Professionals” to ensure that “Data Quality Common Sense” is actually applied.

Feedback welcome – Ken