My interview with Dylan Jones

Dylan Jones of DataQualityPro interviews me about the process I use to assess common Enterprise wide data issues. Use this process to assess the status of data governance within your organisation or that of a client.

Data Quality Pro interview with Ken O'Connor Data Consultant

Russian Gas Pipe and Data Governance

As you know, Russia supplies Gas to many European countries.

Do you know what’s in your critical data pipelines?

Could you imagine Italy purchasing gas from Russia without checking what exactly was flowing through the pipe?  I’m no expert on gas pipelines, but I know that before completing the agreement to purchase the gas, Italy and Russia would have agreed metrics such as:

  • Volume of Gas
  • Calorific value (Energy content)
  • etc.

So what? What else would one expect?  Applied common sense… yes?

Why is it that such common sense is often lacking in Data Migration and Data Population projects?  Why do some Enterprises continue to perform data population of, and ongoing data entry to, critical data repositories without fully understanding the data they are pumping into the repository?

A simple example involves Date of Birth.  The business ask the IT function to populate Date of Birth in the new AML / BASEL II / CRM / other repository. Some time later, when data population is complete, the business begin to express concerns:

  • “We never realised we had so many customers aged over 100???”
  • “I thought we had more Student customers”
  • “How come so many of our customers share the same birthday?”
  • “These are not the results we expected”
  • etc.

Performing data population on the basis of what the source data “should contain”, without analysing what exactly it does contain, is known as the ‘Load and Explode’ approach to Data Population.  I cover this Enterprise Wide Data Issue in more detail here.
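A minimal sketch, in Python, of the kind of content check that would surface the Date of Birth surprises listed above before the data is loaded. The function names, the sample values and the use of 1900-01-01 as a stand-in for a system default are assumptions for illustration only.

```python
from collections import Counter
from datetime import date
from typing import Optional


def age_on(dob: date, as_of: date) -> int:
    """Whole years between dob and as_of."""
    before_birthday = (as_of.month, as_of.day) < (dob.month, dob.day)
    return as_of.year - dob.year - before_birthday


def profile_dates_of_birth(dobs: list[Optional[date]], as_of: date) -> dict:
    """Summarise what a Date of Birth field actually contains."""
    present = [d for d in dobs if d is not None]
    most_common = Counter(present).most_common(1)
    return {
        "total": len(dobs),
        "missing": len(dobs) - len(present),
        "in_future": sum(1 for d in present if d > as_of),
        "aged_over_100": sum(
            1 for d in present if d <= as_of and age_on(d, as_of) > 100
        ),
        # (value, count) of the most frequent date: a hint at default values
        "most_common_value": most_common[0] if most_common else None,
    }


# Hypothetical sample; 1900-01-01 stands in for a default entered at data capture.
sample = [
    date(1900, 1, 1), date(1900, 1, 1), date(1900, 1, 1),
    date(1985, 6, 15), date(2001, 9, 30), None,
]
report = profile_dates_of_birth(sample, as_of=date(2010, 1, 1))
print(report)
```

A report like this, agreed with the business before the pipe is built, turns “customers aged over 100” and “so many share the same birthday” into known facts rather than post-load surprises.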

We in the “Data Governance” and “Data Quality” industry need to educate the business community on the “common sense” parts of data governance, and on the need to engage “Data Governance Professionals” to ensure that “Data Quality Common Sense” is actually applied.

Feedback welcome – Ken

Business Rules Case Study – Part II

In part one of this case study,  I  discussed questions like:

  1. Why are Business Rules necessary?
  2. What exactly is a Business Rule?
  3. What should happen if the data fails a Business Rule?

I would like to thank the following people for contributing to the discussion to date:

Jim Harris @ocdqblog shared his experience on data migration and data integration projects, and concluded “Sadly, the most common problem was that no business rules were defined at all and the data would be blindly migrated or integrated without even at least some superficial validation checks.” more here.

In Henrik Liliendahl Sørensen’s @hlsdk experience, Business rules divide into External and Internal Business Rules:

  • “External rules that are defined outside your organisation – mostly laws and other regulations you must follow when doing business in a given country (or group of countries like the EU).
  • Internal rules that are defined by and for your business alone – made to make your business competitive.” more here.

Marianne Colwell @emx5 shared recent wins on the project she is currently working on, in which they have captured business rules in a requirements management repository, more here.

Phil Allen would like to know what the most popular choices of software are for handling the recording of Business Rules and what experiences people have had more here.

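In part two, I plan to explore:

  1. What controls should you have in place to manage Business Rules?
  2. Where should you look for Business Rules (if your Enterprise has no Master Business Rules Repository)?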

I will continue to use a case study from an Anti Money Laundering (AML) programme. However, in my experience, all data migration / data population projects face the same challenges.

 

What controls should you have in place to manage Business Rules?

In Sarbanes-Oxley (SOX) terms: “If it’s not written down, it doesn’t exist”.  In my experience, you need the following controls to manage business rules:

  1. Business owner (Business responsibility)
    There must be a defined business owner (business area) with responsibility for the data item, and for the business rule(s) relating to it.  The definition must include details of who to contact (the title of a person) with queries regarding the data.
  2. Location of Business rule(s) (Business responsibility)
    The Business owner must identify where the Master business rules are formally documented, and subject to Change Management.  The business owner must also identify where copies of the business rules are held, since they must all be updated when the master copy is updated.
  3. Change Management process for the Master business rules, and copies (Business responsibility)
    The Business owner must have a documented Change Management process for updates to the Master business rules, and for ensuring that all copies of the business rules are also updated.
  4. Location of source data (Business accountability – Technical responsibility)
    The Business owner must satisfy him/herself that the providers of IT services to the business have a control process in place that identifies where the actual data is held (i.e. the physical location).  If there are a number of physical locations, they should all be recorded, together with details of which is the Master source, which is a replica, and details of the replication process.

Where should you look for Business Rules (if your Enterprise has no Master Business Rules Repository)?

Too often, I have worked on data migration/population projects for which there was no master business rules repository.  We had to research the business rules from first principles. If you have to research business rules from first principles, I suggest you consider the following locations.

  • Business Operations Manuals
    Most organisations have some form of operations manuals – in hard or softcopy. Business rules are commonly embedded in this documentation.  Be careful, they are often out of date.
  • Computer System prompt screens / help screens
    The possible/permitted values for a given field are often provided on help screens.
  • Internet sites belonging to the Enterprise
    Internal and external websites are a rich source of business rules.  They can hold product details, fee rates, etc.
    Unfortunately, they are too often out of sync with the Master Business Rules (wherever they are).
  • Data Warehouse(s) within Enterprise
    If you are lucky enough to have a single “Enterprise Data Warehouse”, this is the logical place to find business rules.  In my experience, many enterprises have a number of databases (often in the Marketing area), at least one of which is referred to as a ‘data warehouse’.
  • Data Protection Area
    In most countries, customers may request details of the data held about them by an Enterprise.  Many Enterprises have a “Data Protection Area” to coordinate gathering the details held about the customer.  Often, the details held contain internal codes, which the Data Protection Area must ‘translate’ into something meaningful for the customer.  In my experience, the “Data Protection Area” translation process is a rich source of Business Rules.
  • Business Rules are often coded into application systems such as:
    • Anti Money Laundering (AML)
    • BASEL II
    • CRM
    • Single view of customer database

The above are all potential sources of Business Rules. However, they share a common characteristic – they are all typically ‘copies’ or replicas of the master business rules.  My experience suggests the following (I look forward to reading your feedback on this):

  • The ‘Master Copy’ should be the copy used by the production application system (e.g. to apply an interest rate or to calculate fees due).
    Rationale:
    – The production application system copy dictates the customer experience (e.g. interest rate charged or given).
    – Production ‘Master copies’ are already subject to ‘IT Change Management Processes’ that ensure all changes are authorised by the business, and tested prior to going live.
  • Unfortunately, many production ‘IT Change Management Processes’ do not attempt to identify ‘replica copies’ of the product information, and I believe this is a ‘Gap’ in the process.
  • I recommend that production ‘Change Management Processes’ should be extended as follows:
    • Replica copies of business rules must be identified, together with the business owners of the replica copies.
      (This can be a once-off process).
    • The Business area requesting and authorising a change must contact the business owner of each replica copy, and receive confirmation that the proposed change is understood and accepted.
    • The change to the production ‘Master Copy’ must be synchronised with the change to all ‘replica copies’. e.g. If the interest rate on a product is changed from 3% to 4% – The product information on a website must change at the same time that the rate is changed (probably within 24 hours).
    • Copy ‘owners’ should also perform a periodic control, every 6 or 12 months, to verify that changes made to the ‘production master’ have been reflected in their replica copies.
      (The copy owners require a means of displaying both the master and copy details).
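The periodic control described above amounts to comparing the master against each replica and listing the differences. A minimal sketch in Python, assuming (purely for illustration) that both the production master and a website replica can be exported as simple key/value pairs; the rule keys and values are hypothetical.

```python
def find_out_of_sync(master: dict[str, str], replica: dict[str, str]) -> dict[str, tuple]:
    """Return {rule_key: (master_value, replica_value)} for every mismatch.

    A value of None means the rule is missing from that copy.
    """
    differences = {}
    for key in sorted(master.keys() | replica.keys()):
        master_value = master.get(key)
        replica_value = replica.get(key)
        if master_value != replica_value:
            differences[key] = (master_value, replica_value)
    return differences


# Hypothetical exports of the production master and of a website replica.
master_rates = {"savings_rate": "4%", "loan_fee": "EUR 25"}
website_rates = {"savings_rate": "3%", "loan_fee": "EUR 25", "old_product_rate": "2%"}

out_of_sync = find_out_of_sync(master_rates, website_rates)
for rule, (master_value, replica_value) in out_of_sync.items():
    print(f"{rule}: master={master_value} replica={replica_value}")
```

Here the website still shows the old 3% rate and a product that no longer exists in the master; both would be reported to the copy owner for correction.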

What has all of the above got to do with an AML programme?

My most recent encounter with researching business rules from first principles was on an AML programme.  An AML programme is an “End of food-chain” programme, as are most Data Migration and Data Population programmes like Euro Changeover, Basel II, CRM and Single View of Customer programmes.

End of food-chain programmes share the following characteristics:

  • They depend on pre-existing data
  • They have no control over the quality of existing data they depend on
  • They have no control over the data entry processes by which the data they require is captured.
  • The data they require may have been captured many years previously.

[Update August 2017: Achieving compliance with the EU General Data Protection Regulation (GDPR) faces all of the above challenges of a classic “end of food chain programme”. However, GDPR differs in that it requires organisations to demonstrate that they are in control of their Personal Data Supply Chain. They must be able to show that they:

  • Know the personal data they process and where they store it
  • Know the data entry processes by which they capture personal data
  • Know where the data goes within their organisation; who may see it and who has seen it
  • Know what personal data they receive from or provide to third parties
  • Know the quality of the personal data they hold and have control processes in place to maintain that quality
  • Understand the legal basis upon which they may process the personal data they hold]

What has your experience been?  Have you identified other places to look for business rules? Please share your experience by posting a comment.   Thank you, Ken.

Business Rules Case Study Part I

I would like to start a discussion about Business Rules.  I hope you will join in.  Over a series of posts I plan to explore questions like:

  1. Why are Business Rules necessary?
  2. What exactly is a Business Rule?
  3. What should happen if the data fails a Business Rule?
  4. What controls should you have in place to manage Business Rules?
  5. Where should you look for Business Rules (if your Enterprise has no Master Business Rules Repository)?

I will use a case study from an Anti Money Laundering (AML) programme.

In this AML programme, the client selected a “Best in breed AML vendor solution”.   The vendor specified the data required, and the client was responsible for locating the data to populate the new AML repository, and for the quality of the data entered in the repository.

Why are Business Rules necessary?

A standard AML business requirement is the requirement to monitor “Minor Accounts” (accounts held by customers under 18 years of age) for ‘unusual transaction activity’.  This high level requirement would result in a number of more specific business requirements, such as:

“Generate an AML alert when the total value of Cash lodged in a month, to an account held by a minor, exceeds a predefined amount, say EUR5000”
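To make the requirement concrete, here is a minimal sketch in Python of how such an alert could be evaluated. It is an illustration only, not the vendor's implementation: the tuple layout of a lodgement, the account identifiers and the function name are assumptions, and “exceeds” is read as strictly greater than the threshold.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

CASH_THRESHOLD = Decimal("5000")  # the "say EUR5000" from the requirement


def monthly_cash_alerts(lodgements, minor_accounts, threshold=CASH_THRESHOLD):
    """Return (account_id, year, month, total) for each minor account and
    calendar month in which total cash lodged exceeds the threshold.

    lodgements: iterable of (account_id, lodged_on, amount) cash lodgements.
    minor_accounts: set of account ids already identified as Minor Accounts.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for account_id, lodged_on, amount in lodgements:
        if account_id in minor_accounts:
            totals[(account_id, lodged_on.year, lodged_on.month)] += amount
    return sorted(
        (account_id, year, month, total)
        for (account_id, year, month), total in totals.items()
        if total > threshold
    )


# Hypothetical data: A1 is a minor account, B2 is not.
lodgements = [
    ("A1", date(2010, 3, 2), Decimal("3000")),
    ("A1", date(2010, 3, 20), Decimal("2500")),
    ("A1", date(2010, 4, 1), Decimal("4000")),
    ("B2", date(2010, 3, 5), Decimal("9000")),
]
alerts = monthly_cash_alerts(lodgements, minor_accounts={"A1"})
print(alerts)
```

Note that the alert is only as good as the `minor_accounts` set passed in: the logic is trivial, while identifying which accounts are held by minors is where the Business Rule comes in.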

Having  agreed the above business requirement, the vendor asked the client to provide the Business Rule for identifying a ‘Minor Account’.

So:

1. Why are Business Rules necessary?
Business rules are required to distinguish between product types, customer types, car parts, etc.  AML systems require business rules in order to apply different alert rules to different account holder types.

AML business staff are AML experts, not business rules experts.  It was unclear who owned the data and it took a long time for the IT department to research the business rule(s) for the vendor.  Q:  How do business users in your enterprise get details of Business Rules?  Do your business users find it difficult to access the data they require?

Let us suppose the Business Rule supplied to the vendor was:
A minor account may be identified as follows:
1. Account Type: Personal
2. Account SubType:  Minor
3. Customer Age:  Less than 18

The age check was required to manage the risk that an account opened when a customer was a Minor was not converted to a Standard Personal account when the customer reached his/her 18th birthday.

So:

2. What exactly is a Business Rule?

A Business rule provides critical details about data, including the ‘business’ name of the field, the business purpose of the field, the values it may hold, the business meaning of each value, and interdependencies with other data.  Let’s explore this a little further:

  1. Business name of the data field(s):
    In the above example, three data fields are used in the Business Rule:
    ‘Account Type’, ‘Account Subtype’ and ‘Age’ (probably determined from Date of Birth).
  2. Business purpose of the data field:
    e.g. ‘Account SubType’ is used to identify different account types, such as ‘Minor’, ‘Mature years’ etc.
  3. Permitted values (also known as enumerations):
    e.g. Permitted values for Account Subtype are 101 to 199.
  4. Business meaning of each permitted value:
    e.g. ‘Account SubType’ value 101 means Minor Account
  5. Interdependencies with other data:
    e.g. ‘Account SubType’ depends on ‘Account Type’
    ‘Account SubType’ value 101 means Minor Account, when Account Type is ‘Personal’
  6. Field precedence:
    This defines the order in which the fields should be interrogated
    e.g.  First check Account Type, then Account Sub Type
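The six elements above can be written down as a structured record rather than free text. A minimal sketch in Python, using the ‘Account SubType’ example; the class and attribute names are assumptions, and only the one value meaning given above (101) is recorded.

```python
from dataclasses import dataclass, field


@dataclass
class FieldRule:
    """One documented business rule for a single data field."""
    business_name: str
    business_purpose: str
    permitted_values: range
    value_meanings: dict[int, str]
    depends_on: list[str] = field(default_factory=list)

    def is_permitted(self, value: int) -> bool:
        return value in self.permitted_values

    def meaning_of(self, value: int) -> str:
        # Permitted values with no recorded meaning are flagged, not guessed.
        return self.value_meanings.get(value, "undocumented")


account_subtype = FieldRule(
    business_name="Account SubType",
    business_purpose="Identifies different account types, such as Minor or Mature years",
    permitted_values=range(101, 200),        # 101 to 199 inclusive
    value_meanings={101: "Minor Account"},   # applies when Account Type is Personal
    depends_on=["Account Type"],
)

# Field precedence: interrogate Account Type first, then Account SubType.
FIELD_PRECEDENCE = ["Account Type", "Account SubType"]

print(account_subtype.is_permitted(101), account_subtype.meaning_of(101))
print(account_subtype.is_permitted(200), account_subtype.meaning_of(150))
```

Holding rules in this form makes gaps visible: any permitted value that returns "undocumented" is a question for the business owner.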

The AML vendor configured the AML tool to apply the “MINOR” rule when Account Type was personal, Account SubType =101 (Minor), and Customer Age less than 18.

During testing, few alerts were generated on Minor accounts.  From an AML business perspective, the fewer alerts generated the better, since the workload for the AML staff is dictated by the number of alerts generated.

The AML business area was pleased with the low number of alerts, and the vendor was pleased that the alert worked ‘as specified’.

However, it was common knowledge that Date of Birth was not populated 100% of the time, so what was happening when there was no Date of Birth present?  There was no culture of  data quality measurement in the Enterprise, and no facilities for data profiling. Custom built SQL queries against the new AML repository identified multiple instances in which the actual data failed to conform to the Business Rules.

So:

3. What should happen if the data fails a Business Rule?
In our AML example, what do you think should happen when:
a) Account Subtype is ‘101’ indicating a MINOR account, but the customer is aged over 18?
b) Account Subtype is ‘101’ indicating a MINOR account, but date of birth is not populated for this customer?

Business Rules define what data fields “should” contain.  On this AML programme, as in all real world data programmes, the actual data content did not always match what was expected.

This only became apparent as a result of custom built data profiling.  Based on the actual content of the data, the AML business area had to ask the vendor to implement Exception Rules to handle the non-conforming data.  In an ideal world, the data would have been corrected.  In the real world of “achieve compliance by a given date, or face a regulatory fine”, workarounds are quite normal.

So – what are Exception Rules?
Exception rules define what must happen when an account contains data that fails to comply with a business rule.
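A minimal sketch in Python of how the “MINOR” rule and the two non-conforming cases (a) and (b) could be told apart. The category labels and function names are assumptions; the sketch only labels each account, because what must happen to an exception is a decision for the business owner.

```python
from datetime import date
from typing import Optional

MINOR_SUBTYPE = 101
ADULT_AGE = 18


def age_on(dob: date, as_of: date) -> int:
    """Whole years between dob and as_of."""
    before_birthday = (as_of.month, as_of.day) < (dob.month, dob.day)
    return as_of.year - dob.year - before_birthday


def classify_account(account_type: str, sub_type: int,
                     dob: Optional[date], as_of: date) -> str:
    """Label an account against the Minor Account business rule."""
    # Field precedence: Account Type first, then Account SubType.
    if account_type != "Personal" or sub_type != MINOR_SUBTYPE:
        return "NOT_MINOR_SUBTYPE"
    if dob is None:
        return "EXCEPTION_NO_DATE_OF_BIRTH"   # case (b)
    if age_on(dob, as_of) >= ADULT_AGE:
        return "EXCEPTION_AGED_18_OR_OVER"    # case (a)
    return "MINOR"


as_of = date(2010, 1, 1)  # hypothetical processing date
print(classify_account("Personal", 101, date(2000, 5, 1), as_of))
print(classify_account("Personal", 101, date(1980, 1, 1), as_of))
print(classify_account("Personal", 101, None, as_of))
```

Counting accounts per label is itself a simple data quality measure: it shows how many accounts the “MINOR” alert would silently skip if no exception rule were in place.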

This post is already far longer than I had planned – I hope it hasn’t bored you to tears.
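In my next post, I will explore:

  4. What controls should you have in place to manage Business Rules?
  5. Where should you look for Business Rules (if your Enterprise has no Master Business Rules Repository)?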

Please share your experience by posting a comment – Thank you.

Common Enterprise wide Data Governance Issues #9: Data Migration and ETL projects are Metadata driven

This post is one of a series dealing with common Enterprise Wide Data Governance Issues.  Assess the status of this issue in your Enterprise by clicking here: Data Governance Issue Assessment Process

Too often, Data Migration and ETL projects are built on the basis of Metadata, without measuring what is actually contained in the source data fields.  This happens when the IT function build data ‘pipes’ on the basis of what the metadata says the source fields should contain, and don’t perform data content analysis, or data profiling, to find out what the source fields actually contain.

Impact:
The IT function turn the  ‘tap’ on, the data flows through the ‘pipes’ and the business express surprise, followed by denial, when expectations cannot be met due to data quality issues.  This is known as the ‘Load and Explode’ approach to data.

Solution:
To prevent ‘Load and Explode’ impacting the success of your data dependent projects, agree and apply the following policy:

Before building, or purchasing a system that is dependent on existing data, projects must complete the following process:

  1. Define what data is required.
  2. Define the quality requirements of the required data.
  3. Identify the source of the required data.
  4. Specify the data quality metrics to be captured.
  5. Measure the quality of the available source data.
  6. Understand the implications of the quality of available source data for the proposed system.
  7. If necessary, and if feasible, implement data quality improvement measures to raise the quality to the required level.
  8. Worst case – if the facts tell you data quality is too low and cannot be improved – Cancel the project and save yourself a ton of money!
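Steps 2, 4, 5 and 6 of the process above can be sketched in a few lines of Python. This is an illustration under stated assumptions: completeness is used as the only quality metric, and the field names, sample values and required levels are hypothetical.

```python
def completeness(values) -> float:
    """Share of values that are populated (not None and not blank)."""
    values = list(values)
    if not values:
        return 0.0
    populated = sum(1 for v in values if v is not None and str(v).strip() != "")
    return populated / len(values)


def assess(measured: dict[str, float], required: dict[str, float]) -> dict[str, str]:
    """Compare measured quality with the agreed requirement, field by field."""
    return {
        name: "meets requirement" if measured.get(name, 0.0) >= minimum
        else "below requirement"
        for name, minimum in required.items()
    }


# Step 2: quality requirements agreed with the business (hypothetical levels).
required = {"date_of_birth": 0.98, "account_type": 1.0}

# Step 5: measure what the source fields actually contain (hypothetical sample).
source = {
    "date_of_birth": ["1985-06-15", None, "", "2001-09-30"],
    "account_type": ["Personal", "Personal", "Business", "Personal"],
}
measured = {name: completeness(values) for name, values in source.items()}

# Step 6: understand the implications before building or buying anything.
result = assess(measured, required)
print(measured)
print(result)
```

A "below requirement" result feeds steps 7 and 8: improve the data if feasible, or reconsider the project.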

Your experience:
Have you faced the above issue in your organisation, or while working with clients?  What did you do to resolve it?  Please share your experience by posting a comment – Thank you – Ken.