Cleaning Data and Improving Data Quality

This project was part of an initiative to get all our systems using the same data and to streamline various order handling processes. The first part was to get all our systems using the same customer data, with a single point of entry for that data. This sounds fairly straightforward, but it gets tricky when a company trades under several different names, and trickier still when a company's legal name differs from the one we are using. Another problem was that the data was input in several different countries, and the company being entered was not necessarily situated in the country where it was entered. We also had different addressing schemes to work with, i.e. we don't all use postcodes that look like SW18 4DD. To cut a long story short, it was a complex project.

The first dataset came from Oracle Financials, and after cleansing the following systems were to be updated with the new data:

  1. Remedy, the order handling system.
  2. Progressor, the billing system.
  3. Xpercom, CRS.
  4. FMS, the Fault Management System.

The cleansing process involved determining trusted areas where we were sure the data was going to be of high quality. We then came up with some rules to determine the validity of the data, e.g. if we found an address that was identical, or differed only by spelling errors, in two of the trusted areas, it was accepted as good data. We then used these rules to build a table of clean customer data. Any discrepancies were then compared to data held at Companies House to salvage as much as possible. Please note, the data from Companies House is not actually that good, but it was another sanity check, and where we found similarities we could then investigate other avenues to build the record.
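To give a feel for the kind of rule described above, here is a minimal sketch in Python. It is not the code we used on the project; the function names, the source names, and the 0.9 similarity threshold are all assumptions made for illustration, and the fuzzy comparison uses the standard library's difflib as a stand-in for whatever spelling tolerance a real rule would need.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Optional

# Assumed threshold: how similar two addresses must be to count as
# "identical or with just spelling errors". Purely illustrative.
SIMILARITY_THRESHOLD = 0.9


def normalise(address: str) -> str:
    """Lower-case, drop commas, and collapse whitespace before comparing."""
    return " ".join(address.lower().replace(",", " ").split())


def addresses_agree(a: str, b: str,
                    threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True if the two addresses match exactly or nearly (spelling slips)."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold


def accept_address(candidates: Dict[str, str]) -> Optional[str]:
    """candidates maps a trusted source name to the address it holds.

    Returns the first address on which any two trusted sources agree,
    or None if no pair agrees and the record needs further checking.
    """
    for (_, first), (_, second) in combinations(candidates.items(), 2):
        if addresses_agree(first, second):
            return first
    return None


if __name__ == "__main__":
    # Hypothetical record as seen by three trusted sources.
    record = {
        "source_a": "12 High Street, London",
        "source_b": "12 Hgh Street London",      # spelling error
        "source_c": "Unit 4, Dock Road, Leeds",  # disagrees
    }
    print(accept_address(record))
```

A record for which accept_address returns None would be one of the discrepancies that then went on to the Companies House comparison.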

My job in all of this was to build the data from the four systems listed above and to compare it to the Oracle Financials data after I had cleaned them both. I then had to present the cleaned data in various formats to the Operations people, who loaded all the systems in a co-ordinated effort. This took place one Sunday morning in a pre-arranged window, and thankfully it went well and was pretty uneventful.