Building a business case for data cleansing

How do you build a business case for cleaning up data? I have a client who wants to run a highly personalized CRM program, but their data quality is a significant problem. Any ideas for helping them see that a CRM initiative needs clean, accurate data?

I assume that you need to go further than the standard "Garbage In, Garbage Out" answer to this problem. In that case, you need to analyze the data in order to build your case. I would first look into the problem of missing data. Quantify how many customers are missing at least one important piece of data (with "important" defined by your particular business). If a significant share of that data is missing, your ability to segment customers and personalize their interactions will be limited. For example, if you don't know a customer's birthday, it will be difficult to personalize based on age. As the segments get finer and finer, you will need to evaluate more characteristics of each customer, and you don't want to run into missing-data roadblocks.
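The missing-data count described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records are plain dictionaries, and the field names (`birthday`, `email`) and the function names `is_missing` and `missing_data_report` are hypothetical, so substitute whatever your business defines as "important".

```python
from typing import Any, Dict, Iterable, Mapping, Sequence


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing (an assumption; adjust as needed)."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_data_report(
    customers: Iterable[Mapping[str, Any]],
    important_fields: Sequence[str],
) -> Dict[str, Any]:
    """Count customers that are missing at least one important field."""
    missing_by_field = {field: 0 for field in important_fields}
    total = 0
    incomplete = 0
    for customer in customers:
        total += 1
        missing = [f for f in important_fields if is_missing(customer.get(f))]
        for field in missing:
            missing_by_field[field] += 1
        if missing:
            incomplete += 1
    return {
        "total": total,
        "incomplete": incomplete,
        "share_incomplete": incomplete / total if total else 0.0,
        "missing_by_field": missing_by_field,
    }


# Hypothetical sample records, for illustration only.
sample_customers = [
    {"name": "A", "birthday": "1980-02-01", "email": "a@example.com"},
    {"name": "B", "birthday": None, "email": "b@example.com"},
    {"name": "C", "birthday": "1975-07-30", "email": "  "},
    {"name": "D"},
]
report = missing_data_report(sample_customers, ["birthday", "email"])
print(report)
```

The per-field counts show which attributes block the most segments, and the share of incomplete customers gives a single headline number for the business case.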

If you really want to dig into the problem, pull a random sample of records and validate it manually. This can be expensive, but it shows you how dirty the data really is. Separate the sample into multiple buckets, with one bucket for clean data and the other buckets for various levels of dirtiness (keeping track of both the correct and incorrect values for the dirty data). Then run all of the correct values through the personalization processes and record the predicted value of each interaction. By running the incorrect values through these same personalization processes and comparing the results, you can quantify the cost of the dirty data at each level of dirtiness. At that point, the client can evaluate the tradeoff between the cost of improving the quality of the data and the cost of leaving dirty data in the system.
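The bucket comparison above might look something like the following sketch. It assumes each audited record carries a bucket label plus both the recorded and the manually validated values, and that you have some function that returns the predicted value of an interaction for a given set of customer values. The names `AuditedRecord`, `cost_of_dirty_data`, and `toy_predict_value` are hypothetical, and the toy scoring rule stands in for your real personalization process.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Mapping, NamedTuple


class AuditedRecord(NamedTuple):
    bucket: str                   # e.g. "clean", "dirty" (labels are up to you)
    recorded: Mapping[str, Any]   # values as stored in the system
    correct: Mapping[str, Any]    # values after manual validation


def cost_of_dirty_data(
    sample: Iterable[AuditedRecord],
    predict_value: Callable[[Mapping[str, Any]], float],
) -> Dict[str, Dict[str, float]]:
    """Per bucket, compare predicted value using correct vs. recorded values."""
    totals = defaultdict(
        lambda: {"records": 0, "value_correct": 0.0, "value_recorded": 0.0}
    )
    for record in sample:
        bucket_totals = totals[record.bucket]
        bucket_totals["records"] += 1
        bucket_totals["value_correct"] += predict_value(record.correct)
        bucket_totals["value_recorded"] += predict_value(record.recorded)

    result = {}
    for bucket, t in totals.items():
        lost = t["value_correct"] - t["value_recorded"]
        result[bucket] = {
            "records": t["records"],
            "lost_value": lost,
            "lost_value_per_record": lost / t["records"],
        }
    return result


def toy_predict_value(values: Mapping[str, Any]) -> float:
    """Stand-in scoring rule (assumption): premium offers are worth more."""
    return 10.0 if values.get("segment") == "premium" else 2.0


# Hypothetical audited sample, for illustration only.
audited_sample = [
    AuditedRecord("clean", {"segment": "premium"}, {"segment": "premium"}),
    AuditedRecord("dirty", {"segment": "basic"}, {"segment": "premium"}),
    AuditedRecord("dirty", {"segment": None}, {"segment": "basic"}),
]
cost_report = cost_of_dirty_data(audited_sample, toy_predict_value)
print(cost_report)
```

Multiplying the lost value per record by the estimated number of records in each bucket across the full database gives a rough figure to set against the cost of the cleanup effort.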

