Is there hidden knowledge in 'dirty' data?

As companies rush to clean up data warehouses and delete dirty data, are they losing valuable intelligence about customers?

When it comes to data cleansing initiatives, are companies throwing out the baby with the bath water? Or, rather, is there knowledge about the baby that can be gained from the bath water?

Companies may lose valuable intelligence about their customers when they "scrub" or delete information like alternate spellings of names and old addresses. Saving and analyzing this outdated information, often called dirty data, can reduce marketing costs, help detect fraud and bolster intelligence to provide the elusive 360-degree view of the customer. Throwing away data could be throwing away knowledge, said David Loshin, president and principal consultant of Silver Spring, Md.-based Knowledge Integrity Inc.

"If you clean the data, then you might remove the interesting stuff about it that gives you extra information," Loshin said.

Since individuals take on many identities in real life, going by a given name, nickname or initials, depending on the situation, Loshin argued that saving only one version of the name doesn't offer a complete picture. Keeping only the most recent name variation or address on file may be a hindrance when it comes to matching records across databases or acquired lists. Loshin also believes that intelligence can be gained from "contextual metadata," such as what other marketing lists an individual has appeared on, past employers and even former titles.

But Loshin cautioned, "Don't think you're going to start getting valuable insights from data if you don't have any data quality process in place to begin with or sound data management processes." He said the concept is only useful if a company is capable of managing and processing the data in a meaningful way. Furthermore, companies must evaluate the costs versus the benefits of revamping systems and implementing tools to make use of historical data.

Processing this data is how vendors like Old Greenwich, Conn.-based Identity Systems, an Intellisync company, hope to add value. Identity Systems provides data search and matching software, using algorithms and fuzzy logic, which finds and groups identity data across systems. The software acts like a business user, making matches that might be obvious to a human, but not to a computer.
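As a rough illustration of what fuzzy matching of identity data can look like, the Python sketch below groups name variations by string similarity. It relies on the standard library's difflib and is not Identity Systems' software or algorithm; the function names, the greedy grouping approach and the 0.8 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def group_names(names, threshold=0.8):
    """Greedily place each name in the first group whose first member it resembles.

    The threshold is an illustrative assumption; a real system would tune it
    and would likely compare more fields than the name alone.
    """
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


if __name__ == "__main__":
    records = ["Jonathan Smith", "Maria Lopez", "Jonathon Smith"]
    for group in group_names(records):
        print(group)
```

A misspelling such as "Jonathon" lands in the same group as "Jonathan" because the two strings differ by a single character, a match a person would make at a glance but an exact comparison would miss.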

Identity Systems got its start working with law enforcement and intelligence agencies, which have a culture of saving all data they receive about a person or event, said Ramesh Menon, vice president, North America operations. In those areas, he said, the belief is that a little bit of data -- even outdated data -- could be useful in the future, especially when coupled with information from a new incident or knowledge from another agency.

Menon believes that private businesses can benefit from this kind of thinking as well. Acquisitions, database sharing and regulatory compliance are activities where identity matching can add value and reduce costs, he said.

For example, many companies that learn of a customer who has recently moved or married will simply update that customer's information in the corporate database and delete the prior history. If such a company later acquires a database with the old name and address on it, it might incorrectly perceive the individual as a new prospect. Had it stored the original customer data, it could have recognized the record as another variation of an existing customer. This kind of issue can carry hard costs, like duplicate mailings and wasted sales efforts, as well as other implications like customer annoyance.
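The scenario above can be sketched in a few lines of Python, assuming a simplified customer record that keeps prior names and addresses instead of overwriting them. The `Customer` class, the `normalize` helper and the sample people and addresses are all hypothetical, and the exact-match lookup stands in for whatever matching logic a real system would use.

```python
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


@dataclass
class Customer:
    customer_id: int
    name: str
    address: str
    # Prior (name, address) pairs are kept rather than deleted on update.
    history: list = field(default_factory=list)

    def update(self, name: str, address: str) -> None:
        """Record the current identity in history, then apply the change."""
        self.history.append((self.name, self.address))
        self.name, self.address = name, address

    def known_identities(self) -> set:
        """Return every normalized (name, address) pair seen for this customer."""
        pairs = [(self.name, self.address), *self.history]
        return {(normalize(n), normalize(a)) for n, a in pairs}


def find_existing(customers, name: str, address: str):
    """Return the customer matching a current or past identity, else None."""
    key = (normalize(name), normalize(address))
    for customer in customers:
        if key in customer.known_identities():
            return customer
    return None
```

With this structure, a record from an acquired list that carries a customer's former name and address still resolves to the existing customer, so it is not treated as a new prospect.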

But when Identity Systems recommends to customers that they "save everything," the idea is generally not well received, Menon said. Revising database structures and storing and processing more data are the obvious obstacles, but there's more.

"There's a philosophical hurdle as well," Menon said. "There's a body of thinking these days that data has to be clean and that if it's not clean, it's worthless."

That said, Menon concedes that saving everything is not always worth it for all companies.

"It depends on the potential value of the customer to the organization," Menon said. Or, he added, there may be value in knowing who to turn away, as might be the case with companies regulated by the Patriot Act that are subject to heavy fines for engaging in business with people on watch lists. Fraud detection is a reason many companies should keep historical data.

Ted Friedman, a research vice president at Stamford, Conn.-based Gartner Inc., agrees that there may be some value in historical data. But for most companies, he said, the cost-benefit analysis probably wouldn't support the revamp of systems.

"The degree to which you decide to keep some of the older versions of data is really very industry and application specific," Friedman said, citing potential uses of historical data for law enforcement, financial institutions and fraud detection. "I don't believe that everyone should keep everything."
