Expert: Data quality is misunderstood

While many businesses today have an abundant supply of data, many lack good processes for keeping it clean and accurate. Larry English, president of Information Impact International, an information quality consultancy based in Brentwood, Tenn., believes too many companies focus on correcting bad data instead of preventing it in the first place. English, known for developing the Total Information Quality Management (TIQM®) methodology, recently discussed the state of information quality and offered some pointers for companies that seek to improve it.

What is the best way for companies to get started with data quality programs?
The first step is to assess whether the company understands the principles of quality management as applied to data and information. Sometimes organizations have a real technology bias to implementing data quality. They tend to implement profiling tools to discover the patterns in the data or data cleansing tools that address the correction of data. But information quality management is more than just data in the database.

What is the most common mistake organizations make when it comes to data quality improvement?
Most organizations don't understand real quality management principles. Data quality is not just data cleanup. Data cleanup, or data correction, is the information equivalent of scrap and rework with defective products in manufacturing: you have to either fix the defects or scrap them. Data is also subject to information quality decay -- people move, people get married, people get divorced -- and if we do not have processes in place to capture that updated information, then we condemn ourselves to continual data correction. Real information quality improvement applies process improvement. When you find defective data, you determine the root cause of the problem, and once you do that, you can define processes to prevent the recurrence of those defects.

What are some techniques for assessing information quality?
Profiling to discover anomalies is one. Another, validity assessment, is to define business rules and measure for conformance. But one of the most important techniques is accuracy assessment. With customer information, for example, you have to verify the data that you have against the real-world subject the data represents. That requires actually taking a sample of customer records and then contacting the customers to verify the information. When one of my clients -- a large financial organization -- assessed their customer data, they did a validity test. That is, they measured whether the customers' marital status codes had valid values ("M" for married, "S" for single, etc.). There were virtually no errors -- no invalid values. However, when they contacted the customers to verify the data, they learned that 23.3% of those codes, while valid, were inaccurate. So it takes a combination of sound processes -- including measuring for accuracy and not just validity -- and then process improvement, not just data correction or cleansing.
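The validity-versus-accuracy distinction English describes can be sketched in a few lines of code. This is a minimal illustration, not anyone's production tooling: the field names, the set of valid codes, and the sample data are all hypothetical, and the "contact the customer" step is simulated with a lookup table of confirmed values.

```python
# Hypothetical sketch: every code can be *valid* (a legal value)
# while some are still *inaccurate* (wrong for that customer).

VALID_MARITAL_CODES = {"M", "S", "D", "W"}  # married, single, divorced, widowed

def validity_rate(records):
    """Share of records whose marital_status is a legal code value."""
    valid = sum(1 for r in records if r["marital_status"] in VALID_MARITAL_CODES)
    return valid / len(records)

def accuracy_rate(records, confirmed):
    """Share of records matching what the customer actually reported
    (the confirmation step is simulated by the `confirmed` table)."""
    correct = sum(1 for r in records if r["marital_status"] == confirmed[r["id"]])
    return correct / len(records)

records = [
    {"id": 1, "marital_status": "M"},
    {"id": 2, "marital_status": "S"},  # valid code, but stale
    {"id": 3, "marital_status": "M"},
    {"id": 4, "marital_status": "S"},
]
confirmed = {1: "M", 2: "M", 3: "M", 4: "S"}  # verified by contacting customers

print(validity_rate(records))            # 1.0  -- every code is valid
print(accuracy_rate(records, confirmed)) # 0.75 -- yet one in four is wrong
```

A validity test alone would have reported zero errors on this sample, which is exactly the trap the financial-services client in the interview fell into.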

Best practices for customer information quality

Larry English recommends applying these best practices:


  • Define data consistently across the enterprise.
  • Implement data defect-prevention processes and tools.
  • Allow customers to update their data directly.
  • Develop easy-to-follow data capture procedures.
  • Create a "last verified date" attribute for customer data.
  • Use points of contact to verify customer knowledge and capture observed complaints.
  • Measure information quality, provide feedback and improve processes to prevent defect recurrence.

More tips available on the Information Impact Web site.
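The sidebar's "last verified date" practice pairs naturally with the information quality decay English mentions: records that have not been re-verified within some window are candidates for re-confirmation rather than blind correction. The sketch below is an illustrative assumption, not a recommendation from the interview; the field names and the 18-month window are invented for the example.

```python
# Hypothetical sketch: use a "last verified date" attribute to flag
# customer records whose information may have decayed (moves, marriages,
# divorces) and is due for re-verification at the next point of contact.
from datetime import date, timedelta

DECAY_WINDOW = timedelta(days=548)  # ~18 months; tune per attribute

def needs_reverification(record, today):
    """True when the record is past its verification window."""
    return today - record["last_verified_date"] > DECAY_WINDOW

customers = [
    {"id": 1, "last_verified_date": date(2006, 1, 15)},
    {"id": 2, "last_verified_date": date(2007, 9, 1)},
]
today = date(2007, 10, 1)
stale = [c["id"] for c in customers if needs_reverification(c, today)]
print(stale)  # [1]
```

Flagging stale records this way turns decay from a silent source of inaccuracy into a scheduled verification task.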

Right. One problem is that we have automated silos of information and processes, which means a lot of redundancy. If all of the databases had data that was named and defined consistently, with consistent values, we'd have a much easier time reconciling them and identifying duplicate customers. But most of the time, data is defined uniquely for a given departmental line of business. Managing the business down the silos creates fiefdoms, and that creates politics. The politics are there because most organizations reward individual behavior; when it comes to creating information for other parts of the business, there is no willingness because there is no reward. So we have to reorient the organization to a horizontal, value-chain view.

What exactly does that entail?
We have to reorient the organization to a horizontal, value-chain view. Information tends to be produced in one part of the business but used in another part, usually downstream. The real accountability is not with an appointed data steward but with the manager who oversees the processes that create or update data. That manager needs to be accountable for the quality of information produced by those processes for others in the organization who depend on it. It's a supplier/customer relationship, and there are forward-thinking companies that are putting that accountability into managers' job descriptions. We have the technology, but we have not understood the principles of the Information Age, which require managing processes horizontally across the value chains and holding managers accountable for their information. Because we have not, we have forced our knowledge workers to become information hunters and gatherers.

Do you think vendors understand this "value chain approach" to data quality?
Some of them do. But in many cases, even if software providers do have a process methodology, it tends to be based on the features and functionality of their tools and not on quality management principles per se.

What should companies look for when evaluating data quality tools and technology?
There are a variety of information quality software functionalities, such as profiling and cleansing. Defect prevention is another, and it's actually the highest form of software capability in the information quality space. But companies need to understand the problems they are seeking to solve. Stop focusing on fixing the defects (i.e., "scrap and rework"), because it's costly. Experts like [W. Edwards] Deming have taught us that there's a better way: designing quality in. So my advice is to look for tools that help solve the process problem -- tools not just for cleansing but also for defect prevention. And it's important to understand the limitations of technology: electronically, you cannot correct all data, and you may or may not be able to guarantee the accuracy of the data provided. The best place to solve the problems with defective data is right at the point of knowledge capture -- verify that the information is correct and complete. The little bit of time it takes to do that prevents so much grief and customer alienation.
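Defect prevention at the point of knowledge capture, as English describes it, amounts to validating a record before it is ever stored, so there is nothing to cleanse later. The sketch below is a minimal, assumed illustration of that idea: the field names, the email rule, and the code set are hypothetical, and a real capture process would route rejections back to the person entering the data.

```python
# Hypothetical sketch of defect prevention at capture time: reject a
# defective record at entry instead of correcting it downstream.
import re

def capture_customer(name, email, marital_status):
    """Validate at entry; raise instead of storing a defective record."""
    errors = []
    if not name.strip():
        errors.append("name is required")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        errors.append(f"email looks invalid: {email!r}")
    if marital_status not in {"M", "S", "D", "W"}:
        errors.append(f"unknown marital status code: {marital_status!r}")
    if errors:
        # Send the defects back to the source for correction now,
        # while the customer is still on the line.
        raise ValueError("; ".join(errors))
    return {"name": name.strip(), "email": email, "marital_status": marital_status}

record = capture_customer("Ada Lovelace", "ada@example.com", "M")
try:
    capture_customer("", "not-an-email", "X")
except ValueError as e:
    print(e)  # all three defects reported before anything is stored
```

The design choice is the one Deming's "design quality in" principle implies: the check runs while the knowledge source is still available, which is the only moment accuracy can be confirmed cheaply.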
