When your entire business is based on analyzing hundreds of millions of consumer data records, that data had better be clean.
Hingham, Mass.-based Intellidyn Corp. houses one of the largest repositories of U.S. consumer data -- about 700 million records of combined information from credit bureau databases, data providers and real estate databases. It had to seriously consider performance and scalability in its data quality tools decision, according to Peter Harvey, president and chief executive officer.
The company builds master consumer profiles and customized marketing data warehouses for clients and provides data mining and analytics services. Companies of all sizes, including financial institutions, travel agents and retailers, use the services to drive acquisition, retention or cross-sell activities.
Keeping the massive data set in order is no trivial matter, Harvey said. Intellidyn regularly gets huge update files from its data providers. Each credit bureau update, for example, contains about 190 million records that need to be matched, compared and potentially updated in the master database. Until five years ago, the company managed this process with a custom-built data quality system, and it took about three to five days to integrate each update file. A need for increased speed and scalability sent the company searching for alternatives, Harvey said.
Intellidyn evaluated 10 data quality tools in its quest. Many systems required proprietary platforms, which Intellidyn didn't want because it needed to run the software on hardware that could support cleansing and mining around 40 terabytes of data, Harvey explained.
The company settled on Cary, N.C.-based DataFlux Corp., partly because of its multi-processor technology. In 2001, Intellidyn implemented the new data quality tools on a hardware platform of integrated Sun servers and Hitachi storage. The DataFlux technology enabled Intellidyn to multi-thread data quality jobs across processors, which increased both the processing speed and the volume of data that it could put through the system. Now, it takes only 12 hours -- rather than three to five days -- to integrate each update file into Intellidyn's databases, Harvey said. This translates into faster, more current data for its customers and less time and effort required for data cleansing.
The data quality tools also help Intellidyn match and group consumers who live at the same address, a practice known as "householding" that helps marketers understand which of their customers or targets live together. Householding isn't always as simple as matching last name or address, and it often requires matching across other attributes -- another time-consuming process made faster with the DataFlux tools.
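To illustrate why householding can require more than a last-name or address comparison, here is a minimal Python sketch. It is not DataFlux's method, which the article does not describe. The record fields (`id`, `address`, `phone`), the small abbreviation table and the choice of phone number as the second matching attribute are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "suite": "ste"}


def normalize_address(address):
    """Lowercase, strip punctuation and shorten common street words."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def household(records):
    """Group record ids that share a normalized address or a phone number."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the group's representative id.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # matching key -> id of the first record that carried it
    for r in records:
        keys = [("addr", normalize_address(r["address"]))]
        digits = re.sub(r"\D", "", r.get("phone") or "")
        if digits:
            keys.append(("phone", digits))
        for key in keys:
            if key in first_seen:
                # Merge this record's group with the earlier record's group.
                parent[find(r["id"])] = find(first_seen[key])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in groups.values())


records = [
    {"id": 1, "name": "John Smith", "address": "12 Main Street", "phone": "781-555-0100"},
    {"id": 2, "name": "Ann Smith", "address": "12 Main St.", "phone": None},
    {"id": 3, "name": "Jane Doe", "address": "400 Office Road, Suite 5", "phone": "(781) 555-0100"},
    {"id": 4, "name": "Bob Lee", "address": "9 Elm Avenue", "phone": "617-555-0199"},
]
print(household(records))
```

In this made-up data, record 3 shares neither a surname nor an address with record 1, yet the shared phone number places both in the same group. A production system would likely weigh many more attributes and use fuzzy rather than exact comparisons.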
Cleansed, accurate and current consumer data is the foundation of Intellidyn's market data management business, Harvey said.
"It's completely different from what I call '1970s marketing,' where people used to just buy lists and solicit them. There is so much data available now about people, and you can use the data to discriminate what makes a buyer different from a non-buyer," he said.
Intellidyn often starts by analyzing a company's current customer data and past marketing campaign performance. Using data quality tools, it matches this information to its master database records, potentially appending information to the clients' lists. Then, using data mining tools from Cary, N.C.-based SAS Institute Inc., DataFlux's parent company, analysts review all of this data to create a "model," which details the attributes of someone likely to buy, Harvey explained. He said that campaigns with these kinds of analytical models created at the outset do three times better than non-model campaigns.
"The data hygiene drives the match rates. The match rates drive the model. The model drives how well the marketing campaigns will do," Harvey said.
After a marketing campaign is completed, companies send Intellidyn a file with the results of the campaign or a list of which consumers responded. Again, Intellidyn matches the list to its database. Data quality tools are critical to this process, catching things that might not be evident in a less sophisticated review. For example, Harvey asked, what if a solicitation sent to a man at home reaches his wife, who responds using her business address and maiden name? The matching algorithms and householding feature of data quality tools help Intellidyn ascertain that the two records are in fact related, despite looking different on the surface.
The next frontier for Intellidyn is real-time services, Harvey said. Intellidyn's data quality platform handles volume well, but he sees it becoming faster and integrating with clients via a service-oriented architecture to provide instantaneous consumer information. Ultimately, he envisions a scenario where a consumer would visit a client's Web site and provide a tiny bit of information. On the backend, that data would go to Intellidyn, which would match it to the master record, append additional data -- such as the consumer's phone number -- and pass it back to the sales office in near real time.
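The match-and-append step in that scenario might look roughly like the Python sketch below. This is a hypothetical illustration, not Intellidyn's service: the function name `match_and_append`, the matching key (last name plus ZIP code) and the record fields are all assumptions, and a real service would sit behind a network interface and use far more robust matching.

```python
def match_and_append(submitted, master_records, append_fields=("phone",)):
    """Return the submitted data plus fields copied from a matching master record.

    Matching here is an exact comparison on last name and ZIP code, which is
    an assumption made for this sketch.
    """
    def match_key(record):
        return (
            record.get("last_name", "").strip().lower(),
            record.get("zip", "").strip(),
        )

    wanted = match_key(submitted)
    matches = [m for m in master_records if match_key(m) == wanted]

    enriched = dict(submitted)
    # Append only when the key is complete and the match is unambiguous.
    if all(wanted) and len(matches) == 1:
        for field in append_fields:
            if field in matches[0] and field not in enriched:
                enriched[field] = matches[0][field]
    return enriched


# Made-up master records for the example.
master = [
    {"last_name": "Smith", "zip": "02043", "phone": "781-555-0100"},
    {"last_name": "Lee", "zip": "02110", "phone": "617-555-0199"},
    {"last_name": "Lee", "zip": "02110", "phone": "617-555-0142"},
]

print(match_and_append({"last_name": "smith ", "zip": "02043"}, master))
print(match_and_append({"last_name": "Lee", "zip": "02110"}, master))
```

The second call returns the input unchanged because two master records share the same key. Declining to append on an ambiguous match is one possible design choice; it trades coverage for a lower risk of attaching the wrong person's data.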
This article originally appeared on SearchDataManagement.com