
Processes and techniques in data mining

Can you please explain the differences between the processes and techniques used in data mining? Which technique is best suited for which type of data?

There is a wide variety of data mining techniques currently available. They include neural networks, decision trees, support vector machines, Bayesian networks, nearest neighbor classification, and many others. In addition, each technique typically comes in several flavors, such as the CHAID, CART, and C4.5 decision tree algorithms.

The underlying process used to build models is the same regardless of the specific technique chosen (modulo some minor variations, primarily in the pre- and post-processing steps). The basic idea is that you take your data and split it into two parts. The first part is fed to the data mining system so that the model can be built. Once the completed model is available, the second part is fed to the model to evaluate its quality. After the model builder is satisfied with the quality of the model, it can be applied to new data in a process called scoring. The model is then reused on new data, over and over, until the predictions it makes are no longer of sufficient quality (a determination made by comparing the model's predictions with the behavior that is actually observed).
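The build, evaluate, and score cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular data mining product: k-nearest-neighbor classification stands in for whichever technique is chosen, the data is synthetic, and the helper names (`split`, `build_model`, `accuracy`, `needs_rebuild`), the 70/30 split and the 0.8 quality threshold are all hypothetical choices made for the example.

```python
import random
from collections import Counter

# Hypothetical helpers for illustration; k-NN is used only because it is short to write.

def split(rows, build_fraction=0.7, seed=0):
    """Shuffle labeled rows and split them into a build set and a holdout set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]

def build_model(train, k=3):
    """'Build' a k-nearest-neighbor model: it simply keeps the training rows."""
    def predict(x):
        # Vote among the k training rows closest to x (squared Euclidean distance).
        nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def accuracy(model, rows):
    """Fraction of labeled rows whose known label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def needs_rebuild(model, recent, threshold=0.8):
    """Compare predictions with outcomes observed later; flag the model once quality drops."""
    return accuracy(model, recent) < threshold

# Synthetic labeled data: two well-separated clusters of ((x, y), label) rows.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "low") for _ in range(50)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), "high") for _ in range(50)]

build_set, holdout = split(data)
model = build_model(build_set)                               # 1. build on the first part
print(f"holdout accuracy: {accuracy(model, holdout):.2f}")   # 2. evaluate on the second part

new_records = [(0.2, -0.5), (3.8, 4.1)]                      # 3. score new, unlabeled data
print([model(x) for x in new_records])
```

Once real outcomes for scored records become known, passing them to `needs_rebuild` is one simple way to decide when the model is no longer good enough and the cycle should start again.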

In terms of selecting the best technique for a particular type of data, there are no rules that can be relied upon. Some techniques might not be a good match for the raw data, but most data mining systems can cope by transforming the data (e.g., neural networks typically need their inputs to be numbers between 0 and 1, so categorical data must be converted before a neural network model can be created). The best approach to model creation is to try a number of different techniques and then statistically compare the results. Setting up these experiments takes some work, but it is the only way to consistently generate good models.
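As a rough illustration of both points, the sketch below converts a categorical column into 0/1 indicator columns and rescales a numeric column into [0, 1] (the kind of transformation a neural network would need, although the candidates compared here are simpler), then uses 5-fold cross-validation to compare three candidate techniques on the same data. The synthetic records, the `preprocess`, `knn`, `majority` and `cross_validate` helpers, and the choice of candidates are all hypothetical. It reports the mean and standard deviation of the fold accuracies; a fuller comparison would typically add a significance test, such as a paired t-test across the folds.

```python
import random
import statistics
from collections import Counter

def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicator columns, one per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max(values):
    """Rescale a numeric column linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def preprocess(records):
    """Turn (category, number, label) records into numeric feature vectors in [0, 1]."""
    categories = sorted({cat for cat, _, _ in records})
    scaled = min_max([num for _, num, _ in records])
    return [(one_hot(cat, categories) + [s], label)
            for (cat, _, label), s in zip(records, scaled)]

def knn(k):
    """Candidate technique: k-nearest-neighbor classification."""
    def build(train):
        def predict(x):
            nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return build

def majority(train):
    """Baseline candidate: always predict the most common training label."""
    top = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: top

def cross_validate(build, rows, folds=5):
    """Accuracy of one technique on each of `folds` held-out slices of the data."""
    scores = []
    for i in range(folds):
        holdout = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        model = build(train)
        scores.append(sum(model(x) == y for x, y in holdout) / len(holdout))
    return scores

# Synthetic records: (region, spend, label); the buying threshold depends on region.
rng = random.Random(7)
thresholds = {"east": 300, "north": 500, "south": 700}
records = []
for _ in range(120):
    region = rng.choice(sorted(thresholds))
    spend = rng.uniform(0, 1000)
    records.append((region, spend, "buy" if spend > thresholds[region] else "skip"))

rows = preprocess(records)
rng.shuffle(rows)

techniques = {"majority": majority, "1-NN": knn(1), "5-NN": knn(5)}
for name, build in techniques.items():
    scores = cross_validate(build, rows)
    print(f"{name:8s} mean={statistics.mean(scores):.2f} sd={statistics.stdev(scores):.2f}")
```

For brevity the scaling is fitted on the full data set before the folds are drawn; in practice it should be fitted on each training fold only, so that no information from the held-out rows leaks into the models being compared.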
