Validating cluster structures in data mining tasks

04 Apr

Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
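
One way to check whether a grouping really satisfies this definition is to compare, for each object, its average distance to members of its own cluster with its average distance to the nearest other cluster. The sketch below computes a mean silhouette coefficient for that purpose. It is a minimal illustration, not a method taken from the text above: the function name `mean_silhouette`, the choice of Euclidean distance, and the convention of scoring singleton clusters as 0 are all assumptions.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def mean_silhouette(points, labels):
    """Return the mean silhouette coefficient (between -1 and 1).

    Values near 1 mean objects sit closer to their own cluster than to
    any other cluster; values below 0 suggest a poor assignment.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself contributes 0 to the sum)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        largest = max(a, b)
        scores.append((b - a) / largest if largest > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = mean_silhouette(points, [0, 0, 1, 1])  # natural grouping
bad = mean_silhouette(points, [0, 1, 0, 1])   # groups mix both pairs
```

On this toy data the natural grouping scores close to 1 and the mixed grouping scores below 0, which is the behavior the definition leads one to expect.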

Clustering is a main task of exploratory data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Data validation uses routines, often called "validation rules" or "check routines", that check the correctness, meaningfulness, and security of data that are input to the system.

The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit application program validation logic.
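
As a rough illustration of explicit validation logic inside an application, the sketch below keeps the rules in a small table and applies them to each input record. Everything here is hypothetical: the field names `age` and `email`, the specific checks, and the helper `validate_record` are assumptions chosen for the example, not rules taken from any particular system.

```python
def validate_record(record, rules):
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    for field, check, message in rules:
        value = record.get(field)
        # A missing field fails its rule, as does a value the check rejects.
        if value is None or not check(value):
            errors.append(f"{field}: {message}")
    return errors


# Hypothetical rule table: (field name, check function, error message).
RULES = [
    ("age",
     lambda v: isinstance(v, int) and 0 <= v <= 130,
     "must be an integer between 0 and 130"),
    ("email",
     lambda v: isinstance(v, str) and "@" in v,
     "must contain '@'"),
]

ok_errors = validate_record({"age": 30, "email": "a@b.c"}, RULES)
bad_errors = validate_record({"age": -1}, RULES)
```

Keeping the rules in one table rather than scattering checks through the code is loosely analogous to the data-dictionary approach: the rules can be reviewed and changed in a single place.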

In molecular biology, protein structure describes the various levels of organization of protein molecules.

You use cross-validation after you have created a mining structure and related mining models to ascertain the validity of the model. Cross-validation can be configured in many ways, which means that you can easily produce many sets of different results that must then be compared and analyzed. This section provides information to help you configure cross-validation appropriately.

Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It is often necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
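
The core idea behind cross-validation can be sketched independently of any particular tool: the data is partitioned into k folds, and each fold in turn is held out for testing while the remaining folds are used for training. The code below is a generic illustration under that assumption, not the procedure of a specific data mining product; the function name `k_fold_indices` and its parameters are hypothetical.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps splits repeatable
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(10, 3))
```

Each item appears in exactly one test fold, so every item is used for testing once and for training k - 1 times. Comparing the per-fold results is one way to organize the repeated adjust-and-recheck cycle described above.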