The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of models and the classification rate of classifiers on previously unknown data. Yet it has been shown recently that the feature selection process itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit or perform unstably. Such an outcome is unlikely to generalize well, and the resulting recognition system may fail to deliver the expected performance. In many tasks it is therefore crucial to employ additional mechanisms that make the feature selection process more stable and more resistant to curse-of-dimensionality effects. In this paper we discuss three different approaches to reducing this problem. We present an algorithmic extension applicable to various feature selection methods, capable of reducing excessive dependency of the selected feature subset not only on the specific training data, but also on specific properties of the criterion function. Further, we discuss the concept of criteria ensembles, where various criteria vote about feature inclusion/removal, and go on to provide a general definition of feature selection hybridization aimed at combining the advantages of dependent and independent criteria. The presented ideas are illustrated through examples, and summarizing recommendations are given.
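The voting idea behind criteria ensembles can be made concrete with a minimal sketch. The following Python fragment is an illustrative assumption, not the paper's algorithm: in each sequential forward step, several (here deliberately simple, assumed) criteria each vote for their preferred candidate feature, and the feature with the most votes is added to the current subset.

```python
# Minimal sketch of a criteria ensemble voting on feature inclusion during
# sequential forward selection. The criteria and data below are illustrative
# assumptions, not the criteria used in the paper.
import numpy as np
from collections import Counter

def forward_step_with_ensemble(X, y, selected, candidates, criteria):
    """Each criterion votes for its preferred candidate feature; the feature
    with the most votes is added to the current subset."""
    votes = Counter()
    for crit in criteria:
        scores = {f: crit(X[:, selected + [f]], y) for f in candidates}
        votes[max(scores, key=scores.get)] += 1   # one vote per criterion
    winner, _ = votes.most_common(1)[0]
    return selected + [winner], [f for f in candidates if f != winner]

# Two simple (assumed) criteria: mean absolute correlation with the class label
# and a crude class-mean separation measure, both over the candidate subset.
def corr_criterion(Xs, y):
    return np.mean([abs(np.corrcoef(Xs[:, j], y)[0, 1]) for j in range(Xs.shape[1])])

def mean_sep_criterion(Xs, y):
    m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    return np.mean(np.abs(m0 - m1))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=100) > 0).astype(int)

selected, candidates = [], list(range(6))
for _ in range(3):   # select three features by ensemble voting
    selected, candidates = forward_step_with_ensemble(
        X, y, selected, candidates, [corr_criterion, mean_sep_criterion])
print("selected features:", selected)
```

Because each criterion emphasizes different data properties, features favoured by the ensemble are less tied to the idiosyncrasies of any single criterion, which is the stabilizing effect the abstract refers to.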
The paper gives an overview of feature selection techniques in statistical pattern recognition, with particular emphasis on methods developed within the Institute of Information Theory and Automation research team in recent years. Besides discussing the advances in methodology since the time of Perez's pioneering work, the paper attempts to put the methods into a taxonomical framework. The methods discussed include the latest variants of the optimal algorithms, enhanced sub-optimal techniques, and the simultaneous semi-parametric probability density function modelling and feature space selection method. Some related issues are illustrated on real data by means of the Feature Selection Toolbox software.
Considering the statistical recognition of multidimensional binary observations, we approximate the unknown class-conditional probability distributions by multivariate Bernoulli mixtures. We show that both the parameter optimization and the resulting Bayesian decision-making can be realized by a probabilistic neural network having strictly modular properties. In particular, the learning process based on the EM algorithm can be performed by means of a sequential autonomous adaptation of neurons involving only the information from the input synapses and the interior of the neurons. In this sense the probabilistic neural network can be designed automatically. The properties of the sequential strictly modular learning procedure are illustrated by numerical examples.
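For readers unfamiliar with Bernoulli mixtures, the following Python sketch shows plain EM estimation of a multivariate Bernoulli mixture on binary data. It is only the underlying EM update, not the paper's probabilistic neural network or its sequential modular learning scheme; the data and component count are assumptions for illustration. Each mixture component can loosely be read as one "neuron" adapted from its own responsibilities.

```python
# Minimal sketch (assumed setup) of EM for a multivariate Bernoulli mixture
# p(x) = sum_m w_m * prod_d theta_{m,d}^{x_d} (1 - theta_{m,d})^{1 - x_d}.
import numpy as np

def em_bernoulli_mixture(X, n_components, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(n_components, 1.0 / n_components)             # mixture weights
    theta = rng.uniform(0.25, 0.75, size=(n_components, d))   # Bernoulli parameters
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each binary observation
        log_p = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(w))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and per-component Bernoulli parameters
        nk = resp.sum(axis=0)
        w = nk / n
        theta = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return w, theta

# Illustrative binary data drawn from two assumed Bernoulli prototypes.
rng = np.random.default_rng(1)
proto = np.array([[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]])
labels = rng.integers(0, 2, size=200)
X = (rng.random((200, 4)) < proto[labels]).astype(float)

w, theta = em_bernoulli_mixture(X, n_components=2)
print("weights:", np.round(w, 2))
print("component parameters:\n", np.round(theta, 2))
```

Fitting such a mixture per class and comparing the class-conditional mixture likelihoods yields the Bayesian decision rule mentioned in the abstract; the paper's contribution is that these updates can be carried out autonomously and sequentially by the individual neurons.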