Fictitious prices are sometimes recorded in real estate transactions to conceal actual monetary gains and avoid taxes. This paper is concerned with artificial-neural-network-based screening of real estate transactions, aiming to categorize them into "clear" and "fictitious" classes. The problem is treated as an outlier detection task, and both unsupervised and supervised approaches are studied. The soft minimal hyper-sphere support vector machine (SVM) novelty detector is employed to solve the task without supervision. In the supervised case, the effectiveness of SVM-, multilayer perceptron (MLP)-, and committee-based classification of the real estate transactions is studied. To give the user deeper insight into the decisions provided by the models, the transactions are not only categorized into "clear" and "fictitious" classes but also mapped onto a self-organizing map (SOM), where regions of "clear", "doubtful", and "fictitious" transactions are identified. We demonstrate that the regions that evolve in the SOM during training are rather stable. Experimental investigations performed on two real data sets show that the categorization accuracy obtained with the supervised approaches is considerably higher than that obtained with the unsupervised one, and high enough for the technique to be used in practice.
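The unsupervised detector described above can be approximated with a standard one-class SVM, a close relative of the minimal hyper-sphere formulation (the two coincide for the RBF kernel). The sketch below is illustrative only: the features (price per square metre and the ratio of declared price to assessed value) and all numeric values are hypothetical, not taken from the paper's data sets.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic "clear" transactions (hypothetical features, illustrative values):
#   column 0: price per square metre
#   column 1: declared price / assessed market value
clear = np.column_stack([
    rng.normal(1500.0, 100.0, 200),
    rng.normal(1.0, 0.05, 200),
])

# Soft one-class SVM: nu upper-bounds the fraction of training points
# treated as outliers, i.e. allowed outside the learned boundary.
detector = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", nu=0.05, gamma="scale"),
).fit(clear)

# A transaction recorded far below market level lies far outside the
# training distribution and should be flagged as potentially fictitious.
suspect = np.array([[400.0, 0.3]])
print(detector.predict(suspect))  # -1 => flagged, +1 => "clear"
```

In practice the decision function value, not just the binary label, can be reported to the user as a degree of suspicion, which matches the paper's goal of distinguishing "doubtful" from clearly "fictitious" cases.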
Different approaches have been proposed to detect possible outliers in a data set. The most widely used consists of applying the data snooping test to the least-squares adjustment results. This strategy is very likely to succeed when there are zero or one outliers but, contrary to what is often assumed, not in the multiple-outlier case, even when the test is applied iteratively. Robust estimation, computed by iteratively reweighted least squares (IRLS) or by a global optimization method, is an alternative approach that often produces good results in the presence of outliers, as do exhaustive search methods that explore the elimination of every possible set of observations. General statements of universal validity about the best way to compute a geodetic network with multiple outliers cannot be given, owing to the many factors involved (type of network, number and size of possible errors, available computational power, etc.). However, we show in this paper that some conclusions can be drawn for a leveling network, which has a certain geometrical simplicity compared with planimetric or three-dimensional networks, though usually a high number of unknowns and relatively low redundancy. Among other results, we observe the occasional failure of the iterative application of the data snooping test; the relatively successful results obtained by both methods of computing the robust estimator, which perform equivalently in this case; and the successful application of the exhaustive search method to different cases, which become increasingly intractable as the number of outliers approaches half the number of degrees of freedom of the network.
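The IRLS-based robust estimation mentioned above can be sketched on a tiny hypothetical leveling network: the heights of three points relative to a fixed benchmark are estimated from measured height differences, with Huber weights automatically downweighting an observation that carries a simulated 0.5 m blunder. The network geometry, the threshold `k`, and all values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical leveling network: unknowns are heights of P1..P3 relative
# to a fixed benchmark P0 (H = 0). Each row of A encodes one observed
# height difference, obs = H_to - H_from (fixed P0 contributes 0).
A = np.array([
    [ 1.0,  0.0,  0.0],   # P0 -> P1
    [-1.0,  1.0,  0.0],   # P1 -> P2
    [ 0.0, -1.0,  1.0],   # P2 -> P3
    [ 0.0,  0.0, -1.0],   # P3 -> P0
    [-1.0,  0.0,  1.0],   # P1 -> P3
])
# Simulated true heights (1.0, 2.0, 1.5) m; the last observation carries
# a gross error (blunder) of +0.5 m.
l = np.array([1.0, 1.0, -0.5, -1.5, 0.5 + 0.5])

def irls_huber(A, l, k=0.01, iters=200):
    """IRLS with Huber weights: w = 1 if |r| <= k, else k / |r|."""
    w = np.ones(len(l))
    for _ in range(iters):
        W = np.diag(w)
        x = np.linalg.solve(A.T @ W @ A, A.T @ W @ l)   # weighted LS step
        r = np.abs(l - A @ x)                           # absolute residuals
        w = np.where(r <= k, 1.0, k / np.maximum(r, 1e-12))
    return x, w

x, w = irls_huber(A, l)
# x recovers heights close to (1.0, 2.0, 1.5); the blundered observation
# ends up with a weight near k / 0.5, i.e. it is effectively excluded.
```

An ordinary least-squares adjustment of the same data would instead spread the 0.5 m blunder over all residuals, which is precisely why data snooping can fail to localize multiple errors.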