This paper addresses the problem of clustering in large data sets in the context of financial time series. The goal is to divide stock market trading rules into several classes so that all trading rules within the same class lead to similar trading decisions under the same stock market conditions. This is achieved using Kohonen self-organizing maps and the K-means algorithm. Several validity indices are used to validate and assess the clustering. Experiments were carried out on 350 stock market trading rules observed over a period of 1300 time instants.
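The K-means stage mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the SOM stage and the validity indices are omitted, K-means is applied directly to the decision vectors, and the encoding of decisions (+1 buy, 0 hold, -1 sell), the farthest-point initialisation, and the toy data are all assumptions made for the example.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical data: one row per trading rule, one column per time instant,
# with decisions encoded as +1 (buy), 0 (hold), -1 (sell).
rules = [
    [1, 1, 0, -1, -1, 0],
    [1, 1, 0, -1, 0, 0],
    [1, 0, 0, -1, -1, 0],
    [-1, -1, 0, 1, 1, 0],
    [-1, -1, 1, 1, 1, 0],
    [-1, 0, 0, 1, 1, 0],
]
labels, centroids = kmeans(rules, k=2)
```

Rules that issue similar decisions at the same time instants end up with the same label; in a two-stage setup, the same function could be applied to SOM prototype vectors instead of the raw decision vectors.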
We consider the problem of separating noisy overcomplete sources from linear mixtures, i.e., we observe N mixtures of M > N sparse sources. We show that the "Sparse Coding Neural Gas" (SCNG) algorithm [8,9] can be employed to estimate the mixing matrix. Based on the learned mixing matrix, the sources are obtained by orthogonal matching pursuit. Using synthetically generated data, we evaluate the influence of (i) the coherence of the mixing matrix, (ii) the noise level, and (iii) the sparseness of the sources on the performance that can be achieved on the representation level. Our results show that if the coherence of the mixing matrix and the noise level are sufficiently small and the underlying sources are sufficiently sparse, the sources can be estimated from the observed mixtures. To apply our method to real-world data, we try to reconstruct each individual instrument of a jazz audio signal given only a two-channel recording. Furthermore, we compare our method to the well-known FastICA [4] algorithm and show that in the case of sparse sources and in the presence of additive noise, our method provides a superior estimation of the mixing matrix.
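The source-recovery step can be illustrated with a small orthogonal matching pursuit sketch. It assumes the mixing matrix is already known (in the abstract it is learned by SCNG, which is not shown here), that its columns have unit norm, and that the requested number of nonzero sources does not exceed N. The helper names and the toy mixing matrix are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def omp(columns, y, n_nonzero):
    """Orthogonal matching pursuit.

    columns: the M columns of the mixing matrix, each a list of length N.
    y: one observed mixture vector of length N.
    Returns M coefficients, at most n_nonzero of them nonzero.
    """
    residual, support, coef = list(y), [], []
    for _ in range(n_nonzero):
        # Select the unused column most correlated with the current residual.
        candidates = [j for j in range(len(columns)) if j not in support]
        best = max(candidates, key=lambda j: abs(dot(columns[j], residual)))
        support.append(best)
        # Re-fit all selected coefficients by least squares (normal equations).
        gram = [[dot(columns[i], columns[j]) for j in support] for i in support]
        rhs = [dot(columns[i], y) for i in support]
        coef = solve(gram, rhs)
        residual = [
            yi - sum(c * columns[j][t] for c, j in zip(coef, support))
            for t, yi in enumerate(y)
        ]
    sources = [0.0] * len(columns)
    for c, j in zip(coef, support):
        sources[j] = c
    return sources


# Toy overcomplete case: N = 2 mixtures, M = 3 sources, only source 2 active.
r = 1 / math.sqrt(2)
columns = [[1.0, 0.0], [0.0, 1.0], [r, r]]
true_sources = [0.0, 0.0, 2.0]
y = [sum(s * col[t] for s, col in zip(true_sources, columns)) for t in range(2)]
estimate = omp(columns, y, n_nonzero=1)
```

The least-squares re-fit over the whole support after each selection is what distinguishes orthogonal matching pursuit from plain matching pursuit.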
Kohonen's self-organizing maps are a recognized tool for finding representative data vectors and clustering the data. To what extent is it possible to preserve the topology of the data in the constructed planar map? We address this question by looking at distal data points located at the periphery of the data cloud and at their positions in the resulting map. Several data sets have been investigated; we present the results for two of them: the Glass data (dimension d=7) and the Ionosphere data (dimension d=32). It was found that the distal points are reproduced either at the edges (borders) of the map or at the periphery of dark regions visualized in the maps.
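One way to make the notion of distal points concrete is sketched below. The abstract does not say how distal points are defined, so the criterion used here (largest Euclidean distance from the data centroid) is an assumption, as are the function names and the rectangular map layout.

```python
import math


def distal_points(data, n_distal):
    """Return indices of the n_distal points farthest from the data centroid."""
    centroid = [sum(col) / len(data) for col in zip(*data)]
    order = sorted(
        range(len(data)),
        key=lambda i: math.dist(data[i], centroid),
        reverse=True,
    )
    return order[:n_distal]


def on_map_border(row, col, n_rows, n_cols):
    """True if the map node (row, col) lies on the edge of a rectangular map."""
    return row in (0, n_rows - 1) or col in (0, n_cols - 1)


# Toy data cloud: three central points and two clear outliers.
data = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [5.0, 5.0], [-6.0, -6.0]]
far = distal_points(data, 2)
```

Given the best-matching map node of each distal point, `on_map_border` could then be used to count how many of them land on the edge of the map.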
P300 brain-computer interfaces (BCIs) have been gaining attention in recent years. To achieve good performance and accuracy, it is necessary to optimize both the feature extraction and the classification algorithms. This article aims to verify whether supervised learning models based on self-organizing maps (SOM) or adaptive resonance theory (ART) can be useful for this task. For feature extraction, the state-of-the-art windowed means paradigm was used. For classification, the proposed classifiers were compared with state-of-the-art classifiers used in BCI research, such as Bayesian Linear Discriminant Analysis or shrinkage LDA. Publicly available datasets from fifteen healthy subjects were used for the experiments. The results indicated that SOM-based models yield better results than ART-based models. The best performance was achieved by the LASSO model, which was comparable to state-of-the-art BCI classifiers. Further possibilities for improvement are discussed.
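The windowed means feature extraction can be sketched as averaging each channel over a few time windows and concatenating the results. The window boundaries, the epoch layout (a list of channels), and the toy values below are assumptions for illustration; the article's actual windows and preprocessing are not given in the abstract.

```python
def windowed_means(epoch, windows):
    """Compute windowed-means features for one epoch.

    epoch: list of channels, each a list of samples.
    windows: list of (start, stop) sample indices, stop exclusive.
    Returns a flat feature vector: for each channel, the mean in each window.
    """
    features = []
    for channel in epoch:
        for start, stop in windows:
            segment = channel[start:stop]
            features.append(sum(segment) / len(segment))
    return features


# Hypothetical epoch with 2 channels and 6 samples per channel.
epoch = [
    [0.0, 2.0, 4.0, 6.0, 1.0, 1.0],
    [1.0, 1.0, 3.0, 5.0, 0.0, 2.0],
]
windows = [(0, 2), (2, 4), (4, 6)]
features = windowed_means(epoch, windows)
```

The resulting vector has one entry per channel and window, which keeps the dimensionality low compared with feeding raw samples to the classifier.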
This paper discusses a cluster analysis and visualization tool, the self-organizing map (SOM). The pros and cons of different network sizes are discussed, in particular how well they suit direct data browsing and cluster analysis with U-matrices. The tree-structured SOM (TS-SOM) [4, 5] is proposed as a method of acquiring multi-resolution/multi-purpose mappings of a given input space. The TS-SOM is discussed in detail, and a novel modification to the algorithm that improves its reliability as a multi-resolution visualization method is presented.
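For readers unfamiliar with U-matrices, the sketch below computes one common variant: for every map node, the mean Euclidean distance between its weight vector and those of its 4-connected grid neighbours. The rectangular grid, the neighbourhood choice, and the toy weights are assumptions; this is not the TS-SOM algorithm and the paper may use a different U-matrix definition.

```python
import math


def u_matrix(weights):
    """Mean distance of each SOM node to its 4-connected grid neighbours.

    weights[r][c] is the weight vector of node (r, c) on a rectangular grid
    with at least two nodes.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    u = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            dists = [
                math.dist(weights[r][c], weights[rr][cc])
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
            u[r][c] = sum(dists) / len(dists)
    return u


# Toy 2 x 3 map: the right-hand column sits far from the other nodes.
weights = [
    [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0]],
    [[0.1, 0.0], [0.1, 0.1], [5.0, 5.1]],
]
u = u_matrix(weights)
```

Large values mark nodes next to a cluster boundary and small values mark nodes inside a cluster, which is why the usefulness of a U-matrix depends on the map having enough nodes to resolve those boundaries.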
We introduce topographic versions of two latent class models for collaborative filtering. Topographic organization of the latent classes makes orientation in the rating/preference patterns captured by the latent classes easier and more systematic. Furthermore, since we deal with probabilistic models of the data, we can readily use tools from probability theory and information theory to interpret and visualize the information extracted by the model. We apply our models to a large collection of user ratings for films.