This work proposes a hybrid neural-network-based model for rainfall prediction in the southern part of the state of West Bengal, India. The hybrid model is a multistep method. First, the data are partitioned into a reasonable number of clusters using the fuzzy c-means algorithm; then, for every cluster, a separate neural network (NN) is trained on the data points of that cluster using the well-known metaheuristic Flower Pollination Algorithm (FPA). In addition, a feature selection phase is included as a preprocessing step: a greedy forward selection algorithm is employed to find the most suitable set of features for predicting rainfall. To establish its effectiveness, the proposed hybrid prediction model (Hybrid Neural Network, or HNN) has been compared with two well-known models, including the multilayer perceptron feed-forward network (MLP-FFN), using different performance metrics. The data set used to simulate the model was collected at the Dumdum meteorological station (West Bengal, India) and recorded from 1989 to 1995. The simulation results indicate that the proposed model predicts rainfall significantly better than the traditional methods it was compared with.
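The clustering step described above can be illustrated with a minimal sketch of the standard fuzzy c-means update rules. This is not the authors' implementation: the function names `fuzzy_c_means` and `hard_labels`, the fuzzifier `m = 2.0`, the stopping tolerance, and the toy data are all assumptions made for illustration. The hard labels at the end only suggest how points might be routed to per-cluster networks; the FPA-based training itself is not shown.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length tuples.

    Hypothetical helper: standard fuzzy c-means with fuzzifier m > 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means of the points.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Memberships are recomputed from distances to the new centers.
        new_u = []
        for p in points:
            dist = [sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5
                    for ctr in centers]
            if any(d == 0.0 for d in dist):
                # A point lying exactly on a center belongs fully to it.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dist]
                s = sum(zeros)
                new_u.append([z / s for z in zeros])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                s = sum(inv)
                new_u.append([v / s for v in inv])
        shift = max(abs(a - b)
                    for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


def hard_labels(u):
    """Assign each point to the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


if __name__ == "__main__":
    # Toy data (assumed): two well-separated groups in the plane.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    centers, u = fuzzy_c_means(data, c=2)
    print(centers)
    print(hard_labels(u))
```

In a pipeline like the one outlined, each group of points sharing a hard label would form the training set of one network; whether the original work uses hard assignment or the memberships themselves is not stated in the abstract.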
Policy makers and researchers require raw data collected by agencies and companies for their analyses. Nevertheless, any transmission of data to third parties should satisfy privacy requirements in order to avoid the disclosure of sensitive information. The areas of privacy-preserving data mining and statistical disclosure control develop mechanisms for ensuring data privacy. Masking methods are one such mechanism: they allow third parties to perform computations on the data with a limited risk of disclosure. Disclosure risk and information loss measures have been developed to evaluate to what extent the data are protected and to what extent they are perturbed. Most of the information loss measures currently found in the literature are general-purpose ones (i.e., not oriented to a particular application). In this work we develop cluster-specific information loss measures for fuzzy clustering. For this purpose we study how to compare the results of fuzzy clustering, i.e., how to compare fuzzy clusters.
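To make the idea of comparing fuzzy clusterings concrete, the following sketch shows one simple, label-permutation-invariant comparison: each membership matrix is turned into a pairwise co-membership matrix, and the two are compared entry by entry. This is an illustrative assumption, not necessarily any of the measures developed in this work; the names `co_membership` and `fuzzy_partition_distance` are hypothetical.

```python
def co_membership(u):
    """Return the n x n matrix whose (i, k) entry is sum_j u[i][j] * u[k][j].

    u is a membership matrix: one row per record, one column per cluster.
    The result does not depend on the order of the cluster columns.
    """
    return [[sum(a * b for a, b in zip(ri, rk)) for rk in u] for ri in u]


def fuzzy_partition_distance(u_orig, u_masked):
    """Mean absolute difference of co-memberships over all pairs i < k.

    A value of 0 means both clusterings relate every pair of records in
    the same way; larger values suggest more structure was lost.
    """
    if len(u_orig) != len(u_masked):
        raise ValueError("both partitions must cover the same records")
    n = len(u_orig)
    if n < 2:
        return 0.0
    a = co_membership(u_orig)
    b = co_membership(u_masked)
    total = sum(abs(a[i][k] - b[i][k])
                for i in range(n) for k in range(i + 1, n))
    return total / (n * (n - 1) / 2)
```

Here `u_orig` would hold the memberships obtained by clustering the original file and `u_masked` those obtained from the masked file, with records in the same order. Working on co-memberships sidesteps the problem that cluster indices in the two results need not correspond.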
The quality of centroid-based clustering depends strongly on initialization. In this article we propose an initialization based on the probability of finding objects that could represent individual clusters. We present the results of experiments that compare the quality of clustering obtained by the k-means algorithm and by selected fuzzy clustering methods, namely FCM (fuzzy c-means), PCA (possibilistic clustering algorithm) and UPFC (unsupervised possibilistic fuzzy clustering), under different initializations. These experiments demonstrate an improvement in clustering quality when the algorithms are initialized by the proposed method. An approach for estimating the ratio of added noise is also presented.