Relations between two Boolean attributes derived from data can be quantified by truth functions defined on the four-fold tables corresponding to pairs of attributes. Several classes of such quantifiers (implicational, double implicational, and equivalence quantifiers) with truth values in the unit interval have been investigated within the theory of data mining methods. Fuzzy logic theory provides well-defined classes of fuzzy operators, namely t-norms representing various evaluations of fuzzy conjunction, t-conorms representing fuzzy disjunction, and fuzzy implication operators.
In this contribution, several constructions of quantifiers using fuzzy operators are described. Definitions and theorems presented by the author in previous contributions to WUPES workshops are summarized and illustrated with examples of well-known quantifiers and operators.
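As a minimal illustration of the objects named in this abstract, the sketch below computes three commonly cited quantifiers from a four-fold table (a, b, c, d) and evaluates three standard t-norms together with their residual implications. The function names, the sample counts, and the convention of returning 0.0 for an empty denominator are assumptions made for illustration; they are not the definitions or constructions given in the contribution itself.

```python
# Four-fold table for attributes (phi, psi), counts of objects:
#   a: phi and psi        b: phi and not psi
#   c: not phi and psi    d: neither

def implicational(a, b, c, d):
    """Confidence-style quantifier a / (a + b); 0.0 if undefined (assumed convention)."""
    return a / (a + b) if a + b > 0 else 0.0

def double_implicational(a, b, c, d):
    """Jaccard-style quantifier a / (a + b + c); 0.0 if undefined (assumed convention)."""
    return a / (a + b + c) if a + b + c > 0 else 0.0

def equivalence(a, b, c, d):
    """Simple-matching quantifier (a + d) / n; 0.0 if the table is empty."""
    n = a + b + c + d
    return (a + d) / n if n > 0 else 0.0

# Three standard t-norms (fuzzy conjunctions) on the unit interval.
def t_min(x, y):
    return min(x, y)

def t_product(x, y):
    return x * y

def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)

# Residual implications associated with the t-norms above.
def impl_godel(x, y):
    return 1.0 if x <= y else y

def impl_goguen(x, y):
    return 1.0 if x <= y else y / x

def impl_lukasiewicz(x, y):
    return min(1.0, 1.0 - x + y)

if __name__ == "__main__":
    table = (30, 10, 20, 40)  # hypothetical counts
    print(implicational(*table))
    print(double_implicational(*table))
    print(equivalence(*table))
    print(t_lukasiewicz(0.7, 0.6), impl_goguen(0.8, 0.4))
```

All three quantifiers take values in the unit interval, which is what makes it natural to combine them with fuzzy operators.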
There is growing interest in analyzing electroencephalogram (EEG) signals with the objective of distinguishing schizophrenic patients from control subjects. In this study, EEG signals of 15 schizophrenic patients and 19 age-matched control subjects were recorded using twenty surface electrodes. After the preprocessing phase, several features, including autoregressive (AR) model coefficients, band power and fractal dimension, were extracted from the recorded signals. Three classifiers, Linear Discriminant Analysis (LDA), Multi-LDA (MLDA) and Adaptive Boosting (Adaboost), were implemented to classify the EEG features of schizophrenic and normal subjects. Leave-one (participant)-out cross validation was performed in the training phase; in the test phase, LDA, MLDA and Adaboost provided classification accuracies of 78%, 81% and 82%, respectively, between the two groups. For further improvement, Genetic Programming (GP) was employed to select the more informative features and remove the redundant ones. After applying GP to the feature vectors, the results improved markedly: LDA, MLDA and Adaboost yielded accuracies of 82%, 84% and 93%, respectively.
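The leave-one (participant)-out scheme mentioned above can be sketched as follows. This is only an illustration of the splitting logic: the nearest-mean classifier is a deliberately simple stand-in, not LDA, MLDA or Adaboost, and all names and the toy data are hypothetical.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_indices, test_indices), one split per participant."""
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test

def fit_nearest_mean(xs, ys):
    """Stand-in classifier (assumption): store the mean feature value per class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means

def predict_nearest_mean(means, x):
    """Predict the class whose stored mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

def evaluate(features, labels, participant_ids, fit, predict):
    """Accuracy over all held-out samples, refitting the model for every split."""
    correct = 0
    total = 0
    for _, train, test in leave_one_participant_out(participant_ids):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test:
            correct += int(predict(model, features[i]) == labels[i])
            total += 1
    return correct / total

if __name__ == "__main__":
    # Toy scalar features, one recording per hypothetical participant.
    features = [0.1, 0.2, 0.9, 1.0]
    labels = ["control", "control", "patient", "patient"]
    ids = ["s1", "s2", "s3", "s4"]
    print(evaluate(features, labels, ids, fit_nearest_mean, predict_nearest_mean))
```

Splitting by participant rather than by sample keeps every recording of the held-out subject out of the training data, so the reported accuracy is not inflated by subject-specific signal characteristics.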
Giant pandas (Ailuropoda melanoleuca) are now confined to fragmented habitats in western China, with more than 60 % of individuals inhabiting 63 protected areas. Knowledge of the environmental features required by giant pandas is critically important for the spatial arrangement of protected areas and for subsequent assessments. Here we developed a distribution model for giant pandas in the Tangjiahe Nature Reserve using an Ecological Niche Factor Analysis (ENFA) model. We found that less than 40 % of this key reserve is of high suitability for giant pandas; highly suitable habitat is primarily characterized as coniferous forest away from roads within the reserve. Although there was a clear core zone occupied by giant pandas, which included the vast majority of known giant panda locations, only about 45 % of this zone was classified as highly suitable habitat (suitable and optimal). Therefore, the spatial arrangement within the reserve may need to be modified to effectively manage the remaining population of giant pandas. Of particular concern are several tourism proposals being considered by local government, which, if implemented, will increase the isolation of the local population from those in the surrounding area. Our analysis identifies Caijiaba and Baixiongping as areas that should become conservation priorities. Our approach provides valuable data to inform conservation policy and could be easily replicated across other protected areas.
Integrated Population Modelling (IPM) is a computational method for estimating population and demographic parameters that can improve precision relative to traditional methods. Here we compare the precision of IPM with that of traditional mark-recapture analysis for estimating population parameters in the common dormouse (Muscardinus avellanarius). This species is relatively rare across its European range, and field estimation of demographic parameters can be challenging because several parts of the life history are difficult to observe in the field. We develop an IPM incorporating dormouse nest counts and offspring counts, data that are often recorded as a standard part of dormouse nest box monitoring. We found a significant improvement in the precision of demographic parameter estimates using IPM compared to standard mark-recapture estimation. We discuss our results in the context of common dormouse conservation monitoring.
Mass releases of Trichogramma confusum Viggiani and T. maidis Pintureau & Voegele are widely used to control lepidopterous pests. They have long been considered to be subspecies of T. chilonis Ishii and T. brassicae Bezdenko, respectively. To re-examine the taxonomic status of these closely related Trichogramma species, the internal transcribed spacer 2 (ITS2) of ribosomal DNA was used as a molecular marker to detect between-species differences. The ITS2 regions of 7 different Trichogramma species collected from China, Germany and France were sequenced and the inter-species distances were calculated. To quantify within-species sequence variation, the ITS2 regions of 6 geographical populations of T. dendrolimi Matsumura collected from across China were sequenced and compared. The results show that the ITS2 sequences of T. confusum and T. maidis are sufficiently different from those of T. chilonis and T. brassicae, respectively, that it is difficult to group them as cryptic species, whereas there are only minor differences between the T. dendrolimi populations. The ITS2 sequences identified in this study, coupled with 67 ITS2 sequences from a wide geographical distribution retrieved from GenBank, were then used for phylogenetic analyses. The results support previous records of minor within-species ITS2 sequence divergence and distinct interspecies differences. The cladograms show the T. maidis sequence clustered within T. evanescens Westwood, while the ITS2 sequences of T. confusum and T. chilonis are clustered in different branches. Taken together, these data suggest that T. maidis is not T. brassicae, but a cryptic or sibling species of T. evanescens; T. confusum and T. chilonis are not cryptic species but two closely related sister species.
Several algorithms have been developed for time series forecasting. In this paper, we develop a type of algorithm that uses numerical methods to optimize an objective function, namely the Kullback-Leibler divergence between the joint probability density function of a time series X1, X2, ..., Xn and the product of their marginal distributions. The Gram-Charlier expansion is used for estimating these distributions.
Using the weights obtained by the neural network, and adding to them the Kullback-Leibler divergence of these weights, we obtain new weights that are used for forecasting the new value Xn+k.
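For intuition about the objective function named in this abstract, the sketch below computes the Kullback-Leibler divergence between a joint distribution and the product of its marginals for a pair of discrete variables. This is a simplification chosen for illustration: it works on a discrete probability table rather than on densities estimated with the Gram-Charlier expansion, and the function name and example tables are assumptions.

```python
import math

def kl_joint_vs_marginals(joint):
    """KL divergence D(p(x, y) || p(x) p(y)) for a 2-D table of probabilities.

    `joint` is a list of rows whose entries sum to 1. Cells with zero
    probability contribute nothing to the sum. The result is in nats.
    """
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    divergence = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                divergence += p * math.log(p / (row_marginal[i] * col_marginal[j]))
    return divergence

if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]
    dependent = [[0.5, 0.0], [0.0, 0.5]]
    print(kl_joint_vs_marginals(independent))
    print(kl_joint_vs_marginals(dependent))
```

The divergence is zero exactly when the variables are independent and grows with their dependence, which is why it can serve as a measure of how much information earlier values of a series carry about later ones.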
The paper deals with the application of the MF-ARTMAP neural network to financial fraud data. The focus was on classifying data into 5 types of fraud based on expert knowledge, with the aim of obtaining the tool with the highest classification accuracy. The fraud was characterized by 22 features, and the verbal features were encoded as numerical values so that they could be used in the classification procedure. The results show that when the data (fraud) representation is sufficient, neural networks can be used with success; when only a few examples are available, expert-generated rules are preferred.
The goal of this paper is to demonstrate the usefulness of path dependence theory in explaining the convergence of housing regimes among post-socialist countries, both at the beginning and in the later phases of housing-regime transformation. In particular, we seek to show selected common traps recently created by the legacy of giveaway privatisation and the super-homeownership regime, traps that increase intergenerational inequality, which until now has been effectively mitigated by within-family financial transfers.
Hospitals must index each case of inpatient medical care with codes from the International Classification of Diseases, 9th Revision (ICD-9), under regulations from the Bureau of National Health Insurance. This paper investigates the analysis of free-text clinical diagnosis documents with ICD-9 codes using state-of-the-art techniques from the fields of text mining and visual mining. ViSOM and SOM approaches are applied in several analyses of clinical diagnosis records with ICD-9 codes, and are also used to obtain interesting patterns that have not been discovered with traditional, nonvisual approaches. Furthermore, we address three principles that can help clinical doctors analyze diagnosis records effectively using the ViSOM and SOM approaches. The experiments were conducted using real diagnosis records and show that ViSOM and SOM are helpful for organizational decision-making activities.
Credit risk assessment, credit scoring and loan application approval are among the typical tasks that can be performed using machine learning or data mining techniques. From this viewpoint, loan application evaluation is a classification task in which the final decision can be either a crisp yes/no decision about the loan or a numeric score expressing the financial standing of the applicant. The knowledge to be used is inferred from data about past decisions. These data usually consist of both socio-demographic and economic characteristics of the applicant (e.g., age, income, and deposit), the characteristics of the loan, and the loan approval decision. A number of machine learning algorithms can be used for this purpose. In this paper we show how this task can be performed using the LISp-Miner system, a tool that is under development at the University of Economics, Prague. LISp-Miner is primarily focused on mining for various types of association rules, but unlike the "classical" association rules proposed by Agrawal, LISp-Miner introduces a greater variety of relations between the left-hand and right-hand sides of a rule. Two other procedures that can be used for classification tasks are implemented in LISp-Miner as well. We describe the 4ft-Miner and KEX procedures and show how they can be used to analyze data related to loan applications. We also compare the results obtained using the presented algorithms with results from standard rule-learning methods.