We study the differences between two approaches for revealing latent variables in binary data. Both approaches assume that the observed high-dimensional data are driven by a small number of hidden binary sources combined by Boolean superposition. The first approach is Boolean matrix factorization (BMF); the second is Boolean factor analysis (BFA). Two BMF methods are used for comparison: the M8 method from the BMDP statistical software package and the method suggested by Belohlavek \& Vychodil. These are compared to BFA, in particular to the Expectation-Maximization Boolean Factor Analysis we developed earlier, which has been extended here with a binarization step. The well-known bars problem and the mushroom dataset are used to reveal the methods' peculiarities. In particular, the reconstruction ability of the computed factors and the information gain as a measure of dimension reduction are under scrutiny. We show that BFA performs slightly worse than BMF when noise-free signals are analyzed; conversely, BMF loses considerably to BFA when the input signals are noisy.
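As a minimal illustration of the Boolean superposition assumption shared by both approaches, the following Python sketch (the function names and toy matrices are ours, not from the paper) reconstructs binary data as the Boolean product of a loading matrix and a factor matrix, with OR playing the role of addition:

```python
import numpy as np

def boolean_reconstruct(A, B):
    """Boolean matrix product A o B: entry (i, j) is 1 iff some factor l
    has A[i, l] = 1 and B[l, j] = 1 (an OR of ANDs, i.e. Boolean superposition)."""
    return ((A.astype(int) @ B.astype(int)) > 0).astype(int)

def reconstruction_error(X, A, B):
    """Hamming distance between the data and its factor reconstruction."""
    return int(np.sum(X != boolean_reconstruct(A, B)))

# toy example: two hidden "bar"-like factors overlaid by Boolean OR
A = np.array([[1, 0], [0, 1], [1, 1]])        # which factors each object uses
B = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])    # which attributes each factor covers
X = boolean_reconstruct(A, B)                  # noise-free observed data
print(X, reconstruction_error(X, A, B))        # error is 0 by construction
```

Both families of methods can be judged by this kind of Hamming reconstruction error, which is the criterion the comparison above scrutinizes.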
Bayesian Networks (BNs) are graphical models representing multivariate joint probability distributions; they have been applied successfully in many studies across a wide range of application areas. BN learning algorithms can be remarkably effective on many problems. The search space for BN induction, however, grows exponentially with the number of variables, so finding the BN structure that best represents the dependencies among the variables is known to be an NP-hard problem. This work proposes and discusses a hybrid Bayes/genetic collaboration (VOGAC-MarkovPC) designed to induce Conditional Independence Bayesian Classifiers from data. The main contribution is the use of the MarkovPC algorithm to reduce the computational complexity of a Genetic Algorithm (GA) designed to explore Variable Orderings (VOs) in order to optimize the induced classifiers. Experiments performed on a number of datasets revealed that VOGAC-MarkovPC required, on average, less than 25% of the time demanded by VOGAC-PC. Moreover, in terms of classification accuracy, VOGAC-MarkovPC performed as well as VOGAC-PC.
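The genetic component can be pictured as a search over permutations of the variables. The sketch below is hypothetical: `score_ordering` merely stands in for inducing a classifier under a given variable ordering (e.g. with MarkovPC) and measuring its accuracy, and the crossover is a simple order-preserving variant:

```python
import random

def score_ordering(ordering):
    """Hypothetical stand-in: induce a Bayesian classifier under this
    variable ordering and return its accuracy (placeholder score here)."""
    return random.Random(hash(tuple(ordering))).random()

def order_crossover(p1, p2):
    """Keep a slice of parent 1, fill the remaining genes in parent 2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    rest = [g for g in p2 if g not in child]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

def evolve_orderings(n_vars, pop_size=20, generations=30):
    pop = [random.sample(range(n_vars), n_vars) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score_ordering, reverse=True)   # elitist selection
        survivors = pop[:pop_size // 2]
        children = [order_crossover(*random.sample(survivors, 2))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=score_ordering)

print(evolve_orderings(6))   # best variable ordering found
```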
The focus of this paper is the application of the genetic programming framework to the problem of knowledge discovery in databases, more precisely to the task of classification. Genetic programming possesses certain advantages that make it suitable for application in data mining, such as the robustness of the algorithm and its convenient structure for rule generation, to name a few. This study concentrates on one type of parallel genetic algorithm: the cellular (diffusion) model. Emphasis is placed on improving the efficiency and scalability of the data mining algorithm, which can be achieved by integrating the algorithm with databases and employing a cellular framework. A cellular model of genetic programming that exploits SQL queries is implemented and applied to the classification task. The results achieved are presented and compared with other machine learning algorithms.
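The key implementation idea, evaluating evolved classification rules directly in the database, can be sketched as follows (the schema and the example rule are illustrative assumptions, not the paper's; real GP individuals would be generated and varied by the evolutionary loop). Each candidate rule's fitness is computed with a single SQL query:

```python
import sqlite3

# toy relation standing in for the mined database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (a REAL, b REAL, cls TEXT)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)",
                 [(0.1, 2.0, "pos"), (0.9, 1.0, "neg"), (0.2, 3.0, "pos")])

def rule_fitness(condition, target="pos"):
    """Evaluate an evolved rule 'IF condition THEN target' with one SQL
    query: accuracy = (covered positives + uncovered negatives) / total."""
    (correct,) = conn.execute(
        f"SELECT SUM((({condition}) AND cls = ?) OR "
        f"(NOT ({condition}) AND cls <> ?)) FROM samples",
        (target, target)).fetchone()
    (total,) = conn.execute("SELECT COUNT(*) FROM samples").fetchone()
    return correct / total

# a candidate rule antecedent as it might appear in a GP individual
print(rule_fitness("a < 0.5 AND b > 1.5"))
```

Pushing the evaluation into the database engine is what lets the cellular model scale, since the data never has to be materialized in the client.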
This paper addresses the problem of clustering large sets, discussed in the context of financial time series. The goal is to divide stock market trading rules into several classes so that all the trading rules within the same class lead to similar trading decisions under the same stock market conditions. This is achieved using Kohonen self-organizing maps and the K-means algorithm. Several validity indices are used to validate and assess the clustering. Experiments were carried out on 350 stock market trading rules observed over a period of 1300 time instants.
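For illustration, a bare-bones K-means over decision vectors might look like the sketch below (the encoding is our assumption, not the paper's: each trading rule is represented by its vector of decisions over time, and rules whose vectors are close end up in the same class):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-means: rules with similar decision vectors share a centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each rule to its nearest centroid
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# toy data: 6 "trading rules", each a vector of decisions at 4 time instants
# (+1 buy, -1 sell, 0 hold); the real setting is 350 rules over 1300 instants
rules = np.array([[1, 1, 0, -1], [1, 1, 0, -1], [1, 0, 0, -1],
                  [-1, -1, 1, 1], [-1, -1, 1, 0], [-1, 0, 1, 1]])
labels, _ = kmeans(rules.astype(float), k=2)
print(labels)   # the two opposing rule families separate into two classes
```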
Data mining is a set of methods for processing data and extracting the non-trivial information contained in them; this information is not apparent at first glance, and even experienced experts are unable to uncover it, mainly due to the size of the datasets or the complexity of the relationships within them. Data mining therefore emerged as a scientific discipline that solves problems of this kind with the aid of modern computing technology.
The use of computational intelligence systems such as neural networks, fuzzy sets, genetic algorithms, etc., for stock market prediction is well established. This paper presents a generic stock price prediction model based on a rough set approach. To increase the efficiency of the prediction process, a rough sets with Boolean reasoning discretization algorithm is used to discretize the data. The rough set reduction technique is applied to find all reducts of the data, i.e., the minimal subsets of attributes that are associated with a class label for prediction. Finally, rough set dependency rules are generated directly from all generated reducts. A rough confusion matrix is used to evaluate the performance of the predicted reducts and classes. Using a dataset consisting of the daily movements of a stock traded on the Kuwait Stock Exchange, a preliminary assessment indicates that rough sets are applicable and effective for this task. For comparison, the results obtained using the rough set approach were compared with those of a neural network algorithm; the rough set approach achieved a higher overall accuracy rate and generated fewer and more compact rules than the neural network.
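The reduct computation mentioned above can be illustrated with a brute-force sketch (the decision table and attribute names are hypothetical, and real reduct search uses far more efficient methods): a subset of condition attributes is a reduct if it preserves the full dependency degree of the decision attribute and is minimal with that property.

```python
from itertools import combinations

def dependency(table, attrs, decision):
    """Rough set dependency degree: the fraction of rows whose equivalence
    class under `attrs` is consistent with a single decision value."""
    blocks = {}
    for row in table:
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(row[decision])
    consistent = sum(1 for row in table
                     if len(blocks[tuple(row[a] for a in attrs)]) == 1)
    return consistent / len(table)

def reducts(table, attrs, decision):
    """All minimal attribute subsets preserving the full dependency degree."""
    full = dependency(table, attrs, decision)
    found = []
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            if dependency(table, subset, decision) == full and \
               not any(set(f) <= set(subset) for f in found):
                found.append(subset)
    return found

# toy decision table: two condition attributes and a class label
table = [{"trend": "up", "vol": "hi", "move": "buy"},
         {"trend": "up", "vol": "lo", "move": "buy"},
         {"trend": "dn", "vol": "hi", "move": "sell"}]
print(reducts(table, ("trend", "vol"), "move"))  # -> [('trend',)]
```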
The research reported in this paper is part of a large project aiming at designing an automatic device for the detection of micro-sleep events. In the paper we are interested in the classification of EEG spectrograms with respect to the level of attention (mentation, relaxation, micro-sleep) of a monitored person (a proband). Data mining techniques are used to develop a classification model; namely, the GUHA method is employed for this purpose. It is a method of exploratory data analysis established on logical and statistical foundations that has been continuously developed over the last 40 years in the Czech Republic.
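For readers unfamiliar with GUHA: its procedures evaluate hypotheses of the form "antecedent ~ succedent" on a four-fold contingency table. The Python sketch below (the toy data and its encoding are our assumptions, not the paper's features) shows one classical quantifier, founded implication:

```python
def fourfold(rows, antecedent, succedent):
    """Four-fold table of a GUHA hypothesis: a = both hold, b = only the
    antecedent holds, c = only the succedent holds, d = neither holds."""
    a = b = c = d = 0
    for row in rows:
        if antecedent(row):
            if succedent(row): a += 1
            else: b += 1
        else:
            if succedent(row): c += 1
            else: d += 1
    return a, b, c, d

def founded_implication(a, b, c, d, p=0.9, base=2):
    """Classical GUHA quantifier: confidence a/(a+b) >= p with support a >= base."""
    return a >= base and a / (a + b) >= p

# toy spectrogram segments labelled by attention level (hypothetical encoding)
rows = [{"alpha_hi": 1, "state": "relax"}, {"alpha_hi": 1, "state": "relax"},
        {"alpha_hi": 1, "state": "sleep"}, {"alpha_hi": 0, "state": "ment"}]
t = fourfold(rows, lambda r: r["alpha_hi"] == 1, lambda r: r["state"] == "relax")
print(t, founded_implication(*t, p=0.6))
```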
Cohen’s kappa coefficient is a widely accepted measure of agreement on categorical variables and has replaced some older, simpler measures. Observational and statistical properties of the kappa coefficient in 2 x 2 tables are investigated. The asymmetrical measure “Cohenized implication” is proposed, and the decomposition of the symmetrical measure kappa into two asymmetrical components is shown. These statistically motivated measures are discussed as weakened forms of the strict logical notions of equivalence and implication. Applications of kappa and “Cohenized implication” are recommended: on the one hand in medical research as a supplement to the traditional measures of sensitivity and specificity, and on the other hand as quantifiers in the GUHA procedure ASSOC, providing a statistically contemporary operationalization of weakened equivalence.
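For reference, kappa in a 2 x 2 table with cell counts a, b, c, d and n = a + b + c + d is computed from the observed and chance-expected agreement (this is the standard definition; the “Cohenized implication” itself is defined in the paper):

```latex
% Cohen's kappa for a 2x2 table with cells a, b, c, d and n = a + b + c + d
\[
  p_o = \frac{a + d}{n}, \qquad
  p_e = \frac{(a+b)(a+c) + (c+d)(b+d)}{n^2}, \qquad
  \kappa = \frac{p_o - p_e}{1 - p_e}.
\]
```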
Intrusion detection systems are increasingly a key part of system defense. Various approaches to intrusion detection are currently in use, but they are relatively ineffective. Artificial Intelligence plays a driving role in security services. This paper proposes a dynamic model of an intelligent intrusion detection system based on a specific AI approach to intrusion detection. The techniques investigated include fuzzy logic with network profiling, which uses simple data mining techniques to process the network data. The proposed hybrid system combines anomaly and misuse detection. Simple fuzzy rules allow us to construct if-then rules that reflect common ways of describing security attacks. We use the DARPA dataset for training and benchmarking.
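A minimal sketch of the kind of fuzzy if-then rule described above (the membership functions, feature names, and thresholds are illustrative assumptions, not the paper's actual network profile):

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function for a network-profile feature."""
    if x <= a or x >= d: return 0.0
    if b <= x <= c: return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def alert_degree(conn_rate, fail_ratio):
    """One if-then rule: IF connection rate is HIGH AND failed-login ratio
    is HIGH THEN raise an alert; min models AND, as in Mamdani inference."""
    rate_high = trapezoid(conn_rate, 50, 100, 1e9, 2e9)   # hypothetical scale
    fail_high = trapezoid(fail_ratio, 0.2, 0.5, 1.0, 1.1)
    return min(rate_high, fail_high)

print(alert_degree(120, 0.6))  # -> 1.0: both antecedents fully satisfied
print(alert_degree(75, 0.3))   # -> partial degree of attack suspicion
```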
The paper recalls the McNaughton theorem of fuzzy logic and the algorithms underlying its constructive proofs. It then shows how those algorithms can be combined with the algorithm underlying a recent extension of the theorem to piecewise-linear functions with rational coefficients, and points out the potential importance of the resulting combined algorithm for data mining. That result is immediately weakened by a complexity analysis of the algorithm, which reveals that its worst-case complexity is doubly exponential.
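For context, the classical statement being extended (given here as it appears in the standard literature, not quoted from the paper):

```latex
% McNaughton's theorem: a function f : [0,1]^n -> [0,1] is definable by a
% formula of Lukasiewicz propositional logic iff f is continuous and
% piecewise linear, each linear piece having integer coefficients:
\[
  f(x_1,\dots,x_n) \;=\; b_i + \sum_{j=1}^{n} a_{ij}\, x_j
  \quad \text{on the $i$-th piece,} \qquad a_{ij},\, b_i \in \mathbb{Z}.
\]
% The extension the paper builds on admits rational coefficients,
% a_{ij}, b_i \in \mathbb{Q}, in a suitably extended logic.
```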