As the volume and variety of information sources, especially on the World Wide Web (WWW), continue to grow, so do the requirements imposed on search applications and the demands of their users. A search application should give users accurate, sensible responses to their requests, yet it is difficult to provide information that closely matches a user's information need. Search effectiveness can be seen as the accuracy with which the retrieved information matches that need. Several problems stand in the way: users often do not formulate search queries in the form that best represents their information need, the relevance of a document is often judged very differently by different users, and information sources may contain heterogeneous documents in multiple formats with no unified representation. This contribution presents a proposal to improve web search effectiveness via evolutionary optimization of Boolean and vector search queries based on individual user models.
The linear ordering problem is a well-known optimization problem, attractive for its complexity (it is NP-hard), its rich library of test data, and its variety of real-world applications. In this paper, we investigate the use and performance of two variants of genetic algorithms, the mutation-only genetic algorithm and the higher-level chromosome genetic algorithm, on the linear ordering problem. Both methods are tested and evaluated on a library of real-world and artificial linear ordering problem instances.
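To make the objective concrete, the following is a minimal sketch and not the algorithms evaluated in the paper. It assumes the usual formulation of the linear ordering problem: find a permutation of rows and columns that maximizes the sum of the entries above the main diagonal. The names `lop_value`, `swap_mutation`, and `mutation_only_search` are illustrative, and the search loop is a simple single-individual scheme that uses mutation alone, chosen only to show how a permutation is scored and varied.

```python
import random


def lop_value(matrix, perm):
    """Sum of the entries above the diagonal after reordering rows and columns by perm."""
    n = len(perm)
    return sum(matrix[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n))


def swap_mutation(perm, rng):
    """Return a copy of perm with two randomly chosen positions exchanged."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def mutation_only_search(matrix, generations=200, seed=0):
    """Illustrative search: keep a mutated permutation if it is at least as good."""
    rng = random.Random(seed)
    best = list(range(len(matrix)))
    rng.shuffle(best)
    best_value = lop_value(matrix, best)
    for _ in range(generations):
        candidate = swap_mutation(best, rng)
        value = lop_value(matrix, candidate)
        if value >= best_value:
            best, best_value = candidate, value
    return best, best_value


if __name__ == "__main__":
    # Hypothetical 3 x 3 instance, used only for demonstration.
    example = [[0, 5, 3], [1, 0, 4], [2, 6, 0]]
    print(mutation_only_search(example))
```

A population-based genetic algorithm would maintain many such permutations at once; the scoring function stays the same.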
Since their appearance in 1993, when they first approached the Shannon limit, turbo codes have given a new direction to the field of channel coding, especially since they have been adopted in several telecommunication standards, such as those for deep-space communication. A robust interleaver can contribute significantly to the overall performance of a turbo code system. The search for a good interleaver is a complex combinatorial optimization problem. In this paper, we present genetic algorithms and differential evolution, two bio-inspired approaches that have proven able to solve non-trivial combinatorial optimization tasks, as promising optimization methods for finding a well-performing interleaver for large frame sizes.
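The combinatorial nature of the search can be illustrated by treating an interleaver as a permutation of the positions in a frame. This is an assumption made for the sketch below; the abstract does not specify the encoding the authors use. The function names are hypothetical, and no quality measure is included, since evaluating an interleaver typically requires simulating the complete turbo code system.

```python
import random


def random_interleaver(frame_size, seed=None):
    """Return a random permutation of frame positions (a candidate interleaver)."""
    rng = random.Random(seed)
    perm = list(range(frame_size))
    rng.shuffle(perm)
    return perm


def interleave(symbols, perm):
    """Output position i takes the input symbol at position perm[i]."""
    return [symbols[p] for p in perm]


def deinterleave(symbols, perm):
    """Invert interleave(): put each symbol back at its original position."""
    restored = [None] * len(perm)
    for out_pos, in_pos in enumerate(perm):
        restored[in_pos] = symbols[out_pos]
    return restored


if __name__ == "__main__":
    perm = random_interleaver(8, seed=1)
    frame = [1, 0, 1, 1, 0, 0, 1, 0]
    shuffled = interleave(frame, perm)
    print(perm, shuffled, deinterleave(shuffled, perm))
```

For a frame of N symbols there are N! such permutations, which is why exhaustive search is infeasible for large frame sizes.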
Descriptive analysis of the magnitude and situation of road safety in general, and of road accidents in particular, is important, but understanding data quality, the factors related to dangerous situations, and the interesting patterns in the data is of even greater importance. Under the umbrella of information architecture research for road safety in developing countries, the objective of this experimental machine learning research is to explore data quality issues, analyze trends, and predict the role of road users in possible injury risks. The research employed TreeNet, Classification and Regression Trees (CART), Random Forest (RF), and a hybrid ensemble approach. To identify relevant patterns and illustrate the performance of the techniques in the road safety domain, road accident data collected from the Addis Ababa Traffic Office were subjected to several analyses. Empirical results illustrate that data quality is a major problem that requires architectural guidelines, and that the prototype models could classify accidents with promising accuracy. In addition, the ensemble technique proved to be better in terms of predictive accuracy in the domain under study.
Matrix factorization, or factor analysis, is an important task in the analysis of high-dimensional real-world data. There are several well-known methods and algorithms for the factorization of real-valued data, but many application areas, including information retrieval, pattern recognition, and data mining, require the processing of binary rather than real-valued data. Unfortunately, the methods used for real matrix factorization fail in the latter case. In this paper, we introduce the background and an initial version of a genetic algorithm for binary matrix factorization.
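One way to see why real-valued methods do not carry over is to look at how a binary factorization might be scored. The sketch below is an assumption, not the fitness function defined in the paper: it combines two binary factor matrices with the Boolean matrix product (logical OR of ANDs, so that 1 + 1 = 1) and counts the entries in which the result differs from the data. The names `boolean_product` and `reconstruction_error` are hypothetical.

```python
def boolean_product(a, b):
    """Boolean matrix product: entry (i, j) is 1 if a[i][k] and b[k][j] are both 1 for some k."""
    inner, cols = len(b), len(b[0])
    return [
        [int(any(row[k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for row in a
    ]


def reconstruction_error(x, a, b):
    """Number of entries in which x differs from the Boolean product of a and b."""
    approx = boolean_product(a, b)
    return sum(
        x[i][j] != approx[i][j]
        for i in range(len(x))
        for j in range(len(x[0]))
    )


if __name__ == "__main__":
    # Hypothetical factors: 3 objects x 2 factors and 2 factors x 3 attributes.
    a = [[1, 0], [0, 1], [1, 1]]
    b = [[1, 1, 0], [0, 1, 1]]
    x = boolean_product(a, b)
    print(x, reconstruction_error(x, a, b))
```

With the ordinary matrix product, the middle entry of the third row in this example would be 2, which is no longer a binary value. A genetic algorithm could use such an error count as the quantity to minimize over candidate pairs of binary factors.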
This article presents an application of evolutionary fuzzy rules to the modeling and prediction of the power output of a real-world photovoltaic power plant (PVPP). The method is compared with artificial neural networks and support vector regression, which were also used to build predictors from time-series-like data describing the production of the PVPP. The models are created using these supervised machine learning methods in order to forecast the short-term output of the power plant and to compare the accuracy of their predictions.