Information retrieval systems depend on Boolean queries, so evolving those queries should improve the performance of the information retrieval system. The quality of an information retrieval system is measured by two criteria, precision and recall. Evolutionary techniques are widely applied to optimization tasks in many areas, including information retrieval. In information retrieval applications, both criteria have been combined into a single scalar fitness function by means of a weighting scheme, the harmonic mean. This paper presents the use of genetic algorithms in information retrieval, especially for optimizing a Boolean query. The influence of both criteria, precision and recall, on quality improvement is discussed as well.
The focus of this paper is the application of the genetic programming framework to the problem of knowledge discovery in databases, more precisely to the task of classification. Genetic programming has several advantages that make it suitable for data mining, such as the robustness of the algorithm and its convenient structure for rule generation. This study concentrates on one type of parallel genetic algorithm, the cellular (diffusion) model. Emphasis is placed on improving the efficiency and scalability of the data mining algorithm, which could be achieved by integrating the algorithm with databases and employing a cellular framework. A cellular model of genetic programming that exploits SQL queries is implemented and applied to the classification task. The results achieved are presented and compared with those of other machine learning algorithms.
Constant evaluation is a key problem in symbolic regression solved by means of genetic programming. Other evolutionary methods are often used for constant evaluation; typical examples are variants of genetic programming or evolutionary systems, all of which are stochastic. The article compares these methods with a deterministic approach using exponentiated gradient descent. All the methods were tested on a single sample function to maintain the same conditions, and the results are presented in graphs. Finally, three different tasks, each run ten times, are compared to check the reliability of the methods tested in the article.
An information retrieval (IR) system (IRs), such as a search engine, is said to be efficient to the degree that it evaluates each object in the information base (database, document base, web, ...) as an expert would. An IRs should be able to retrieve nearly all relevant objects (measured by recall), and only the (most) relevant objects (measured by precision), from the collection queried.
Recall and precision are the classical measures of retrieval efficiency. They measure the degree to which the query answer (the set of documents retrieved by the IRs in response to the user query) matches the set of relevant documents in the information base queried.
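The two measures can be illustrated with a minimal Python sketch, assuming the query answer and the relevant documents are both available as sets of document identifiers. The function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration; they do not come from the systems discussed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the IRs (the query answer)
    relevant:  document ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Convention assumed here: an empty set yields 0.0 instead of dividing by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this example precision is 2/4 and recall is 2/3: half of what was retrieved is relevant, and two thirds of what is relevant was retrieved.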
Retrieving the documents most relevant to the user query is one of the most important tasks of the World Wide Web (WWW) search engines in use today. The researchers therefore aim to use genetic programming (GP) and fuzzy optimization to optimize the user search query in the Boolean IRs model and in the fuzzy IRs model, to use more Boolean operators (AND, OR, XOR, OF, and NOT) instead of only the standard operators (AND, OR, and NOT), and to use weights for terms and for Boolean operators. Weights give users more freedom in defining how important each term and each Boolean operator is. The term weights and the Boolean operator weights are used in the fuzzy IRs model. In addition, this work investigates extensions of the classical measures of effectiveness in IRs: precision, recall, and harmonic mean.
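As an illustration of the kind of structure GP operates on, a Boolean query can be held as a tree and evaluated against each document. The sketch below is a minimal, unweighted example: the class names `Term`, `Not`, and `BinOp`, the functions `matches` and `retrieve`, and the sample collection are all hypothetical. The OF operator and the term and operator weights of the fuzzy model are left out, because their exact semantics are not defined in this passage.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    word: str  # leaf node: a single index term


@dataclass(frozen=True)
class Not:
    child: "Query"


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "AND", "OR", "XOR"
    left: "Query"
    right: "Query"


Query = Union[Term, Not, BinOp]


def matches(query, doc_terms):
    """Evaluate a Boolean query tree against the set of terms of one document."""
    if isinstance(query, Term):
        return query.word in doc_terms
    if isinstance(query, Not):
        return not matches(query.child, doc_terms)
    left = matches(query.left, doc_terms)
    right = matches(query.right, doc_terms)
    if query.op == "AND":
        return left and right
    if query.op == "OR":
        return left or right
    if query.op == "XOR":
        return left != right
    raise ValueError(f"unknown operator: {query.op}")


def retrieve(query, collection):
    """Return the ids of all documents in the collection that satisfy the query."""
    return {doc_id for doc_id, terms in collection.items() if matches(query, terms)}


# Hypothetical collection: document id -> set of index terms.
collection = {
    "d1": {"genetic", "programming"},
    "d2": {"genetic", "fuzzy"},
    "d3": {"fuzzy", "retrieval"},
    "d4": {"retrieval"},
}

# (genetic XOR fuzzy) AND NOT retrieval
query = BinOp("AND", BinOp("XOR", Term("genetic"), Term("fuzzy")), Not(Term("retrieval")))
print(retrieve(query, collection))
```

A tree representation is convenient for GP because crossover and mutation can swap or replace whole subtrees while the result remains a valid query.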
The researchers use the harmonic mean, which combines precision and recall in a single measure, as the objective function for evaluating the results of the two IRs models, with the goal of raising the precision-recall curve.
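The harmonic mean of precision and recall can be sketched as a scalar fitness function as follows. This is a minimal example of the standard formula 2PR / (P + R); the function name `harmonic_mean` and the sample values are assumptions for illustration, and the actual objective function used by the researchers may differ in detail.

```python
def harmonic_mean(precision, recall):
    """Harmonic mean of precision and recall, usable as a scalar fitness value."""
    if precision + recall == 0:
        return 0.0  # assumed convention: no relevant document retrieved scores zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical candidate queries with their (precision, recall) values.
candidates = {
    "balanced": (0.5, 0.5),
    "precise_only": (1.0, 0.1),
    "broad": (0.5, 1.0),
}
for name, (p, r) in candidates.items():
    print(f"{name}: fitness={harmonic_mean(p, r):.3f}")
```

The harmonic mean is high only when both measures are high, so a query that reaches perfect precision at the cost of very low recall receives a low fitness.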