Creator: Klawonn, Frank - LINDAT/CLARIAH-CZ Catalog Search Results

Creator:: Montvida, Olga and Klawonn, Frank
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: classifier, performance evaluation, misclassification costs, cost curves, ROC curves, and AUC
Language:: English
Description:: Performance evaluation of classifiers is a crucial step for selecting the best classifier or the best set of parameters for a classifier. Receiver Operating Characteristic (ROC) curves and the Area Under the ROC Curve (AUC) are widely used to analyse the performance of a classifier. However, this approach does not take into account that misclassifications for different classes might have more or less serious consequences. On the other hand, it is often difficult to specify exactly the consequences or costs of misclassifications. This paper is devoted to Relative Cost Curves (RCC), a graphical technique for visualising the performance of binary classifiers over the full range of possible relative misclassification costs. This curve provides helpful information for choosing the best set of classifiers or for estimating misclassification costs if those are not known precisely. In this paper, the concept of the Area Above the RCC (AAC) is introduced, a scalar measure of classifier performance for problems with unequal misclassification costs. We also extend RCC to multicategory problems in which misclassification costs depend only on the true class.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
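
To make the contrast in the abstract above concrete, the following is a minimal Python sketch, not the paper's RCC or AAC definition (the abstract does not give one). It assumes a binary classifier that outputs a score per example, and a relative cost weight c in [0, 1] where a false negative costs c and a false positive costs 1 - c. The function names `auc`, `min_cost`, and `cost_curve` are hypothetical and chosen for illustration only.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def min_cost(scores, labels, c):
    """Lowest average cost over all thresholds for relative cost weight c.

    Assumption for this sketch: a false negative costs c and a false
    positive costs 1 - c; an example is predicted positive if score >= t.
    """
    thresholds = sorted(set(scores)) + [float("inf")]
    best = float("inf")
    for t in thresholds:
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        best = min(best, (c * fn + (1 - c) * fp) / len(labels))
    return best


def cost_curve(scores, labels, steps=11):
    """Pairs (c, minimal cost) on an evenly spaced grid of cost weights."""
    grid = [i / (steps - 1) for i in range(steps)]
    return [(c, min_cost(scores, labels, c)) for c in grid]


# Toy data, invented for illustration.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))
print(cost_curve(scores, labels, steps=5))
```

AUC condenses the ranking quality into one number regardless of costs, whereas the curve over c shows how the best achievable cost changes as the relative seriousness of the two error types shifts.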

Creator:: Klawonn, Frank
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: MAD, standard deviation, small samples, and significance test
Language:: English
Description:: Modern biology is interested in better understanding mechanisms within cells. For this purpose, products of cells like metabolites, peptides, proteins or mRNA are measured and compared under different conditions, for instance healthy cells vs. infected cells. Such experiments usually yield regulation or expression values - the abundance or absence of a cell product in one condition compared to another one - for a large number of cell products, but with only a few replicates. In order to distinguish random fluctuations and noise from true regulations, suitable significance tests are needed. Here we propose a simple model which is based on the assumption that the regulation factors follow normal distributions with different expected values, but with the same standard deviation. Before suitable significance tests can be derived from this model, a reliable estimate of the standard deviation in the context of many small samples is needed. We therefore also include a discussion of the properties of the sample MAD (Median Absolute Deviation from the median) and the sample standard deviation for small sample sizes.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
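
The two estimators named in the abstract above can be compared on a small sample with a short Python sketch. This is only an illustration of the sample MAD and the sample standard deviation, not the significance test or model proposed in the paper. The scaling constant 1.4826 (which makes the MAD a consistent estimator of the standard deviation under normality) is a standard convention and is not stated in the abstract; the function names and the sample values are hypothetical.

```python
import statistics

# Standard consistency factor for normally distributed data (assumption:
# the abstract itself does not mention any scaling).
NORMAL_CONSISTENCY = 1.4826


def sample_mad(values):
    """Median of the absolute deviations from the sample median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def mad_sigma(values):
    """MAD rescaled to estimate the standard deviation of a normal sample."""
    return NORMAL_CONSISTENCY * sample_mad(values)


# A small sample with one outlying replicate, invented for illustration.
sample = [1.0, 1.2, 0.9, 1.1, 5.0]
print(statistics.stdev(sample))  # sample standard deviation
print(mad_sigma(sample))         # MAD-based estimate
```

On a sample like this one, the single outlying value inflates the sample standard deviation much more than the MAD-based estimate, which is one reason the two behave differently when only a few replicates are available.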
